Posts Tagged ‘spark’

Geospatial analytics on Hadoop

A few months ago I was working on a project with a lot of geospatial data. The data was stored in HDFS and was easily accessible through Hive. One of the tasks was to analyze this data, and the first step was to join two datasets on columns containing geographical coordinates. I wanted […]

Avoiding the mess in the Hadoop cluster

In the first part of this blog series, I described a few challenges I faced when I had to quickly implement a simple Hive query and schedule it to run periodically on the Hadoop cluster. These challenges include data cataloguing, data discovery, data lineage and process scheduling. I also explained […]

Zero Data Loss Guarantee in Spark Streaming

When properly deployed, Spark Streaming 1.2 provides a zero data loss guarantee. To enjoy this mission-critical feature, you need to fulfil the following prerequisites:
- The input data comes from a reliable source and reliable receivers.
- Application metadata is checkpointed by the application driver.
- The write ahead log is enabled.
Let’s briefly […]
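The write ahead log prerequisite is a configuration switch. A minimal sketch, assuming Spark 1.2 or later with settings supplied through `spark-defaults.conf`; checkpointing is enabled separately in the driver code with `StreamingContext.checkpoint(directory)`, and the HDFS path in the comment is a hypothetical placeholder:

```
# spark-defaults.conf: write data received by receivers to a write ahead log before processing
spark.streaming.receiver.writeAheadLog.enable  true
# Metadata checkpointing is configured in the driver code, for example:
# ssc.checkpoint("hdfs:///hypothetical/checkpoint/dir")
```

Because the log files are written under the checkpoint directory, the write ahead log only takes effect once checkpointing is configured.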
