Posts Tagged ‘hdfs’

We share our knowledge happily

Data Pipeline Evolution
The LinkedIn Engineering blog is a great source of technical blog posts about building and using large-scale data pipelines with Kafka and its “ecosystem” of tools. In this post, I provide several pictures and diagrams (including quotes) that summarise how the data pipeline has evolved at LinkedIn over the years. The […]

Avoiding the mess in the Hadoop cluster
In the first part of this blog series, I described a few challenges that I had to face to quickly implement a simple Hive query and schedule it periodically on the Hadoop cluster. These challenges include data cataloguing, data discovery, data lineage and process scheduling. I also explained […]

Creating HDFS Snapshots and recovering a Deleted File
In this tutorial, we focus on HDFS snapshots. Common use cases of HDFS snapshots include backups and protection against user errors. To demonstrate the functionality of HDFS snapshots, we create an “important” directory in HDFS, create its snapshot and “accidentally” remove a file from the directory. Finally, […]

We are happy to say that our Refcard, titled Getting Started with Apache Hadoop, has already been published by DZone. This Refcard presents Apache Hadoop, a software framework that enables distributed storage and processing of large datasets using simple high-level programming models. The card covers the most important concepts of Hadoop, describes its architecture, and explains […]
