In the first part of this blog series, I described a few challenges I faced when quickly implementing a simple Hive query and scheduling it to run periodically on a Hadoop cluster. These challenges include data cataloguing, data discovery, data lineage and process scheduling. I also explained how they can be addressed using existing […]
In this tutorial, we focus on HDFS snapshots. Common use cases for HDFS snapshots include backups and protection against user errors. To demonstrate how snapshots work, we create an “important” directory in HDFS, take a snapshot of it and “accidentally” remove a file from the directory. Finally, we recover the file from the snapshot.
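
The walkthrough described above could be sketched roughly as follows. This is a minimal illustration, not the tutorial's exact commands: the directory `/user/demo/important`, the file `report.csv` and the snapshot name `backup1` are hypothetical, and the helper names `snapshot_demo_commands` and `run_all` are made up for this example. The script only prints the `hdfs` commands by default; passing `dry_run=False` assumes an `hdfs` client on the `PATH` and a user allowed to run `dfsadmin -allowSnapshot`.

```python
import subprocess


def snapshot_demo_commands(directory, filename, snapshot):
    """Build the HDFS CLI calls for the create / delete / recover walkthrough.

    All three arguments are placeholders chosen by the caller.
    """
    target = f"{directory}/{filename}"
    # Snapshots are exposed under the read-only ".snapshot" subdirectory.
    saved = f"{directory}/.snapshot/{snapshot}/{filename}"
    return [
        # 1. Create the "important" directory and put a file into it.
        ["hdfs", "dfs", "-mkdir", "-p", directory],
        ["hdfs", "dfs", "-put", filename, directory],
        # 2. Mark the directory as snapshottable (an administrator operation).
        ["hdfs", "dfsadmin", "-allowSnapshot", directory],
        # 3. Take a named snapshot of the directory.
        ["hdfs", "dfs", "-createSnapshot", directory, snapshot],
        # 4. "Accidentally" remove the file.
        ["hdfs", "dfs", "-rm", target],
        # 5. Recover it by copying it back out of the snapshot.
        ["hdfs", "dfs", "-cp", saved, directory],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(snapshot_demo_commands("/user/demo/important", "report.csv", "backup1"))
```

Recovery here is an ordinary copy because the snapshot contents are readable like any other HDFS path; the snapshot itself stays untouched and can be used again.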