Camus, a MapReduce job that loads data from Kafka into HDFS, has a number of time-related configuration settings and assumptions. They control how many messages are consumed from Kafka in each Camus run and where the data is stored in HDFS. I summarize them in this blog post.
In this blog post, I describe a few surprising gotchas related to the import of a MySQL table into Hive using Sqoop 1.4.5 (the most recent version supported by vendors like Hortonworks or Cloudera at the time of writing this post). Real-world scenario In my simple (yet real-world) use-case, I have a MySQL table and […]