Posts Tagged



Geospatial analytics on Hadoop

A few months ago I was working on a project with a lot of geospatial data. The data was stored in HDFS and easily accessible through Hive. One of the tasks was to analyze this data, and the first step was to join two datasets on columns that held geographical coordinates. I wanted […]

Avoiding the mess in the Hadoop cluster

In the first part of this blog series, I described a few challenges that I had to face to quickly implement a simple Hive query and schedule it periodically on the Hadoop cluster. These challenges include data cataloguing, data discovery, data lineage and process scheduling. I also explained […]

Avoiding the mess in the Hadoop cluster

This blog series is based on the talk “Simplified Data Management and Process Scheduling in Hadoop” that we gave at the Big Data Technical Conference in Poland in February 2015. Because the talk was very well received by the audience, we decided to convert it into a blog […]

In this blog post, I describe a few surprising gotchas related to the import of a MySQL table into Apache Hive using Apache Sqoop 1.4.5 (the most recent version supported by vendors like Hortonworks or Cloudera at the time of writing this post).

Real-world scenario

In my simple (yet real-world) use-case, I have a MySQL […]

We are happy to share slides about HCatalog from the Data Analyst Training delivered by GetInData. HCatalog makes it easier for users of different data processing tools (such as Apache Hive, Apache Pig, and MapReduce) to share data on the Hadoop cluster. The slides cover HCatalog’s primary motivation, goals, the most important features, currently […]

We are happy to say that our Refcard, titled Getting Started Apache Hadoop, has already been published by DZone. This Refcard presents Apache Hadoop, a software framework that enables distributed storage and processing of large datasets using simple high-level programming models. The card covers the most important concepts of Hadoop, describes its architecture, and explains […]
