Data Analyst Training

This four-day course teaches Data Analysts how to analyze massive amounts of data available in a Hadoop YARN cluster.

Target Audience

Data Analysts, BI Analysts and anyone interested in using large-scale computation tools to extract knowledge from large datasets stored in a Hadoop cluster.

Course Agenda

Day 1 – Core Hadoop

  • HDFS
  • YARN
  • MapReduce
  • HUE

Day 2 – SQL with Apache Hive

  • File formats: Text, Avro, Parquet
  • Apache Hive
      • Motivation for Hive
      • Key concepts
      • Comparison with RDBMS
      • Hive query language
      • Hive architecture
      • Execution engines: MapReduce, Tez, Spark
      • Useful features
      • Query optimisation techniques

Day 3 – Interactive analysis with Apache Spark

  • Motivation for Spark
  • Spark Core
      • Overview
      • Scala API
      • Architecture
      • Integration with YARN
  • Spark SQL
      • Key features
      • Integration with Hive
      • DataFrames

Day 4 – Visualization and Search

  • Dashboarding tools
      • Spark Notebooks
      • Kibana
  • Large-scale search
      • Apache Solr
      • Search on Hadoop
      • HUE Search app

Sample Section

See our slides about HCatalog that came out of this training!

Our Approach

The training provides a carefully prepared mix of theory, exercises, demos, discussions, quizzes and … fun! We make sure that each participant is actively engaged in hands-on exercises, discussions and teamwork.

More Information

Please contact us with any questions about our training courses, or if you would like to discuss a custom, on-site training course.