Fast SQL on Hadoop

This two-day course teaches students how to efficiently analyze massive amounts of data stored in a Hadoop cluster.

During the course we simulate real-world scenarios. Every participant plays the role of a data analyst working for an imaginary company called StreamRock (inspired by Spotify, our favourite music streaming app). Students use popular open-source tools with SQL-like interfaces to quickly extract the knowledge hidden in large datasets. The workshop consists of practical exercises executed on a Hadoop cluster running in the public cloud.

Target Audience

Data analysts, BI specialists and anyone interested in iterating quickly with efficient SQL tools to extract knowledge from large datasets stored in a Hadoop cluster. Basic knowledge of SQL is assumed.

Course Agenda

Day 1

  • Introduction to the use case: StreamRock
  • Introduction to Hadoop
    • HDFS
    • YARN
  • File Formats
    • Text formats
    • Row-oriented format – Apache Avro
    • Column-oriented formats – Parquet and ORC
  • Apache Hive
    • Key concepts
    • Comparison with RDBMS
    • Hive Query Language
    • Hands-on exercises
    • Hive architecture
    • Execution engines: MapReduce, Tez, Spark
    • Useful features
    • Query optimisation techniques

Day 2

  • Cloudera Impala
    • Typical use cases
    • Comparison with Hive
    • Impala architecture
    • Hands-on exercises
  • Bonus – Facebook Presto
    • Comparison with Hive and Impala
    • Presto architecture
    • Demo
  • Spark SQL
    • Introduction to Spark
    • Key features
    • Integration with Hive
    • DataFrames
    • Hands-on exercises
  • Comparing Hive, Impala, Spark SQL and Presto
    • Benchmarks
    • When to use which

Our Approach

The training provides a carefully prepared mix of theory, exercises, demos, discussions, quizzes and … fun! We make sure that each participant is actively engaged in the hands-on exercises, discussions and teamwork.

More Information

Please contact us with any questions about our training courses, or if you would like to discuss a custom, on-site training course.