Big Data Workshop

Big Data Workshop is a one-day event dedicated to everyone who wants to get to know Big Data and the Hadoop ecosystem. Participants will discover technologies such as Hadoop, Hive, Spark, Flink and Kafka through a hands-on, practical approach.

Workshop overview

During the workshop you’ll act as a Big Data engineer and analyst working for a fictional company, StreamRock™, that builds a music streaming application (similar to Spotify). The main goal of your work is to take advantage of Big Data technologies such as Hadoop, Spark and Hive to analyze various datasets about the users and the songs they played. We will process our data in both batch and streaming modes to get data-driven answers to many business questions and to power the product features that StreamRock™ builds. Every exercise will be executed on a remote multi-node Hadoop cluster.

The workshop is strongly focused on practical experience. The instructor will also share his own experience gained from working with Big Data technologies for several years.



Workshop Agenda*

Part 1 – Introduction to Big Data and Apache Hadoop

  • Description of the StreamRock company, along with the opportunities and challenges that Big Data technologies bring to it.
  • Introduction to core Hadoop technologies such as HDFS and YARN.
  • Hands-on exercise: Accessing a remote multi-node Hadoop cluster.

Part 2 – Providing data-driven answers to business questions using an SQL-like solution

  • Introduction to Apache Hive.
  • Hands-on exercise: Importing structured data into the cluster using HUE.
  • Hands-on exercise: Ad-hoc analysis of the structured data with Hive.
  • Hands-on exercise: Visualisation of the results using HUE.

Part 3 – Implementing scalable ETL processes on the Hadoop cluster

  • Introduction to Apache Spark, Spark SQL and Spark DataFrames.
  • Hands-on exercise: Implementing an ETL job to clean and massage input data using Spark.
  • Quick explanation of the Avro and Parquet binary data formats.
  • Practical tips for implementing ETL processes, such as process scheduling, schema management and integration with existing systems.

Part 4 – Advanced analysis of diverse datasets

  • Hands-on exercise: Implementing ad-hoc queries using Spark SQL and DataFrames.
  • Hands-on exercise: Visualisation of the results of Spark queries using the Spark Notebook.

Part 5 – Advantages of real-time technologies from the Hadoop ecosystem

  • Real-time data collection with Apache Kafka (presentation and demo).
  • Processing real-time streams of data using Apache Flink (presentation and demo).
* GetInData reserves the right to make any changes and adjustments to the presented agenda.

Time box

The workshop lasts 8 full hours, so you should reserve a full day. There will, of course, be coffee and lunch breaks during the training.

Target

Our training is intended for everyone interested in Big Data: analysts, engineers, managers and others.

Requirements

All you need to fully participate in our training program is a laptop with a web browser, a shell terminal (e.g. PuTTY) and a Wi-Fi connection. Our workshops are mostly technical (with some business content); however, you do not need previous experience with Big Data technologies.


Our workshops and training programs are run by experienced instructors with many years of real-life Big Data experience. Get to know our team!


Participants' reviews

The reviews come from a questionnaire conducted by the Evention company during the workshop at Big Data Tech 2015, as well as from the workshop conducted on 8th October 2015 by the GetInData company.

“Very good preparation. Big plus for a friendly interface for such a technical field. Instructor’s experience with various clients was an additional advantage. The broad field of Big Data was thoroughly discussed and practical exercises were great!”
“Interesting workshop, perfect for someone who is new to Big Data. The issues were discussed clearly and understandably. Real-life examples were a big plus. Great 5/5!”
“The workshop was dynamic and provided useful examples of Big Data technologies.”
“The overview of the available solutions for the Hadoop ecosystem was very well presented.”
“The workshops were prepared thoroughly and very well thought through. The instructor showed great experience, was very helpful and fully supported the theme.”
“Professional approach, materials and tools made it possible to focus on the practical exercises without disturbing one another. The knowledge and professional experience of the instructor showed and really added to the workshop program.”