Big Data Workshop
A one-day workshop focused on the practical side of using open-source Big Data technologies. Participants will learn the basics of the most popular Big Data tools, such as Hadoop, Hive, Spark and Kafka.
Training outcome
During the workshop you will act as a Big Data engineer and analyst working for StreamRock™, a fictional company that builds a music streaming application (like Spotify). The main goal of your work is to use Big Data technologies such as Hadoop, Spark and Hive to analyse various datasets about the users and the songs they played. We will process the data in batch mode to get data-driven answers to important business questions. Each exercise will be executed on a remote multi-node Hadoop cluster.
Course agenda
Part 1
Introduction to Big Data and Apache Hadoop
Description of StreamRock, along with the opportunities and challenges that Big Data technologies bring to it.
Introduction to core Hadoop technologies such as HDFS and YARN.
Hands-on exercise: Accessing a remote multi-node Hadoop cluster.
Part 2
Providing data-driven answers to business questions using a SQL-like solution
Introduction to Apache Hive.
Hands-on exercise: Importing structured data into the cluster using HUE.
Hands-on exercise: Ad-hoc analysis of the structured data with Hive.
Hands-on exercise: Visualising the results using HUE.
Part 3
Implementing scalable ETL processes on the Hadoop cluster
Introduction to Apache Spark, Spark SQL, and Spark DataFrames.
Hands-on exercise: Implementing an ETL job to clean and massage input data using Spark.
Quick explanation of the Avro and Parquet binary data formats.
Practical tips for implementing ETL processes, covering process scheduling, schema management, and integration with existing systems.
Part 4
Advanced analysis of diverse datasets
Hands-on exercise: Implementing ad-hoc queries using Spark SQL and DataFrames.
Hands-on exercise: Visualisation of the results of Spark queries using the Spark Notebook.
Other Big Data Training
Hadoop Developer Training
This four-day course gives software engineers a practical introduction to Big Data application development using popular projects from the Hadoop ecosystem and beyond.
Hadoop Administrator Training
This four-day course provides the practical and theoretical knowledge necessary to operate a Hadoop cluster. We put great emphasis on practical hands-on exercises that aim to prepare participants to work as effective Hadoop administrators.
Advanced Spark Training
This 2-day training is dedicated to Big Data engineers and data scientists who are already familiar with the basic concepts of Apache Spark and have hands-on experience implementing and running Spark applications.
Data Analyst Training
This four-day course teaches Data Analysts how to analyse massive amounts of data available in a Hadoop YARN cluster.
Real-Time Stream Processing
This two-day course teaches data engineers how to process unbounded streams of data in real-time using popular open-source frameworks.
Contact us
Fill out this simple form. Our team will contact you promptly to discuss the next steps.
hello@getindata.com