BIG DATA WORKSHOP

is a one-day event for everyone who wants to get to know Big Data and the Hadoop ecosystem. Participants will discover technologies such as Hadoop, Hive, Spark, Flink and Kafka through a hands-on, practical approach.
 

NEXT TERM

NOT SCHEDULED. If you are interested, please contact us!

Duration
1-day training

Target audience
Beginner

Technology
e.g. Hadoop, Hive, Spark, Flink and Kafka

Workshop overview

During the workshop you’ll act as a Big Data engineer and analyst working for a fictional company, StreamRock™, that is building a music streaming application (similar to Spotify). The main goal of your work is to use Big Data technologies such as Hadoop, Spark and Hive to analyze various datasets about the users and the songs they played. We will process data in batch mode to get data-driven answers to important business questions. Every exercise will be executed on a remote multi-node Hadoop cluster.

 

The workshop is strongly focused on practical experience. Instructors will not only teach you all the important theory but will also share the experience they have gained from working with Big Data technologies for several years.

This training is for everyone who is interested in Big Data technologies. It is particularly useful for analysts, engineers and managers who want to start their adventure with Hadoop.

All you need to fully participate in our training program is a laptop with a web browser, a shell terminal (e.g. PuTTY) and a Wi-Fi connection. Our workshops are mostly technical, with some business content; however, you do not need any previous experience with Big Data technologies.

Course agenda*

PART 1


Introduction to Big Data and Apache Hadoop

  • Description of the StreamRock company, along with the opportunities and challenges that Big Data technologies bring it.
  • Introduction to core Hadoop technologies such as HDFS and YARN.
  • Hands-on exercise: Accessing a remote multi-node Hadoop cluster.

PART 2


Providing data-driven answers to business questions using a SQL-like solution

  • Introduction to Apache Hive.
  • Hands-on exercise: Importing structured data into the cluster using HUE.
  • Hands-on exercise: Ad-hoc analysis of the structured data with Hive.
  • Hands-on exercise: Visualisation of the results using HUE.

PART 3


Implementing scalable ETL processes on the Hadoop cluster

  • Introduction to Apache Spark, Spark SQL and Spark DataFrames.
  • Hands-on exercise: Implementing an ETL job to clean and massage input data using Spark.
  • Quick explanation of the Avro and Parquet binary data formats.
  • Practical tips for implementing ETL processes, such as process scheduling, schema management and integration with existing systems.

PART 4


Advanced analysis of diverse datasets

  • Hands-on exercise: Implementing ad-hoc queries using Spark SQL and DataFrames.
  • Hands-on exercise: Visualisation of the results of Spark queries using the Spark Notebook.

PART 5


Advantages of real-time technologies from the Hadoop ecosystem

  • Real-time data collection with Apache Kafka (presentation and demo).
  • Processing real-time streams of data using Apache Flink (presentation and demo).

* GetInData reserves the right to make any changes and adjustments to the presented agenda.

Instructors

Our workshops and training programs are run by experienced instructors with many years of real-life Big Data experience. Get to know our team!

More information

The workshop lasts 8 full hours, so you should reserve a full day for it. Of course, there will be coffee and lunch breaks during the training.

FEEDBACK FROM ATTENDEES

  • Hadoop Administrator Training, Allegro

    I highly value the substantive content of the course, as well as its great preparation and layout. Knowledge was passed on in an ordered, consistent and effective way. Participants' involvement during the workshop sessions is the best indicator of how positive this training was!

  • Big Data Workshop, Stepstone

    The Big Data workshops were led by real professionals, with tools and materials prepared in a way that allowed participants to get down to brass tacks straight away without losing time. Attendees did not disturb each other, and everyone could work comfortably and effectively. One can notice the host's striking knowledge and the fact that it comes from real professional work experience.

  • IE Business School

    This is an excellent course and an excellent teacher. Adam was well prepared, knew the subject material, was good at transmitting his knowledge to us and had prepared exercises that added a lot of value to the sessions. I would rank this six if I could.

  • Hadoop Developer Training, Confidential

    Professionally prepared and led courses. Coaches with vast experience in the presented realm.

  • IE Business School

    Outstanding professor, the course was very well planned, he is very knowledgeable about what he taught. He talked about real-world cases and managed to get the whole class interested for 6 hours straight. Definitely one of the best courses that we have had in the masters.
