This two-day course teaches data engineers how to process unbounded streams of data in real-time using popular open-source frameworks.


NOT SCHEDULED. If you are interested, please contact us.
2-day training

Target audience
Data engineers

Kafka, Flink, HDFS, YARN, Spark, Elasticsearch

Workshop overview

This two-day course teaches data engineers how to process unbounded streams of data in real time using popular open-source frameworks. We focus mostly on Apache Flink, the most promising open-source stream processing framework, which is increasingly used in production. Additionally, we provide short introductions to Spark Streaming, Apache Storm and Apache Samza so that students learn about the existing alternatives, widen their perspective and can find the best tool for their use cases.

During the course we simulate a real-world end-to-end scenario: processing logs generated by users interacting with a mobile application in real time. The technologies that we use include Kafka, Flink, HDFS, YARN and Elasticsearch. All exercises are done on a remote multi-node Hadoop cluster.

Data engineers who are interested in using large-scale, distributed tools to process streams of data in real time. Some experience coding in Python, Java, or Scala, plus basic familiarity with Big Data tools (e.g. Hadoop, Spark) is assumed.

All you need to fully participate in our training program is a laptop with a web browser, a shell terminal (e.g. PuTTY) and a Wi-Fi connection. Our workshops are mostly technical (with some business content); however, you do not need to have previous experience with Big Data technologies.

The training provides a carefully prepared mix of theory, exercises, demos, discussions, quizzes and fun! We make sure that each participant is highly engaged in hands-on exercises, discussions and teamwork.

Course agenda*


  • Real-time data collection with Apache Kafka
    • Key concepts of the log-based approach
    • Daemons and cluster infrastructure
    • Hands-on exercise: Interacting with a Kafka cluster to produce and consume messages with CLI scripts
  • Interactive reporting and data exploration with Elasticsearch
    • Search engine as the core of data-driven decisions
    • Live demo: visualizing continuously arriving data with Kibana
  • Introduction to Apache Flink
    • Constructing DataStreams with Flink APIs
    • Hands-on exercises: Applying simple filters to a stream of events and running jobs in a YARN cluster
    • Grouping data into windows based on different notions of time
    • Hands-on exercises: Calculating user session statistics
    • Connecting to the external world
    • Hands-on exercises: Reading events from Kafka and writing statistics to Elasticsearch for real-time dashboards in Kibana
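
To give a flavour of the "user session statistics" exercise, here is a minimal sketch in plain Python. It does not use Flink's API. It only illustrates the idea of grouping events into sessions by event time, where a session ends after a period of inactivity. The function name, the event layout (user_id, event_time_seconds) and the 30-minute gap are assumptions made for illustration, and the sketch works on a finite list, whereas a streaming engine would do this incrementally.

```python
from collections import defaultdict

# Hypothetical inactivity gap that closes a session (30 minutes).
SESSION_GAP_SECONDS = 30 * 60


def session_stats(events, gap=SESSION_GAP_SECONDS):
    """Group (user_id, event_time_seconds) pairs into per-user sessions.

    Returns {user_id: [(session_start, session_end, event_count), ...]}.
    """
    by_user = defaultdict(list)
    for user_id, ts in events:
        by_user[user_id].append(ts)

    sessions = {}
    for user_id, timestamps in by_user.items():
        # Order by event time, not by arrival order.
        timestamps.sort()
        user_sessions = []
        start = prev = timestamps[0]
        count = 1
        for ts in timestamps[1:]:
            if ts - prev > gap:
                # Inactivity gap exceeded: close the current session.
                user_sessions.append((start, prev, count))
                start, count = ts, 0
            prev = ts
            count += 1
        user_sessions.append((start, prev, count))
        sessions[user_id] = user_sessions
    return sessions


# Events arrive out of order; event time decides the grouping.
example = [("a", 100), ("b", 5), ("a", 0), ("a", 10)]
result = session_stats(example, gap=30)
```

With a gap of 30 seconds, user "a" ends up with two sessions because the event at time 100 arrives long after the event at time 10.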


  • Dive deep into Apache Flink
    • Advanced time handling, when out-of-the-box solutions are not enough
    • Daemons and cluster infrastructure, overview of deployment modes e.g. YARN, Mesos, Docker, Standalone
    • Accessing fault-tolerant state and how it is checkpointed
    • Hands-on exercises: Using low-level functions and state for constructing complex time-based scenarios
    • Advantages of the relational approach with StreamSQL
    • Hands-on exercises: Querying streams with SQL
    • Early alerting based on sequences of events with the Flink CEP library
    • Hands-on exercises: Writing pattern sequences and converting matches to alerts
  • Comparison of other streaming frameworks like Spark Streaming, Kafka Streams, Storm
    • Daemons and cluster infrastructure
    • How they implement fault tolerance
    • Feature sets
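
The idea behind "converting matches to alerts" can be sketched in plain Python. This is not the Flink CEP API. It is a simplified illustration that keeps a small piece of state per user and raises an alert when the same event type occurs a given number of times in a row. The event names, the function name and the alert format are hypothetical.

```python
from collections import defaultdict


def detect_alerts(events, event_type="login_failed", times=3):
    """Emit one alert each time a user produces `times` consecutive
    events of `event_type`. `events` is an iterable of
    (user_id, event_type) pairs in processing order."""
    streak = defaultdict(int)  # per-user state: current run length
    alerts = []
    for user_id, etype in events:
        if etype == event_type:
            streak[user_id] += 1
            if streak[user_id] == times:
                alerts.append({"user_id": user_id,
                               "reason": f"{times}x {event_type}"})
                streak[user_id] = 0  # start counting again after an alert
        else:
            # Any other event from this user breaks the sequence.
            streak[user_id] = 0
    return alerts


example_events = [
    ("u1", "login_failed"),
    ("u2", "login_failed"),
    ("u1", "login_failed"),
    ("u1", "login_failed"),
    ("u2", "login_ok"),
    ("u2", "login_failed"),
]
alerts = detect_alerts(example_events)
```

Here only user "u1" triggers an alert. The events of "u2" are interleaved with those of "u1", but the state is kept per user, so they do not interfere.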

* GetInData reserves the right to make any changes and adjustments to the presented agenda.


Our workshops and training programs are organized by experienced instructors with many years of real-life Big Data experience. Get to know our team!

More information

Please contact us for any questions on training courses, or if you would like to discuss a custom, on-site training course.


  • Hadoop Administrator Training
    Hadoop Administrator Training, Allegro

    I highly value the substantive content of the course as well as the great preparation and layout. Knowledge was passed on in an ordered, consistent and effective way. The participants' involvement during workshop sessions is the best indicator of this positive training!

  • Big Data Workshop
    Big Data Workshop, Stepstone

    Big Data workshops were led by real professionals, with tools and materials prepared in a way that allowed participants to get down to brass tacks straightaway without losing time. Attendees did not disturb each other and everyone could work comfortably and effectively. One can notice the striking knowledge of the host and the fact that it comes from real professional work experience.

  • IE Business School

    This is an excellent course and an excellent teacher. Adam was well prepared, knew the subject material, was good at transmitting his knowledge to us and had prepared exercises that added a lot of value to the sessions. I would rank this six if I could.

  • Hadoop Developer Training
    Hadoop Developer Training, Confidential

    Professionally prepared and led courses. Coaches with vast experience in the presented realm.

  • IE Business School

    Outstanding professor, the course was very well planned, he is very knowledgeable about what he taught. He talked about real-world cases and managed to get the whole class interested for 6 hours straight. Definitely one of the best courses that we have had in the masters.

