This four-day course provides the practical and theoretical knowledge necessary to operate a Hadoop cluster. We put a strong emphasis on practical hands-on exercises that prepare participants to work as effective Hadoop administrators.


Not scheduled. If you are interested, please contact us.
4-day training
Target audience
IT professionals
e.g. Hadoop (HDFS, YARN), High-availability, YARN Scheduler (Capacity / Fair), HDP / CDH, Ambari / Cloudera Manager

Training overview

During the training, you will act as a Hadoop administrator who is given 7 machines in the public cloud. Your goal is to install and properly configure a multi-node Hadoop cluster with popular components from the Hadoop Ecosystem (e.g. Spark, Hive, Oozie, Sqoop). Your cluster must be fully functional and able to survive various failures. You will change various configuration settings, deploy HA for HDFS and YARN, tweak the YARN scheduler, analyze values of various Hadoop-related metrics, define and respond to alerts, and perform popular maintenance tasks (e.g. adding new nodes, balancing HDFS, troubleshooting failed applications).

IT professionals who will be responsible for installing, configuring and managing Hadoop clusters.

Basic experience with any Linux system. No prior knowledge about Hadoop is required.

Thanks to our practical experience with the Cloudera and Hortonworks distributions, we can offer a flexible training course whose agenda can be customized to fit your production cluster. The following customizations are available:

  • HDP (Apache Ambari) or CDH (Cloudera Manager)
  • Addition of some of the components: Cloudera Impala, Apache Tez, Facebook Presto, Apache Flume, Apache Kafka, Apache Sentry, Apache Ranger, Search (Apache Solr)
  • Exercises for the Capacity Scheduler or the Fair Scheduler

The training provides a carefully prepared mix of theory, exercises, demos, discussions, quizzes and fun! We make sure that each participant is highly engaged in hands-on exercises, discussions and teamwork.

Course agenda*


Hadoop Ecosystem

  • Course introduction
  • A quick introduction to core Hadoop components
  • Hands-on Exercises: Installing the Hadoop cluster using a cluster manager
    • Connecting to machines in the public cloud
    • Installing the cluster manager (Cloudera Manager or Apache Ambari)
    • Installation of core components of a Hadoop cluster
  • Overview of HDFS
    • Basic concepts e.g. writing/reading files, replication, metadata and blocks of data
    • Daemons and cluster infrastructure e.g. NameNode, DataNodes
    • Key properties and use-cases
    • Hands-on Exercises: Verification of HDFS installation and running HDFS commands
  • Overview of YARN
    • Motivation and basic concepts
    • Daemons and cluster infrastructure e.g. ResourceManager, NodeManagers, containers
    • Hands-on Exercises: Verification of YARN installation and running YARN commands
  • Overview of projects from the Hadoop Ecosystem
    • Processing data in a Hadoop cluster with Hive
    • Interactive analysis with Spark
    • Transferring data to HDFS with Sqoop
    • Defining and submitting workflows with Oozie
    • Hands-on Exercises: Using Hive, Sqoop, and Spark


Advanced Hadoop
  • Administrative aspects of HDFS
    • NameNode internals e.g. metadata management, startup procedure, checkpointing with Secondary NameNode
    • Important HDFS configuration settings
    • Hands-on Exercises: Changing the Java heap size, restarting NameNode, checking checkpointing status, balancing HDFS
  • Administrative aspects of YARN
    • Cluster resources e.g. container sizes, limits and best practices
    • Important configuration settings
    • Hands-on Exercises: Reviewing and tuning resource-related settings such as vcores and RAM
  • Monitoring and alerting
    • Monitoring and alerting capabilities
    • Hands-on Exercises: Creating custom charts, dashboards and receiving alerts


Hadoop Security, High Availability and Multi-tenancy

  • Hadoop security
    • Authentication with Kerberos
    • Authorization for Hadoop (including Apache Sentry or Apache Ranger)
    • Security-related features e.g. impersonation, encryption, auditing
  • High availability for Hadoop components
    • HA design for HDFS, YARN, Hive, Oozie, HUE
    • Hands-on Exercises: Enabling NameNode HA and verifying its correctness
    • Bonus Hands-on Exercises: Migrating NameNode to a different host
    • Bonus Hands-on Exercises: Enabling and verifying ResourceManager HA
  • YARN Schedulers
    • Overview of Fair/Capacity Scheduler
    • Hands-on Exercises: Configuring queues and ACLs in the Scheduler
    • Hands-on Exercises: Configuring multi-tenant queues and ACLs in the Scheduler


Popular Maintenance Tasks

  • Popular cluster maintenance tasks
    • Hands-on Exercises: Expanding the cluster, balancing HDFS, decommissioning a node, troubleshooting a Spark application
  • Backup and Disaster Recovery
    • Built-in BDR features and components in Hadoop and other Hadoop-related projects
    • Hands-on Exercises: Using Trash, HDFS snapshots and DistCp
  • BONUS: Advanced configuration settings for HDFS and YARN
  • BONUS: Hardware and software selection for Hadoop clusters

* GetInData reserves the right to make any changes and adjustments to the presented agenda.


Our workshops and training programs are led by experienced instructors with many years of real-life Big Data experience. Get to know our team!

More information

The training lasts 4 days, from 9 am to 5 pm each day. There will be one lunch break and a few coffee breaks each day.

Contact Us!

Please contact us with any questions about our training courses, or if you would like to discuss a custom, on-site training course.

Piotr Krewski, piotr@getindata.com, +48 888 185 137
Klaudia Zdunczyk, klaudia@getindata.com, +48 663 422 641


  • Hadoop Administrator Training, Allegro

    I highly value the substantive content of the course as well as its great preparation and layout. Knowledge was passed on in an ordered, consistent and effective way. The participants' involvement during the workshop sessions is the best indicator of how positive this training was!

  • Big Data Workshop, Stepstone

    The Big Data workshops were led by real professionals, with tools and materials prepared in a way that allowed participants to get down to brass tacks straightaway without losing time. Attendees did not disturb each other, and everyone could work comfortably and effectively. One can notice the striking knowledge of the host and the fact that it comes from real professional work experience.

  • IE Business School

    This is an excellent course with an excellent teacher. Adam was well prepared, knew the subject material, was good at transmitting his knowledge to us and had prepared exercises that added a lot of value to the sessions. I would rank this six if I could.

  • Hadoop Developer Training, Confidential

    Professionally prepared and led courses. Coaches with vast experience in the presented realm.

  • IE Business School

    Outstanding professor. The course was very well planned, and he is very knowledgeable about what he taught. He talked about real-world cases and managed to keep the whole class interested for 6 hours straight. Definitely one of the best courses that we have had in the master's program.

