Hadoop Administrator Training

This four-day course provides the practical and theoretical knowledge necessary to operate a Hadoop cluster. We place strong emphasis on practical hands-on exercises that prepare participants to work as effective Hadoop administrators.

During the training, you will act as a Hadoop administrator who is given 7 machines in the public cloud. Your goal is to install and properly configure a multi-node Hadoop cluster with popular components from the Hadoop ecosystem (e.g. Spark, Hive, Oozie, Sqoop). Your cluster must be fully functional and able to survive various failures. You will change configuration settings, deploy HA for HDFS and YARN, tweak the YARN scheduler, analyze the values of Hadoop-related metrics, define and respond to alerts, and perform popular maintenance tasks (e.g. adding new nodes, balancing HDFS, troubleshooting failed applications).

Registration

You can register for the upcoming training through our website.

The next Hadoop Administrator Training will take place in Warsaw from 24 to 27 April 2017. The cost of the training is 5500 PLN per person plus tax. The workshop will be conducted in Polish. Before you register, please read the Terms & Conditions of our training courses carefully.

Prerequisites

Basic experience with any Linux system. No prior knowledge of Hadoop is required.

Target Audience

IT professionals who will be responsible for installing, configuring and managing Hadoop clusters.

Course Topics

Day 1 – Hadoop Ecosystem

  • Course introduction
  • Quick introduction to core Hadoop components
      • Hands-on Exercises: Installing the Hadoop cluster using a cluster manager
          • Connecting to machines in the public cloud
          • Installing the cluster manager (Cloudera Manager or Apache Ambari)
          • Installing the core components of a Hadoop cluster
  • Overview of HDFS
      • Basic concepts, e.g. writing/reading files, replication, metadata and blocks of data
      • Daemons and cluster infrastructure, e.g. NameNode, DataNodes
      • Key properties and use cases
      • Hands-on Exercises: Verifying the HDFS installation and running HDFS commands
  • Overview of YARN
      • Motivation and basic concepts
      • Daemons and cluster infrastructure, e.g. ResourceManager, NodeManagers, containers
      • Hands-on Exercises: Verifying the YARN installation and running YARN commands
  • Overview of projects from the Hadoop ecosystem
      • Processing data in a Hadoop cluster with Hive
      • Interactive analysis with Spark
      • Transferring data to HDFS with Sqoop
      • Defining and submitting workflows with Oozie
      • Hands-on Exercises: Using Hive, Sqoop and Spark

Day 2 – Advanced Hadoop

  • Administrative aspects of HDFS
      • NameNode internals, e.g. metadata management, startup procedure, checkpointing with the Secondary NameNode
      • Important HDFS configuration settings
      • Hands-on Exercises: Changing the Java heap size, restarting the NameNode, checking checkpointing status, balancing HDFS
  • Administrative aspects of YARN
      • Cluster resources, e.g. container sizes, limits and best practices
      • Important YARN configuration settings
      • Hands-on Exercises: Reviewing and tuning resource-related settings such as vcores and RAM
  • Monitoring and alerting
      • Monitoring and alerting capabilities
      • Hands-on Exercises: Creating custom charts and dashboards, and receiving alerts

Day 3 – Hadoop Security, High Availability and Multi-tenancy

  • Hadoop security
      • Authentication with Kerberos
      • Authorization for Hadoop (including Apache Sentry or Apache Ranger)
      • Security-related features, e.g. impersonation, encryption, auditing
  • High availability for Hadoop components
      • HA design for HDFS, YARN, Hive, Oozie, Hue
      • Hands-on Exercises: Enabling NameNode HA and verifying that it works correctly
      • Bonus Hands-on Exercises: Migrating the NameNode to a different host
      • Bonus Hands-on Exercises: Enabling and verifying ResourceManager HA
  • YARN schedulers
      • Overview of the Fair Scheduler and the Capacity Scheduler
      • Hands-on Exercises: Configuring queues and ACLs in the scheduler
      • Hands-on Exercises: Configuring multi-tenant queues and ACLs in the scheduler

Day 4 – Popular Maintenance Tasks

  • Popular cluster maintenance tasks
      • Hands-on Exercises: Expanding the cluster, balancing HDFS, decommissioning a node, troubleshooting a Spark application
  • Backup and Disaster Recovery (BDR)
      • Built-in BDR features and components in Hadoop and other Hadoop-related projects
      • Hands-on Exercises: Using Trash, HDFS snapshots and DistCp
  • BONUS: Advanced configuration settings for HDFS and YARN
  • BONUS: Hardware and software selection for Hadoop clusters

Possible Customization

Thanks to our practical experience with both the Cloudera and Hortonworks distributions, we can offer a flexible training course whose agenda is customized to fit your production cluster. The following customizations are available:

  • HDP (Apache Ambari) or CDH (Cloudera Manager)
  • Addition of selected components: Cloudera Impala, Apache Tez, Facebook Presto, Apache Flume, Apache Kafka, Apache Sentry, Apache Ranger, Search (Apache Solr)
  • Exercises for the Capacity Scheduler or the Fair Scheduler

Related Materials

  • Conference slides: "Is Hadoop Enterprise ready?", presented at Big Data Tech Conference 2015 by Krzysztof Adamski, one of our instructors

Our Approach

The training provides a carefully prepared mix of theory, exercises, demos, discussions, quizzes and … fun! We make sure that each participant is actively engaged in the hands-on exercises, discussions and teamwork.

Time-Frame

The training takes four days, but it can be split into two separate two-day sessions.

More Information

Please contact us with any questions about our training courses, or if you would like to discuss a custom, on-site training course.