This four-day course provides the practical and theoretical knowledge necessary to operate a Hadoop cluster. We place strong emphasis on hands-on exercises that prepare participants to work as effective Hadoop administrators.
During the training, you will act as a Hadoop administrator who is given 7 machines in the public cloud. Your goal is to install and properly configure a multi-node Hadoop cluster with popular components from the Hadoop Ecosystem (e.g. Spark, Hive, Oozie, Sqoop). Your cluster must be fully functional and able to survive various failures. You will change various configuration settings, deploy HA for HDFS and YARN, tweak the YARN scheduler, analyze values of various Hadoop-related metrics, define and respond to alerts, and perform popular maintenance tasks (e.g. adding new nodes, balancing HDFS, troubleshooting failed applications).
To register for the upcoming training, simply use our website and click here.
The next Hadoop Administration Training will take place in Warsaw from 24 to 27 April 2017. The cost of the training is 5500 PLN per person plus tax. The workshop will be conducted in Polish. Before you register, please read carefully the Terms & Conditions of our trainings.
Basic experience with any Linux system is required. No prior knowledge of Hadoop is needed.
IT professionals who will be responsible for installing, configuring and managing Hadoop clusters.
Day 1 – Hadoop Ecosystem
- Course introduction
- Quick introduction to core Hadoop components
- Hands-on Exercises: Installing the Hadoop cluster using a cluster manager
- Connecting to machines in the public cloud
- Installing the cluster manager (Cloudera Manager or Apache Ambari)
- Installation of core components of a Hadoop cluster
- Overview of HDFS
- Basic concepts e.g. writing/reading files, replication, metadata and blocks of data
- Daemons and cluster infrastructure e.g. NameNode, DataNodes
- Key properties and use-cases
- Hands-on Exercises: Verification of HDFS installation and running HDFS commands
- Overview of YARN
- Motivation and basic concepts
- Daemons and cluster infrastructure e.g. ResourceManager, NodeManagers, containers
- Hands-on Exercises: Verification of YARN installation and running YARN commands
- Overview of projects from Hadoop Ecosystem
- Processing data in Hadoop cluster with Hive
- Interactive analysis with Spark
- Transferring data to HDFS with Sqoop
- Defining and submitting workflow with Oozie
- Hands-on Exercises – Using Hive, Sqoop and Spark
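The Day 1 verification exercises revolve around the standard HDFS and YARN command-line tools. A minimal sketch of the kind of commands participants run (paths are illustrative, and a running cluster is assumed):

```shell
# Report overall HDFS health: capacity, live and dead DataNodes
hdfs dfsadmin -report

# Create a directory and upload a sample file (paths are illustrative)
hdfs dfs -mkdir -p /user/training
hdfs dfs -put /etc/hosts /user/training/hosts
hdfs dfs -ls /user/training

# List NodeManagers and currently running YARN applications
yarn node -list
yarn application -list -appStates RUNNING
```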
Day 2 – Advanced Hadoop
- Administrative aspects of HDFS
- NameNode internals e.g. metadata management, startup procedure, checkpointing with Secondary NameNode
- Important HDFS configuration settings
- Hands-on Exercises: Changing the Java heap size, restarting NameNode, checking checkpointing status, balancing HDFS
- Administrative aspects of YARN
- Cluster resources e.g. container sizes, limits and best practices
- Important configuration settings
- Hands-on Exercises: Reviewing and tuning resource-related settings such as vcores and RAM
- Monitoring and alerting
- Monitoring and alerting capabilities
- Hands-on Exercises: Creating custom charts, dashboards and receiving alerts
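The Day 2 resource-tuning exercises centre on a handful of properties in yarn-site.xml. A sketch of the kind of settings reviewed — the values below are purely illustrative, not recommendations, and should be sized to your worker hardware:

```xml
<!-- yarn-site.xml fragment: example values only -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>24576</value> <!-- RAM the NodeManager may hand out to containers -->
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>12</value> <!-- vcores available for containers on this node -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value> <!-- smallest container the scheduler will grant -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value> <!-- largest single-container request allowed -->
</property>
```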
Day 3 – Hadoop Security, High Availability and Multi-tenancy
- Hadoop security
- Authentication with Kerberos
- Authorization for Hadoop (including Apache Sentry or Apache Ranger)
- Security-related features e.g. impersonation, encryption, auditing
- High availability for Hadoop components
- HA design for HDFS, YARN, Hive, Oozie, HUE
- Hands-on Exercises: Enabling NameNode HA and verifying its correctness
- Bonus Hands-on Exercises: Migrating NameNode to a different host
- Bonus Hands-on Exercises: Enabling and verifying ResourceManager HA
- YARN Schedulers
- Overview of Fair/Capacity Scheduler
- Hands-on Exercises: Configuring multi-tenant queues and ACLs in the Scheduler
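In the Fair Scheduler, queues and ACLs are defined in an allocation file. A minimal sketch, assuming the Fair Scheduler is in use — the queue, user, and group names below are hypothetical:

```xml
<!-- fair-scheduler.xml: queue, user and group names are hypothetical -->
<allocations>
  <queue name="analytics">
    <minResources>8192 mb,4 vcores</minResources>
    <maxResources>49152 mb,24 vcores</maxResources>
    <weight>2.0</weight>
    <!-- users alice and bob, plus group analysts, may submit apps here -->
    <aclSubmitApps>alice,bob analysts</aclSubmitApps>
    <!-- leading space means "no users"; group hadoop-admins may administer -->
    <aclAdministerApps> hadoop-admins</aclAdministerApps>
  </queue>
  <queue name="etl">
    <weight>1.0</weight>
  </queue>
</allocations>
```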
Day 4 – Popular Maintenance Tasks
- Popular cluster maintenance tasks
- Hands-on Exercises: Expanding the cluster, balancing HDFS, decommissioning a node, troubleshooting a failed Spark application
- Backup and Disaster Recovery
- Built-in backup and disaster recovery (BDR) features and components in Hadoop and other Hadoop-related projects
- Hands-on Exercises: Using Trash, HDFS snapshots and DistCp
- BONUS: Advanced configuration settings for HDFS and YARN
- BONUS: Hardware and software selection for Hadoop clusters
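The Day 4 maintenance and BDR exercises map onto a few standard HDFS commands. A hedged sketch — hostnames and paths are illustrative, and a running cluster is assumed:

```shell
# Rebalance block placement after adding nodes
# (threshold is the allowed % spread in DataNode utilization)
hdfs balancer -threshold 10

# Decommission a node: add its hostname to the exclude file,
# then tell the NameNode to re-read the host lists
hdfs dfsadmin -refreshNodes

# Snapshots: allow them on a directory, then take one
hdfs dfsadmin -allowSnapshot /user/training
hdfs dfs -createSnapshot /user/training before-upgrade

# Restore an accidentally deleted file from the Trash (path is illustrative)
hdfs dfs -cp /user/training/.Trash/Current/user/training/hosts /user/training/

# Copy a dataset to a DR cluster with DistCp (cluster names are hypothetical)
hadoop distcp hdfs://prod-nn:8020/data hdfs://dr-nn:8020/data
```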
Thanks to our practical experience with both Cloudera and Hortonworks distributions, we can offer a flexible training course whose agenda can be customized to fit your production cluster. Possible customizations include:
- HDP (Apache Ambari) or CDH (Cloudera Manager)
- Addition of components such as Cloudera Impala, Apache Tez, Facebook Presto, Apache Flume, Apache Kafka, Apache Sentry, Apache Ranger or Search (Apache Solr)
- Exercises for the Capacity Scheduler or the Fair Scheduler
The training provides a carefully prepared mix of theory, exercises, demos, discussions, quizzes and … fun! We make sure that each participant is highly engaged in hands-on exercises, discussions and teamwork.
The training takes four days, but it can be split into two separate two-day sessions.
Please contact us with any questions about our training courses, or if you would like to discuss a custom, on-site training course.