Next term
9-12 December 2019
4 days
IT professionals, Administrators
Warsaw
e.g. HDFS, YARN, Spark, Hive, Sqoop, Hue
5500 PLN + 23% VAT

Hadoop Administrator Training

This four-day course provides the practical and theoretical knowledge necessary to operate a Hadoop cluster. We put great emphasis on practical hands-on exercises that aim to prepare participants to work as effective Hadoop administrators.

Training Overview

After the training, participants will be able to independently install and configure a secure and stable Hadoop cluster. They will understand the architecture, requirements and role of the individual components of core Hadoop. They will also be prepared to troubleshoot problems with Hadoop clusters and tune cluster performance.

Course agenda*

Day 1

Hadoop Ecosystem

  • Course introduction

  • A quick introduction to core Hadoop components

  • Hands-on Exercises: Installing the Hadoop cluster using a cluster manager

    • Connecting to machines in the public cloud
    • Installing the cluster manager (Cloudera Manager or Apache Ambari)
    • Installing the core components of a Hadoop cluster
  • Overview of HDFS

    • Basic concepts e.g. writing/reading files, replication, metadata and blocks of data
    • Daemons and cluster infrastructure e.g. NameNode, DataNodes
    • Key properties and use-cases
    • Hands-on Exercises: Verification of HDFS installation and running HDFS commands
  • Overview of YARN

    • Motivation and basic concepts
    • Daemons and cluster infrastructure e.g. ResourceManager, NodeManagers, containers
    • Hands-on Exercises: Verification of YARN installation and running YARN commands
  • Overview of projects from the Hadoop Ecosystem

    • Processing data in a Hadoop cluster with Hive
    • Interactive analysis with Spark
    • Transferring data to HDFS with Sqoop
    • Defining and submitting workflows with Oozie
    • Hands-on Exercises: Using Hive, Sqoop, and Spark
Day 2

Advanced Hadoop

  • Administrative aspects of HDFS

    • NameNode internals e.g. metadata management, startup procedure, checkpointing with Secondary NameNode
    • Important HDFS configuration settings
    • Hands-on Exercises: Changing the Java heap size, restarting NameNode, checking checkpointing status, balancing HDFS
  • Administrative aspects of YARN

    • Cluster resources e.g. container sizes, limits and best practices
    • Important configuration settings
    • Hands-on Exercises: Reviewing and tuning resource-related settings such as vcores and RAM
  • Monitoring and alerting

    • Monitoring and alerting capabilities
    • Hands-on Exercises: Creating custom charts, dashboards and receiving alerts

Day 3

Hadoop Security, High Availability and Multi-tenancy

  • Hadoop security

    • Authentication with Kerberos
    • Authorization for Hadoop (including Apache Sentry or Apache Ranger)
    • Security-related features e.g. impersonation, encryption, auditing
  • High availability for Hadoop components

    • HA design for HDFS, YARN, Hive, Oozie, HUE
    • Hands-on Exercises: Enabling NameNode HA and verifying its correctness
    • Bonus Hands-on Exercises: Migrating NameNode to a different host
    • Bonus Hands-on Exercises: Enabling and verifying ResourceManager HA
  • YARN Schedulers

    • Overview of Fair/Capacity Scheduler
    • Hands-on Exercises: Configuring queues and ACLs in the Scheduler
    • Hands-on Exercises: Configuring multi-tenant queues and ACLs in the Scheduler
Day 4

Popular Maintenance Tasks

  • Popular cluster maintenance tasks

    • Hands-on Exercises: Expanding the cluster, balancing HDFS, decommissioning a node, troubleshooting a Spark application
  • Backup and Disaster Recovery

    • Built-in BDR features and components in Hadoop and other Hadoop-related projects
    • Hands-on Exercises: Using Trash, HDFS snapshots and DistCp
  • BONUS: Advanced configuration settings for HDFS and YARN

  • BONUS: Hardware and software selection for Hadoop clusters

* GetInData reserves the right to make any changes and adjustments to the presented agenda.

Instructors

Our workshops and training programmes are organised by experienced instructors with many years' real-life Big Data experience. Get to know our team!

More information

Training materials will be made available to all participants in PDF format.

Contact person

Klaudia Wachnio
+48 663 422 641
Piotr Krewski
+48 888 185 137
Registration Form

Hadoop Administrator Training

(Warsaw, 9-12 December 2019)

Before you register, please carefully read the Terms & Conditions of our training.

Testimonials

Completed in half the estimated time and with a fivefold improvement on data collection goals, the robust product has exponentially increased processing capabilities. GetInData’s in-depth engagement, reliability, and broad industry knowledge enabled seamless project execution and implementation.

Wojciech Ptak
CTO

GetInData had been supporting us in building production Big Data infrastructure and implementing real-time applications that process large streams of data. In light of our successful cooperation with GetInData, their unique experience and the quality of work delivered, we recommend the company as a Big Data vendor.

Miłosz Balus
CTO

GetInData delivered a robust mechanism that met our requirements. Their involvement allowed us to add a feature to our product, despite not having the required developer capacity in-house.

Stephan Ewen
CTO

Their consistent communication and responsiveness enabled GetInData to drive the project forward. They possess comprehensive knowledge of the relevant technologies and have an intuitive understanding of business needs and requirements. Customers can expect a partner that is open to feedback.

Wilson Yu Cao
Development Team Manager

We sincerely recommend GetInData as a Big Data training provider! The trainer is a very experienced practitioner and he gave us a lot of tips regarding production deployments, possible issues as well as good practices that are invaluable for a Hadoop administrator.

Mariusz Popko
Platform Manager

The engineers and administrators at GetInData are world-class experts. They have proven experience in many open-source technologies such as Hadoop, Spark, Kafka and Flink for implementing batch and real-time pipelines.

Kostas Tzoumas
CEO

Other Big Data Training

  • Big Data Workshop

    A one-day workshop focused on the practical side of using open-source Big Data technologies. Participants will learn the basics of the most popular Big Data tools and technologies, such as Hadoop, Hive, Spark and Kafka.
  • Hadoop Developer Training

    This four-day course gives software engineers a practical introduction to Big Data application development using popular projects from the Hadoop ecosystem and beyond.
  • Advanced Spark Training

    This two-day training is intended for Big Data engineers and data scientists who are already familiar with the basic concepts of Apache Spark and have hands-on experience implementing and running Spark applications.
  • Data Analyst Training

    This four-day course teaches Data Analysts how to analyse massive amounts of data available in a Hadoop YARN cluster.
  • Real-Time Stream Processing

    This two-day course teaches data engineers how to process unbounded streams of data in real-time using popular open-source frameworks.

Contact us

Fill out this simple form. Our team will contact you promptly to discuss the next steps.

hello@getindata.com

Any questions?

By submitting this form, you agree to our Terms & Conditions.