Next term
Not scheduled
If you are interested, please contact us
4 days
Data analysts, BI analysts
e.g. Hadoop, Hive, Spark, Kibana, ElasticSearch

Data Analyst Training

This four-day course teaches Data Analysts how to analyse massive amounts of data available in a Hadoop YARN cluster.

Training outcome

Participants will gain the ability to work effectively with the huge datasets stored in Hadoop clusters, as well as an understanding of which processing and analytics needs are best addressed by the individual Big Data frameworks. After the training, participants will be able to independently import data into a Hadoop cluster, store it in Hive tables, use Hive and Spark to transform and analyse the data, and use Kibana to visualise it.

Course agenda*

Day 1

Introduction to Big Data and Apache Hadoop

  • Description of StreamRock, along with the opportunities and challenges that come with Big Data technologies.

    • Hands-on exercise: Accessing a remote multi-node Hadoop cluster.
  • Introduction to HDFS

    • Hands-on exercise: Importing structured data into the cluster using HUE
  • Introduction to YARN

    • Hands-on exercise: Familiarisation with YARN Web UI
  • A short overview of MapReduce

    • Hands-on exercise: Submitting an example ETL MapReduce job to a YARN cluster
Day 2

Providing data-driven answers to business questions using a SQL-like solution

  • Introduction to Apache Hive

    • Hands-on exercise: Creating Hive databases and tables using HUE
    • Hands-on exercise: Ad-hoc analysis of structured data with HiveQL
  • Advanced aspects of Hive, e.g. partitioning, bucketing, strict mode, execution plans

    • Hands-on exercise: Hive partitioning
  • Extending Hive with custom UDFs and SerDes

    • Hands-on exercise: Using a custom Java UDF and SerDe for JSON
  • Hadoop File Formats (Avro, Parquet, ORC)

    • Hands-on exercise: Interacting with Parquet and Avro in Hive
Day 3

Interactive analysis with Apache Spark

  • Introduction to Apache Spark

    • Spark Core and its advantages over MapReduce
    • Basics of working with the Spark API
    • Spark architecture and integration with YARN
    • Hands-on exercise: Interacting with Spark Core API
  • Doing data analysis with SparkSQL

    • Introduction to SparkSQL and DataFrames
    • Integration with Hive and other tools
    • Introduction to Spark notebooks
    • Hands-on exercise: Implementing a SparkSQL application to clean a dataset of song records
    • Hands-on exercise: Data analysis with SparkSQL and Jupyter
Day 4

Visualisation and Search

  • Introduction to ElasticSearch and Kibana

    • Most important features of ElasticSearch
    • Hands-on exercise: Indexing data with ElasticSearch and visualisations in Kibana
  • Advanced aspects of working with Spark notebooks

    • Hands-on exercise: Visualisation and publishing data with Zeppelin or Jupyter
* GetInData reserves the right to make any changes and adjustments to the presented agenda.

Instructors

Our workshops and training programmes are organised by experienced instructors with many years of real-life Big Data experience. Get to know our team!

More information

Training material will be made available to all participants in PDF format.

Contact person

Klaudia Wachnio
Piotr Krewski
+48 888 185 137

Testimonials

Completed in half the estimated time and with a fivefold improvement on data collection goals, the robust product has exponentially increased processing capabilities. GetInData’s in-depth engagement, reliability, and broad industry knowledge enabled seamless project execution and implementation.

Wojciech Ptak
CTO

GetInData had been supporting us in building production Big Data infrastructure and implementing real-time applications that process large streams of data. In light of our successful cooperation with GetInData, their unique experience and the quality of work delivered, we recommend the company as a Big Data vendor.

Miłosz Balus
CTO

GetInData delivered a robust mechanism that met our requirements. Their involvement allowed us to add a feature to our product, despite not having the required developer capacity in-house.

Stephan Ewen
CTO

Their consistent communication and responsiveness enabled GetInData to drive the project forward. They possess comprehensive knowledge of the relevant technologies and have an intuitive understanding of business needs and requirements. Customers can expect a partner that is open to feedback.

Wilson Yu Cao
Development Team Manager

We sincerely recommend GetInData as a Big Data training provider! The trainer is a very experienced practitioner and he gave us a lot of tips regarding production deployments, possible issues as well as good practices that are invaluable for a Hadoop administrator.

Mariusz Popko
Platform Manager

The engineers and administrators at GetInData are world-class experts. They have proven experience in many open-source technologies such as Hadoop, Spark, Kafka and Flink for implementing batch and real-time pipelines.

Kostas Tzoumas
CEO

Other Big Data Training

  • Machine Learning Operations Training (MLOps)

    This four-day course will teach you how to operationalize Machine Learning models using popular open-source tools, like Kedro and Kubeflow, and deploy them using cloud computing.
  • Hadoop Administrator Training

    This four-day course provides the practical and theoretical knowledge necessary to operate a Hadoop cluster. We put great emphasis on practical hands-on exercises that aim to prepare participants to work as effective Hadoop administrators.
  • Advanced Spark Training

    This 2-day training is dedicated to Big Data engineers and data scientists who are already familiar with the basic concepts of Apache Spark and have hands-on experience implementing and running Spark applications.
  • Real-Time Stream Processing

    This two-day course teaches data engineers how to process unbounded streams of data in real-time using popular open-source frameworks.
  • Analytics engineering with Snowflake and dbt

    This 2-day training is dedicated to data analysts, analytics engineers & data engineers, who are interested in learning how to build and deploy Snowflake data transformation workflows faster than ever before.
  • Mastering ML/MLOps and AI-powered Data Applications in the Snowflake Data Cloud

    This 2-day training is dedicated to data engineers, data scientists, or tech enthusiasts. This workshop will provide hands-on experience and real-world insights into architecting data applications on the Snowflake Data Cloud.
  • Modern Data Pipelines with DBT

    In this one-day workshop, you will learn how to create modern data transformation pipelines managed by DBT. Discover how you can improve the quality of your pipelines and the workflow of your data team by introducing a tool designed to standardize the way you incorporate good practices within the data team.
  • Real-time analytics with Snowflake and dbt

    This 2-day training is dedicated to data analysts, analytics engineers & data engineers, who are interested in learning how to build and deploy real-time Snowflake data pipelines.

Contact us

Interested in our solutions?
Contact us!

Together, we will select the best Big Data solutions for your organization and build a project that will have a real impact on your organization.


What did you find most impressive about GetInData?

They did a very good job in finding people that fitted in Acast both technically as well as culturally.
Fill in the form or send an e-mail: hello@getindata.com
The administrator of your personal data is GetInData Poland Sp. z o.o. with its registered seat in Warsaw (02-508), 39/20 Pulawska St. Your data is processed for the purpose of provision of electronic services in accordance with the Terms & Conditions. For more information on personal data processing and your rights please see Privacy Policy.
