DATA ANALYST TRAINING

This four-day course teaches Data Analysts how to analyze massive amounts of data available in a Hadoop YARN cluster.

NEXT TERM

NOT SCHEDULED. If you are interested, please contact us.
 
Duration
4 days training

Target audience
Data Analysts, BI Analysts

Technologies
e.g. Hadoop, Hive, Spark, Spark SQL, Kibana, HUE, Jupyter, Parquet, Avro

Training Overview

During the workshop you’ll act as a Big Data analyst working for a fictional company called StreamRock™ that creates a music streaming application (similar to Spotify). The main goal of your work is to take advantage of Big Data technologies such as Hadoop, Hive, Spark and Jupyter to clean and analyze datasets about the users and the songs they listened to. You’ll process the data to get data-driven answers to many business questions and to power product features of the StreamRock application. Every exercise will be executed on a remote multi-node Hadoop cluster.

The training focuses on practical experience. Apart from delivering all the necessary theory, our instructor will also share his own practical experience gained while working with Big Data technologies for several years.

The course is intended for Data Analysts, BI Analysts and anyone interested in using large-scale computation tools to extract knowledge from large datasets stored in a Hadoop cluster.

All you need to fully participate in our training program is a laptop with a web browser, a shell terminal (e.g. PuTTY) and a Wi-Fi connection. Our workshops are mostly technical (with some business content); however, you do not need previous experience with Big Data technologies.

Course agenda*

DAY 1


Introduction to Big Data and Apache Hadoop

Description of the StreamRock company along with the opportunities and challenges that Big Data technologies bring to it.

  • Hands-on exercise: Accessing a remote multi-node Hadoop cluster.

Introduction to HDFS

  • Hands-on exercise: Importing structured data into the cluster using HUE
  • Hands-on exercise: Interacting with HDFS using HDFS CLI, Snakebite and WebHDFS
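
WebHDFS exposes HDFS operations as REST calls under the `/webhdfs/v1` prefix of the NameNode's HTTP address. As a minimal sketch (not part of the course material), the helper below only builds the URL for such a call without sending it. The host name, the port (50070 is a common default on Hadoop 2 clusters) and the directory are hypothetical placeholders.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS URL: http://<host>:<port>/webhdfs/v1<path>?op=<OP>."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # 'op' selects the operation (e.g. LISTSTATUS, OPEN); extra params are appended.
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical NameNode address and directory, for illustration only.
url = webhdfs_url("namenode.example.com", 50070, "/data/streamrock", "LISTSTATUS")
print(url)
```

The resulting URL can be opened with any HTTP client (a browser, `curl`, or Python's `urllib.request`) from a machine that can reach the cluster.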

Introduction to YARN

  • Hands-on exercise: Getting familiar with the YARN Web UI

Short overview of MapReduce

  • Hands-on exercise: Submitting an example ETL MapReduce job to the YARN cluster

DAY 2


Providing data-driven answers to business questions using a SQL-like solution

Introduction to Apache Hive

  • Hands-on exercise: Creating Hive databases and tables using HUE
  • Hands-on exercise: Ad-hoc analysis of structured data with HiveQL

Advanced aspects of Hive, e.g. partitioning, bucketing, strict mode, execution plans

  • Hands-on exercise: Hive partitioning
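
Hive stores each partition of a table in its own subdirectory named `column=value`, so a query that filters on the partition column only has to read the matching directories. The sketch below imitates that layout in plain Python and does not call Hive. The table path, the partition column `dt` and the records are made up for illustration.

```python
from collections import defaultdict


def partition_path(table_dir, column, value):
    """Return the Hive-style directory of one partition, e.g. <table>/dt=2016-01-01."""
    return f"{table_dir}/{column}={value}"


def partition_records(records, table_dir, column):
    """Group records by partition directory, as a partitioned table lays them out."""
    layout = defaultdict(list)
    for record in records:
        layout[partition_path(table_dir, column, record[column])].append(record)
    return dict(layout)


def prune(layout, table_dir, column, wanted_value):
    """Partition pruning: look only at the directory that matches the filter value."""
    return layout.get(partition_path(table_dir, column, wanted_value), [])


# Hypothetical listening events; 'dt' is the assumed partition column.
events = [
    {"user": "u1", "song": "s1", "dt": "2016-01-01"},
    {"user": "u2", "song": "s2", "dt": "2016-01-02"},
    {"user": "u1", "song": "s3", "dt": "2016-01-02"},
]
layout = partition_records(events, "/warehouse/events", "dt")
print(sorted(layout))
print(len(prune(layout, "/warehouse/events", "dt", "2016-01-02")))
```

Choosing a partition column that queries filter on often (a date, for example) is what makes this pruning pay off; a column with too many distinct values produces many small directories instead.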

Extending Hive with custom UDFs

  • Hands-on exercise: Using a custom Java UDF and a SerDe for JSON

Hadoop File Formats (Avro, Parquet, ORC)

  • Hands-on exercise: Interacting with Parquet and Avro in Hive

DAY 3


Interactive analysis with Apache Spark

Introduction to Apache Spark

  • Spark Core and its advantages over MapReduce
  • Basics of working with the Spark API
  • Spark architecture and integration with YARN
  • Hands-on exercise: Interacting with the Spark Core API

Doing data analysis with SparkSQL

  • Introduction to SparkSQL and DataFrames
  • Integration with Hive and other tools
  • Introduction to Spark notebooks
  • Hands-on exercise: Implementing a SparkSQL application to clean a dataset of song records
  • Hands-on exercise: Data analysis with SparkSQL and Jupyter

DAY 4


Visualization and Search

Introduction to Elasticsearch and Kibana

  • Most important features of Elasticsearch
  • Hands-on exercise: Indexing data with Elasticsearch and building visualisations in Kibana

Advanced aspects of working with Spark notebooks

  • Hands-on exercise: Visualising and publishing data with Zeppelin or Jupyter

* GetInData reserves the right to make any changes and adjustments to the presented agenda.

Instructors

Our workshops and training programs are run by experienced instructors with many years of real-life Big Data experience. Get to know our team!

More information

Please contact us with any questions about our training courses, or if you would like to discuss a custom, on-site training course.

FEEDBACK FROM ATTENDEES

  • Hadoop Administrator Training, Allegro

    I highly value the substantive content of the course as well as its great preparation and layout. Knowledge was passed on in an ordered, consistent and effective way. Participants' involvement during the workshop sessions is the best indicator of how positive this training was!

  • Big Data Workshop, Stepstone

    The Big Data workshops were led by real professionals, with tools and materials prepared in a way that allowed participants to get down to brass tacks straightaway without losing time. Attendees do not disturb each other and everyone can work comfortably and effectively. One can notice the striking knowledge of the host and the fact that it comes from real professional work experience.

  • IE Business School

    This is an excellent course and an excellent teacher. Adam was well prepared, knew the subject material, was good at transmitting his knowledge to us and had prepared exercises that added a lot of value to the sessions. I would rank this six if I could.

  • Hadoop Developer Training, Confidential

    Professionally prepared and led courses. Coaches with vast experience in the presented realm.

  • IE Business School

    Outstanding professor, the course was very well planned, he is very knowledgeable about what he taught. He talked about real-world cases and managed to get the whole class interested for 6 hours straight. Definitely one of the best courses that we have had in the masters.
