Analytics engineering with Snowflake and dbt
From raw data to insights - learn all you need to build and deploy data transformation workflows with dbt and Snowflake.
With dbt you can transform your data in Snowflake using nothing more than SQL select statements. Support it with deployment automation and engineering best practices, and you get a production-ready data value chain for your data products. Data modeling strategies, automated testing, metadata management and performance optimization are only a few of the topics covered.
Data analysts, analytics engineers & data engineers who are interested in learning how to build and deploy Snowflake data transformation workflows faster than ever before.
- SQL: ability to write data transforming queries
- Basic understanding of Python (for exercises with Airflow and Snowpark Python)
- Basic knowledge of ELT / ETL processes
- Basic experience with a command-line interface
This training IS for you if…
- You (or your team) are using or about to start using dbt to build data pipelines on Snowflake
- You are looking for an optimal configuration for Snowflake-based dbt projects
- You know the concepts of dbt, but you don’t know how to scale & productionize your data pipelines
- You want ownership when it comes to working with your Snowflake data
This training is NOT for you if…
- You need to transform data outside of Snowflake
- You have no intention of ever working with dbt
- You have no experience with SQL
After the training, you’ll be able to
- Ingest data into Snowflake using modern data integration tools
- Build end-to-end data transformation models for your Snowflake-based data
- Apply dbt to configure and build Snowflake data pipelines in a modern way: choose Snowflake-specific table materialization options, manage job dependencies, handle data quality issues, and generate and view documentation as you develop
- Leverage Snowpark Python integration with dbt's Python models for non-SQL workflows
- Schedule data pipelines with Apache Airflow
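To give a feel for the Snowpark outcome above, here is a minimal sketch of what a dbt Python model on Snowflake can look like. It is an illustration only, not course material: the upstream model name `stg_orders` and the column `amount` are hypothetical, and the file would live in a dbt project's models folder and be executed by dbt rather than run directly.

```python
# models/positive_orders.py (hypothetical file name inside a dbt project)

def model(dbt, session):
    """dbt calls this function; on Snowflake, `session` is a Snowpark session."""
    # Materialize the result of this model as a table.
    dbt.config(materialized="table")

    # dbt.ref() returns the upstream model as a DataFrame
    # ("stg_orders" is an assumed model name).
    orders = dbt.ref("stg_orders")

    # Keep only rows with a positive amount ("amount" is an assumed column).
    # dbt writes the returned DataFrame back to Snowflake.
    return orders.filter(orders["amount"] > 0)
```

Because the model references `stg_orders` through `dbt.ref`, dbt can place it in the dependency graph alongside SQL models and build it in the right order.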
Course Topic Details
dbt & Snowflake fundamentals
- Understanding Snowflake - core concepts
- Build models to shape your data from raw to transformed
- Configure and run tests on your data to meet your expectations
- Write, generate, and view documentation as you develop
Deploying dbt pipelines
- Data ingestion to Snowflake with modern data integration tools (e.g. Fivetran, Airbyte, Matillion)
- Deployment & orchestration of dbt projects with Apache Airflow
- Integration between Snowpark Python and dbt's Python models
- Metadata management & integration with data catalog (e.g. Acryl Data)
Day 1 - Session #1 - Introduction to Snowflake & dbt
- What is Snowflake? Intro
- Key components of the Snowflake ecosystem
- Core concepts of dbt
- Data models
- Seeds, sources
- Jinja and Macros
- Hands-on exercises
Day 1 - Session #2 - Simple end-to-end data pipeline
- Data discovery (data search, usage statistics, data lineage)
- Data profiling & exploration
- Transforming data using SQL with dbt
- Hands-on exercises
Day 2 - Session #3 - Data pipeline - ingestion, scheduling & deployment
- Data ingestion with modern data integration tools
- Apache Airflow as a workflow scheduler
- Data quality testing with dbt test
- Hands-on exercises
Day 2 - Session #4 - Data pipeline - advanced models & data observability
- Snowpark with dbt Python models
- Orchestrating Snowpark with Apache Airflow
- Anomaly detection & data observability (using e.g. Elementary, Soda)
- Hands-on exercises
Completed in half the estimated time and with a fivefold improvement on data collection goals, the robust product has exponentially increased processing capabilities. GetInData’s in-depth engagement, reliability, and broad industry knowledge enabled seamless project execution and implementation.
GetInData had been supporting us in building production Big Data infrastructure and implementing real-time applications that process large streams of data. In light of our successful cooperation with GetInData, their unique experience and the quality of work delivered, we recommend the company as a Big Data vendor.
GetInData delivered a robust mechanism that met our requirements. Their involvement allowed us to add a feature to our product, despite not having the required developer capacity in-house.
Their consistent communication and responsiveness enabled GetInData to drive the project forward. They possess comprehensive knowledge of the relevant technologies and have an intuitive understanding of business needs and requirements. Customers can expect a partner that is open to feedback.
We sincerely recommend GetInData as a Big Data training provider! The trainer is a very experienced practitioner and he gave us a lot of tips regarding production deployments, possible issues as well as good practices that are invaluable for a Hadoop administrator.
The engineers and administrators at GetInData are world-class experts. They have proven experience in many open-source technologies such as Hadoop, Spark, Kafka and Flink for implementing batch and real-time pipelines.
Other Big Data Training
Machine Learning Operations Training (MLOps)
This four-day course will teach you how to operationalize Machine Learning models using popular open-source tools, like Kedro and Kubeflow, and deploy them using cloud computing.
Hadoop Administrator Training
This four-day course provides the practical and theoretical knowledge necessary to operate a Hadoop cluster. We put great emphasis on practical hands-on exercises that aim to prepare participants to work as effective Hadoop administrators.
Advanced Spark Training
This 2-day training is dedicated to Big Data engineers and data scientists who are already familiar with the basic concepts of Apache Spark and have hands-on experience implementing and running Spark applications.
Data Analyst Training
This four-day course teaches Data Analysts how to analyse massive amounts of data available in a Hadoop YARN cluster.
Real-Time Stream Processing
This two-day course teaches data engineers how to process unbounded streams of data in real-time using popular open-source frameworks.
Mastering ML/MLOps and AI-powered Data Applications in the Snowflake Data Cloud
Modern Data Pipelines with DBT
In this one-day workshop, you will learn how to create modern data transformation pipelines managed by dbt. Discover how you can improve the quality of your pipelines and your data team's workflow by introducing a tool that standardizes the way the team applies good practices.
Real-time analytics with Snowflake and dbt