A Review of the Presentations at the DataMass Gdańsk Summit 2022

The 4th edition of DataMass, and the first one we have had the pleasure of co-organizing, is behind us. We would like to thank all the speakers for the mass of knowledge and experience you shared with us. Thanks also to the participants for your support, for the networking and for contributing to the unique atmosphere of the conference.

Let’s look at the top trends we observed in the presentations and at who appeared on stage, then at the three top-rated presentations and, finally, at reviews and takeaways from a cross-section of the talks.

Marek Wiewiórka on the DataMass stage

Big Data & Cloud Top Trends

During the Summit we heard 23 presentations given by 28 speakers from all over the world, including New York, New Orleans, Hamburg, Budapest and Warsaw. The main trends revealed by these presentations were:

  • Both scaleups & enterprises use the public cloud
  • The Modern Data Platform is a very fast way of starting analytics in the cloud
  • Open-source and cloud-agnostic technologies are used in all clouds
  • Advanced AI/ML use-cases are implemented by many companies

Probably the most important insight is how quickly companies can build their solutions in the cloud. With the current cloud offerings you can iterate much faster than ever before. What's more, we saw many real-world examples of cloud use during the conference.

Who were the speakers and where did their presentations come from?

Our speakers represent a large number of data-driven companies from all over the world.

We would like to thank the invited speakers for their contributions of experience and knowledge.

We would also like to thank the community, because almost half of the presentations came from the Call for Presentations process.

The top 3 best-rated presentations of DataMass 2022

Here are the top three listener-rated presentations:

  1. “Introduction to Causal Inference in the Ride-Hailing Business” by Alessandro Romano from FREE NOW
  2. “How to process 33bln events from set top boxes in under 4 minutes” by Grzegorz Gwoźdź from Vectra
  3. “Data engineering at the scale of PepsiCo eCommerce, 3 years of experience” by Pierre de Leusse and Dmitry Ulanov from PepsiCo

You can read more about Grzegorz Gwoźdź’s presentation below, in Grzegorz Kołpuć’s review.

Following technology trends and finding ways to deal with data processing

Presentations review by Maciej Maciejko, Staff Data Engineer at GetInData

The DataMass Gdańsk Summit attracted a lot of Big Data enthusiasts and specialists. As a Data Engineer, I wouldn't have forgiven myself if I had missed the event. The conference was an opportunity to discover the leading trends in the industry and see how companies successfully deal with data processing.

The first presenter, Wouter de Bie, explained in his lecture “Data Infrastructure in a Multi-Cloud environment” how the infrastructure of Datadog (a cloud monitoring service) works across multiple clouds. The key to success is cloud-agnostic technology such as Kubernetes. To keep things simple, data is processed within one environment, which is defined as a cloud and a region. Only the results are allowed to be exchanged between environments. Wouter emphasized the importance of metadata, which is crucial for identifying the producer and the consumer and for understanding the flow. Datadog uses Luigi for task orchestration, Flink and Spark for ETL and Snowflake for ELT.

A slide from Wouter de Bie's presentation
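
The isolation rule described in this talk (raw data stays in its environment, only results cross the boundary) can be illustrated with a minimal Python sketch. This is not Datadog's implementation; the `Environment` and `Dataset` classes, the `can_transfer` function and the example cloud and region names are all hypothetical and exist only to make the rule concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """An environment is a (cloud, region) pair, as described in the talk."""
    cloud: str
    region: str


@dataclass(frozen=True)
class Dataset:
    """Hypothetical dataset descriptor; `is_result` marks processed output."""
    name: str
    home: Environment
    is_result: bool


def can_transfer(dataset: Dataset, target: Environment) -> bool:
    """Raw data must stay in its home environment; only results may leave it."""
    return dataset.home == target or dataset.is_result


# Illustrative values only.
aws_eu = Environment("aws", "eu-west-1")
gcp_us = Environment("gcp", "us-central1")
raw_events = Dataset("raw_events", home=aws_eu, is_result=False)
daily_summary = Dataset("daily_summary", home=aws_eu, is_result=True)

print(can_transfer(raw_events, gcp_us))     # raw data leaving its environment
print(can_transfer(daily_summary, gcp_us))  # a result being shared
```

A check like this would typically sit in the metadata layer, which already has to know the producer and consumer of every dataset.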

Newer, cloud-native technology is also becoming more and more popular. During the presentation “Let’s build our own Cloud Data Platform”, Łukasz Leszewski traced the genealogy of Snowflake and explained why its architecture is better than that of traditional databases and how it simplifies working with ETL/ELT.

During the presentation “Bank Analytics in the Cloud”, Łukasz Hunka described the transformation from an expensive and unscalable on-premise setup to a “full-cloud” solution. He explained how to measure the overall performance of an IT department using multiple criteria such as time-to-production, CI/CD, speed of access to data and model advancement. Based on these, it was easy to identify strengths and weaknesses. The next step was to optimize the whole development and delivery process. Kedro played a very important role here, as it allowed Data Scientists to “talk in one language”. At the end, Łukasz pointed out that a centralized architecture leads to bottlenecks, which can be solved by Data Mesh.

The Data Mesh concept uses a decentralized architecture which allows domain teams to perform cross-domain analytics on their own. This was explained by Szymon Homa during the presentation “Data Mesh concept, executed by Trino”. Data Mesh is based on 4 principles: Domain Ownership, Data as a Product, a Self-Service Data Platform and Federated Governance. That’s the theory. The more practical side of the presentation focused on Trino, a very fast growing, open-source, distributed query engine. It can integrate almost any data source and query it using plain SQL. Data Mesh with Trino can be a powerful alternative to data lakes and ETLs for companies with integrated, multi-domain products.
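
To make the idea of cross-domain analytics with plain SQL more concrete, here is a minimal sketch. It does not use Trino: Python's standard-library `sqlite3` module stands in for the federated engine, and two attached in-memory databases stand in for data owned by two domain teams. The domain, table and column names are hypothetical. In Trino, the two sources would be separate catalogs addressed as `catalog.schema.table`.

```python
import sqlite3

# One connection plays the role of the query engine; each attached
# in-memory database plays the role of a separately owned domain source.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS orders_domain")
conn.execute("ATTACH DATABASE ':memory:' AS customers_domain")

# Each domain team publishes its own table (hypothetical schema and data).
conn.execute("CREATE TABLE orders_domain.orders (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE customers_domain.customers (id INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO orders_domain.orders VALUES (?, ?)",
    [(1, 10.0), (2, 5.0), (1, 7.5)],
)
conn.executemany(
    "INSERT INTO customers_domain.customers VALUES (?, ?)",
    [(1, "PL"), (2, "DE")],
)


def revenue_by_country(connection: sqlite3.Connection) -> list:
    """Join data from both domains in a single SQL statement."""
    query = """
        SELECT c.country, SUM(o.amount) AS revenue
        FROM orders_domain.orders AS o
        JOIN customers_domain.customers AS c ON c.id = o.customer_id
        GROUP BY c.country
        ORDER BY c.country
    """
    return connection.execute(query).fetchall()


print(revenue_by_country(conn))
```

The point of the sketch is the shape of the query: the consumer joins two independently owned sources in one statement, without first copying either of them into a shared lake through an ETL job.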

Márton Balassi gave a great lecture titled “Running Apache Flink in any cloud environment”. He presented a cloud-agnostic solution with Kubernetes (again!) as the orchestration layer and the new Flink Kubernetes Operator playing the scheduler role. On top of that, state management is easy, consistent and supported by the Flink community!

Marcin Szeliga convinced me during his presentation “Medical Image Analysis using Auto ML” that ML can be simple, cheap and extremely useful. You don’t have to know how to create ML models or have specialist domain knowledge to start working with them to get great results.

I also have to mention other leading technologies such as Apache Airflow (cron-based scheduler) and dbt (data transformation), which appeared in many of the other presentations and are clearly in very common use.

The DataMass Gdańsk Summit was a great experience and I’m looking forward to the next edition.

Data in the Cloud: migration to the Cloud & Cloud engineering

Presentations review by Grzegorz Kołpuć, Staff Data Engineer at GetInData

DataMass got off to a really strong start with Wouter de Bie, a former Spotify engineer who currently works for Datadog as Director of Engineering. In his presentation “Data Infrastructure in a Multi-Cloud environment” he addressed the industry's main focus at the time: the cloud. Wouter talked about how Datadog solves client-specific problems related to the variety of cloud vendors and geolocations/regions. Being cloud agnostic is definitely a trend on such platforms. Datadog developed an in-house solution (also based on the Mortar acquisition) to manage the cluster lifecycle across multiple clouds. The shared experience taught a good lesson: invest in a good abstraction layer and use cloud-agnostic components so that you can go multi-cloud.

The Cloud area was explored further in Łukasz Hunka’s “Bank Analytics in the Cloud” presentation and Łukasz Leszewski’s “Let’s build our own Cloud Data Platform”. They presented a somewhat more business-oriented view, showing how to manage migration projects in a cost-effective manner. Snowflake is a modern data warehouse that was designed to be cloud native, and Łukasz Leszewski explained how it can address engineering resource limits and infrastructure problems out of the box.

The conference definitely had a lot to offer to technical people. In “How to process 33bln events from set top boxes in under 4 minutes”, Grzegorz Gwoźdź demonstrated an interesting TV/video-on-demand use case which generates billions of real-time events at Vectra. Grzegorz is a well-known Tricity geek and, as expected, he took a deep dive into the technical details of the data platform. “Keep it simple” was how he summed up his lessons learned: overcomplexity and covering every corner case imaginable may kill your development and explode your cloud bill. It was very interesting to see how GCP services were selected and what the crucial factors behind those decisions were. What I took away from the presentation was that pragmatism will help your project to succeed.

A slide from Grzegorz Gwoźdź’s presentation

Our GetInData colleague Marek Wiewiórka, together with Daniel Owsianski and Daniel Tidström, shared his experience of building Data Platforms in "From first contact to a full charge... How we built a Modern Data Platform in 4 months for a FinTech scale-up." The massive effort made at GetInData to build a generic Modern Data Platform was put to use at Volt.io, where our team architected the solution end-to-end. Managed cloud services and internal R&D plugins were put together to provide a scalable, self-service environment for analytics engineers, data analysts and business users. Marek told the story of how the whole platform was deployed in just 4 months. That impressive timeframe was achievable thanks to the groundwork done previously at GetInData Labs. Volt.io is one of the first clients to benefit from it. A lot of what was in the presentation is covered in this blog post: How we built a Modern Data Platform in 4 months for Volt.io, a FinTech scale-up.

A slide from the presentation: From first contact to a full charge... How we built a Modern Data Platform in 4 months for a FinTech scale-up

Recently I have been involved in a huge cloud migration project, so Marcin Kaptur’s presentation "Don’t go with the flow. How did Ringier Axel Springer moved its data to the cloud?" impressed me a lot. It was a semi-technical, semi-business talk which gave a comprehensive summary of the decisions, made at the right time, that kept the migration process smooth. The crucial part of the process was selecting a cloud provider with a flexible environment and a wide variety of services. Migration is sometimes approached in a lift-and-shift manner, but Marcin explained why they chose the ‘re-architect’ method instead. It turned out to be a way of reducing development and operation costs and ending up with a more efficient solution. Cost-effectiveness is a challenge when moving to the cloud. I really liked one important point made during the talk: ‘Tell developers about money’. To achieve cost-effectiveness, engineers need to know the impact of their decisions and be responsible for the actions they take during development. The financial impact of service usage and solution design is part of cloud engineering.

Grzegorz Gwoźdź on the DataMass stage

The Big Data Technology Warsaw Summit 2023!

As Maciej said, if you are a Data Engineer, conferences like this are an opportunity to stay up to date with trends, and one that you can't miss. However, even if you haven't had the chance to attend DataMass yet, don't despair. Another opportunity to learn from case studies and solutions presented by the best in the Big Data field is fast approaching. Its sister conference, the Big Data Technology Warsaw Summit 2023, is coming in the spring, and you can already submit a presentation for this edition to stand alongside the best on the stage of one of the largest Big Data events in Europe. Submit your presentation here: Call For Presentation Big Data Technology Warsaw. Also follow the conference's profile on LinkedIn so you don't miss out on registration.

25 October 2022
