
5 main data-related trends to be covered at Big Data Tech Warsaw 2021. Part I.

A year is long enough to see new trends emerge and new technologies gain traction. The Big Data landscape is changing ever faster, driven by innovation, competition, and the adoption of technologies that are now critical to almost every company. Let's look at the 5 current trends that selected presentations will cover in detail at the upcoming edition of Big Data Tech Warsaw 2021 (February 25-26th).

MLOps becomes mainstream
ML/AI becomes ubiquitous in our daily life
Data Quality and Data Observability become easier
Larger clouds over the Big Data landscape
Best practices for managing Big Data teams and projects emerge

Trend 1. Machine Learning Operations (MLOps) becomes mainstream

The issue of building machine learning systems, especially scalable ones, was presented by Google in a research paper in 2015 ("Hidden Technical Debt in Machine Learning Systems"). At that time, many companies were already in the process of creating large-scale ML systems. Significantly, however, few had a dedicated platform or tools that would support the end-to-end life-cycle of their ML models and the daily work of their ML teams.

Presentations on Machine Learning Operations (MLOps) at the Big Data Technology Warsaw Summit

Last year, BDTWS 2020 featured a number of very interesting MLOps-related presentations in the Data Science & ML track, given by speakers from companies such as Spotify, Disney+, and Synerise.

Feature store: Solving anti-patterns in ML-systems from Andrzej Michałowski

Interest in the topic keeps growing, so this year at BDTWS we have prepared a dedicated track called MLOps. Below are examples of the Machine Learning Operations presentations that you can see at the Big Data Technology Warsaw Summit 2021:

Keven (Qi) Wang will talk about the MLOps journey at H&M on the public cloud. In his talk he will present the entire MLOps stack, which has been adopted by multiple product teams managing hundreds of models across the entire H&M value chain. It enables data scientists to develop models in a highly interactive environment, and it enables engineers to manage large-scale model training and model serving pipelines with full traceability.
Maciej Pieńkosz from Sotrender, a company whose main task is to analyze huge amounts of data coming from social media, will talk about their ML use cases and the GCP components they use (e.g. AI Platform Notebooks, AI Platform Training, Cloud Run, GitLab CI/CD). His presentation will cover the full lifecycle of an ML model, from experimentation, through deployment and training, to model monitoring.

It's hard to operate in the IT industry (especially within Big Data projects implemented on open-source technologies) and not know the Australian company called Atlassian. Jiamei Du will talk about how her company uses A/B experiments to build better products. Part of her story will focus on their MLOps tools and infrastructure to make their A/B experiments as efficient as possible.
One cannot fail to mention the members of GetInData, who will present their experiences in building portable and reusable ML platforms in various environments (cloud, hybrid, on-premise) using a mix of open-source and cloud-based technologies for various customers. They will share their experience and best practices that come from multiple production implementations.

NoMagic robots improve iteratively and continuously thanks to the software 2.0 improvement cycle supported by an in-house data engine. Watch this short video below to see what type of robots they teach using ML/AI.

Those are only a few highlighted examples, but you will definitely learn more about Machine Learning Operations at Big Data Tech Warsaw 2021.

Trend 2. ML/AI becomes ubiquitous in our daily life

Adopting Machine Learning, Data Science, and AI algorithms and techniques has always required a lot of work, skills, and time. When done successfully, however, it brings excellent results. One of my favorite examples is Discover Weekly, implemented by Spotify, the world-famous Swedish company. Below, you can see slides created by my ex-colleagues at Spotify. On those slides, they describe how Discover Weekly came to be, highlighting the technical challenges, data-driven development, and the ML models used to power their recommendations engine. It was a complex process that was not done overnight. Integrating all the necessary (open-source) technologies, building a scalable architecture, implementing smart algorithms, and monitoring the whole system was undoubtedly a big undertaking, at least five years ago.

From Idea to Execution: Spotify's Discover Weekly from Chris Johnson

Today, building dedicated ML platforms and using MLOps toolkits can significantly increase companies' productivity. Very often, companies also move to the public cloud, which lets them take advantage of ready-to-use libraries and hardware and makes their job easier. As a result, they can experiment with, train, and deploy new models faster and more cheaply.

Clearly, more and more ML models appear in our daily lives these days.


Machine Learning/Artificial Intelligence-related presentations you can watch at Big Data Tech Warsaw

During the BDTWS 2021 conference, you can count on many presentations that (a) describe use-cases, algorithms, and techniques which show how Machine Learning and Artificial Intelligence solve real-world business problems and (b) share their lessons learned from working with ML, Data Science, and advanced analytics. Let’s highlight a few interesting examples:

Mikio Braun (ex-Zalando) will talk about the lessons he learned from building large-scale production recommender systems. Among other things, he will explain how to bridge the gap from raw mathematical models and algorithms to robust and scalable software systems. His talk will explore the union of theory and practice.
Boxun Zhang (ex-Spotify, currently at Unity) will talk about similar issues in his presentation, although he will focus on the aspect related to real-time and large-scale Machine Learning systems. Boxun will also share several generalizable lessons that make ML systems performant from an ML perspective and scalable from an engineering perspective.
Data Science Lessons I have learned in 5 years - Boxun Zhang, GoEuro from Evention
It's also hard not to mention the GetInData members who will present their experiences from a year-long journey of developing a big data analytics platform for Kcell (a large Kazakh telecom) and building data-driven solutions on top of it that help to reduce costs, improve the quality of services, and understand users' needs better.
Machine Learning is often used for prediction, forecasting, and anomaly detection. At BDTWS 2021 we will be able to hear the story of a near real-time ML model built by Ericsson. It is used for predicting telecom systems degradation and outages based on historical fault and performance data. This model helps the operations team to conduct proactive monitoring, thanks to which the number of hours that support engineers spend on solving issues has dropped by 30 to 40%. It also improved the UX in pre-paid calls and increased customer retention. Peltarion (a Swedish company that specializes in AI) will describe their state-of-the-art weather forecasting AI service. Sotrender (a Polish company that analyses data from social media) will explain how they use ML to predict and monitor the effectiveness of campaigns conducted on the Facebook platform.
At Big Data Technology Warsaw Summit 2021 there will also be presentations on the use of data, science and technology to generate insights for search and recommendation systems in an e-commerce platform (Etsy), to build content personalization systems in e-commerce (eBay), run A/B experiments for growth (Atlassian), analyze geophysical data from ground-penetrating radars using deep-learning techniques (SGPR.TECH), and more.
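To make the anomaly detection use case mentioned above more concrete, here is a minimal sketch of one of the simplest possible techniques: flagging values whose z-score exceeds a threshold. It is only an illustration and not the approach used by Ericsson or any other speaker; the function name, the sample error counts, and the threshold are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=3.0):
    """Return the indices of values whose z-score exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        # All values are identical, so nothing can stand out.
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical hourly error counts; the spike at index 6 is the outlier.
error_counts = [12, 11, 13, 12, 10, 11, 95, 12, 13, 11]
print(find_anomalies(error_counts, threshold=2.0))
```

Production systems typically rely on far richer models (seasonality, multiple signals, learned thresholds), but the basic idea of comparing new observations against historical behaviour is the same.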

Trend 3. Data Quality and Data Observability become easier

For a data-driven company, data quality and observability have always been important, even back when tools like Hadoop and Hive were first open-sourced. At the same time, they were always problematic due to the lack of simple-to-use and feature-rich technologies (especially open-source ones). For this reason, many companies haven't addressed these problems properly.


Recently, however, the status quo has changed, and new tools have emerged that make data quality and data observability significantly easier. These include Apache Atlas, Amundsen from Lyft, Dataportal from Airbnb (see a picture below), DataHub from LinkedIn, Data Catalog from Google, and Deequ from Amazon, to name a few. These tools are often integrated with each other: check how Amundsen can work together with Feast for machine learning discovery or with Atlas for data discovery.
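As a rough illustration of the kind of checks that data quality tools automate, the sketch below computes two common metrics, completeness and uniqueness, over a small in-memory data set. It is plain Python and does not use the API of Deequ or any other tool named above; the function names and sample rows are hypothetical.

```python
def completeness(rows, column):
    """Fraction of rows in which the column is present and not None."""
    if not rows:
        return 1.0
    non_null = sum(1 for row in rows if row.get(column) is not None)
    return non_null / len(rows)


def is_unique(rows, column):
    """True if no non-null value of the column appears more than once."""
    values = [row[column] for row in rows if row.get(column) is not None]
    return len(values) == len(set(values))


# Hypothetical sample data with one missing country and one duplicated id.
rows = [
    {"user_id": 1, "country": "PL"},
    {"user_id": 2, "country": None},
    {"user_id": 3, "country": "SE"},
    {"user_id": 3, "country": "SE"},
]

report = {
    "country_completeness": completeness(rows, "country"),
    "user_id_unique": is_unique(rows, "user_id"),
}
print(report)
```

Dedicated tools run checks like these at scale on distributed data, track the metrics over time, and alert when a constraint is violated.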

Big Data Tools used by GetInData in data-driven approach. — Data-driven approach

Data Quality and Data Observability at BDTWS 2021

There will be a presentation on a new open-source technology called Marquez that can be used for data lineage and observability. This tool can help to understand how data flows through a company's systems. This makes it possible to show the dependencies between the individual teams that produce and consume data, and makes data pipeline audits easier to carry out.
Ensuring data quality is important even for small data sets, and Criteo representatives will explain how they addressed data quality challenges on their 120+ PB data lake with thousands of jobs. Their journey began two years ago, and they will now share the data and insights they have collected. The picture below shows data lake anomaly detection at Criteo (source).

Data Quality and Observability at BDTWS 2021
The presentation from OLX will cover a pragmatic approach to data quality. It will focus on a review of existing frameworks and approaches to data quality. In addition, it will cover the principles behind adapting these approaches and designing data quality systems at OLX.
That's not all, as there will also be presentations about building testable data pipelines at Target and about a tool called Diftong from Klarna for validating big data workflows.
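The idea of data lineage mentioned in the list above can be sketched as a dependency graph in which each data set points to the data sets it is derived from. The example below is a hypothetical illustration in plain Python, not the Marquez API: the data set names and the `upstream` helper are assumptions made for this sketch.

```python
from collections import deque

# Hypothetical lineage: each data set maps to the data sets it is derived from.
LINEAGE = {
    "daily_report": ["sessions", "orders"],
    "sessions": ["raw_events"],
    "orders": ["raw_orders"],
    "raw_events": [],
    "raw_orders": [],
}


def upstream(dataset, lineage):
    """Return all direct and transitive inputs of a data set, breadth-first."""
    seen = []
    queue = deque(lineage.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.append(current)
        queue.extend(lineage.get(current, []))
    return seen


print(upstream("daily_report", LINEAGE))
```

With such a graph, answering "which inputs does this report depend on?" or "which teams are affected if this raw table breaks?" becomes a simple traversal, which is what makes audits of data pipelines easier.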

These are the first three trends in Big Data that will be strongly represented in presentations at the BDTWS 2021 conference. But that's not all: go to the next post to learn about the remaining two trends and the presentations that cover them!


Last updated: 21 January 2021

Written by

Adam Kawa

CEO and Founder
