
5 main data-related trends to be covered at Big Data Tech Warsaw 2021. Part I.

A year is definitely long enough to see new trends emerge and technologies gain traction. The Big Data landscape is changing faster and faster thanks to innovation, competition, and technologies that have become critical to almost every company on the planet. Let’s look at the 5 current trends that selected presentations will cover in detail at the upcoming edition of Big Data Tech Warsaw 2021 (February 25-26th).

  1. MLOps becomes mainstream
  2. ML/AI becomes ubiquitous in our daily life
  3. Data Quality and Data Observability become easier
  4. Larger clouds over the Big Data landscape
  5. Best practices for managing Big Data teams and projects emerge

Trend 1. Machine Learning Operations (MLOps) becomes mainstream

The issue of building machine learning systems, especially scalable ones, was presented by Google in a research paper in 2015 ("Hidden Technical Debt in Machine Learning Systems"). At that time, many companies were already in the process of creating large-scale ML systems. Significantly, however, few had a dedicated platform or tools that would support the end-to-end life-cycle of their ML models and the daily work of their ML teams.

Presentations on Machine Learning Operations (MLOps) at the Big Data Technology Warsaw Summit

Last year, BDTWS 2020 featured a number of very interesting MLOps-related presentations given by speakers from companies such as Spotify, Disney+, and Synerise. These talks were part of the Data Science & ML track.

Interest in the topic is growing constantly, so this year at BDTWS we have prepared a special track called MLOps. Below are examples of Machine Learning Operations presentations that you can see at the Big Data Technology Warsaw Summit 2021:

  • Keven (Qi) Wang will talk about the MLOps journey at H&M on the public cloud. He will present the entire MLOps stack that has been adopted by multiple product teams managing hundreds of models across the entire H&M value chain. It enables data scientists to develop models in a highly interactive environment, and enables engineers to manage large-scale model training and model serving pipelines with full traceability.

  • Maciej Pieńkosz from Sotrender, a company that analyzes huge amounts of data coming from social media, will talk about their ML use-cases and the GCP components they use (e.g. AI Platform Notebooks, AI Platform Training, Cloud Run, GitLab CI/CD). His presentation will cover the full lifecycle of an ML model - from experimentation, through training and deployment, to model monitoring.

  • It's hard to operate in the IT industry (especially within Big Data projects implemented on open-source technologies) and not know the Australian company called Atlassian. Jiamei Du will talk about how her company uses A/B experiments to build better products. Part of her story will focus on their MLOps tools and infrastructure to make their A/B experiments as efficient as possible.

  • One cannot fail to mention the members of GetInData, who will present their experiences in building portable and reusable ML platforms in various environments (cloud, hybrid, on-premise) using a mix of open-source and cloud-based technologies for various customers. They will share their experience and best practices that come from multiple production implementations.

NoMagic robots improve iteratively and continuously thanks to the software 2.0 improvement cycle supported by an in-house data engine. Watch this short video below to see what type of robots they teach using ML/AI.

Those are only a few highlighted examples, but you will definitely learn more about Machine Learning Operations at Big Data Tech Warsaw 2021.

Trend 2. ML/AI becomes ubiquitous in our daily life

Adopting Machine Learning, Data Science, and AI algorithms and techniques has always required a lot of work, skills, and time. Undoubtedly, however, when done successfully, it brings excellent results. One of the favorite examples to mention is Discover Weekly, built by Spotify, the world-famous Swedish company. Below, you can see slides created by my ex-colleagues at Spotify. On those slides, they describe how Discover Weekly came to be, highlighting technical challenges, data-driven development, and the ML models used to power their recommendations engine. It was a complex process, not done overnight. Integrating all the necessary (open-source) technologies, building a scalable architecture, implementing smart algorithms, and monitoring it all was undoubtedly a big undertaking, at least five years ago.

Today, building dedicated ML platforms and using MLOps toolkits can significantly increase companies' productivity. Very often, companies also switch to the public cloud, which lets them take advantage of ready-to-use libraries and hardware and, as a consequence, makes their job easier. As a result, they can experiment with, train, and deploy new models faster and more cheaply.

Clearly, more and more ML models appear in our daily life these days.

Machine Learning and Artificial Intelligence presentations you can watch at Big Data Tech Warsaw

During the BDTWS 2021 conference, you can count on many presentations that (a) describe use-cases, algorithms, and techniques which show how Machine Learning and Artificial Intelligence solve real-world business problems and (b) share their lessons learned from working with ML, Data Science, and advanced analytics. Let’s highlight a few interesting examples:

  • Mikio Braun (ex-Zalando) will talk about the lessons he learned building large-scale production recommender systems. Among other things, he will explain how to bridge the gap from raw mathematical models and algorithms to robust and scalable software systems. The talk will explore the union of theory and practice.

  • Boxun Zhang (ex-Spotify, currently at Unity) will talk about similar issues in his presentation, although he will focus on the aspect related to real-time and large-scale Machine Learning systems. Boxun will also share several generalizable lessons that make ML systems performant from an ML perspective and scalable from an engineering perspective.

  • It's also hard not to mention the GetInData members who will present their experiences from a year-long journey of developing a big data analytics platform for Kcell (a large Kazakh telecom) and building data-driven solutions on top of it that help to reduce costs, improve the quality of services, and understand users' needs better.

  • Machine Learning is often used for prediction, forecasting, and anomaly detection. At BDTWS 2021 we will hear the story of a near real-time ML model built by Ericsson that predicts telecom system degradation and outages based on historical fault and performance data. The model helps the operations team monitor proactively, which has reduced the number of hours that support engineers spend on solving issues by between 30 and 40%. It also improved the UX in pre-paid calls and increased customer retention. Peltarion (a Swedish company that specializes in AI) will describe their state-of-the-art weather forecasting AI service. Sotrender (a Polish company that analyses data from social media) will explain how they use ML to predict and monitor the effectiveness of campaigns conducted on the Facebook platform.

  • At Big Data Technology Warsaw Summit 2021 there will also be presentations on the use of data, science and technology to generate insights for search and recommendation systems in an e-commerce platform (Etsy), to build content personalization systems in e-commerce (eBay), run A/B experiments for growth (Atlassian), analyze geophysical data from ground-penetrating radars using deep-learning techniques (SGPR.TECH), and more.

Trend 3. Data Quality and Data Observability become easier

For data-driven companies, data quality and observability have always been important, even a long time ago when tools like Hadoop and Hive were open-sourced. On the other hand, they were always problematic, due to the lack of simple-to-use and feature-rich technologies (especially open-source ones). For this reason, many companies haven’t addressed these problems correctly.

Recently, however, the status quo has changed, and new tools have emerged that make data quality and data observability significantly easier. These include Apache Atlas, Amundsen from Lyft, Dataportal from Airbnb (see the picture below), DataHub from LinkedIn, Data Catalog from Google, and Deequ from Amazon, to name a few. These tools are often integrated with each other - check how Amundsen can work together with Feast for machine learning discovery or with Atlas for data discovery.
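
To make the idea of automated data quality checks a bit more concrete, here is a minimal sketch in plain Python. It does not use the API of any of the tools mentioned above; the function names (completeness, distinctness, run_checks), the sample rows, and the thresholds are all hypothetical and chosen only for illustration. It computes two common metrics for a column and compares them with a minimum acceptable value.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

Row = Dict[str, Any]


@dataclass
class CheckResult:
    """Outcome of one data quality check (hypothetical structure)."""
    name: str
    value: float      # measured metric, between 0.0 and 1.0
    threshold: float  # minimum acceptable value

    @property
    def passed(self) -> bool:
        return self.value >= self.threshold


def completeness(rows: List[Row], column: str) -> float:
    """Fraction of rows where `column` is present and not None."""
    if not rows:
        return 0.0
    non_null = sum(1 for row in rows if row.get(column) is not None)
    return non_null / len(rows)


def distinctness(rows: List[Row], column: str) -> float:
    """Number of distinct non-null values divided by number of non-null values."""
    values = [row[column] for row in rows if row.get(column) is not None]
    if not values:
        return 0.0
    return len(set(values)) / len(values)


def run_checks(rows: List[Row]) -> List[CheckResult]:
    """Run a fixed, illustrative set of checks; thresholds are assumptions."""
    return [
        CheckResult("user_id completeness", completeness(rows, "user_id"), 1.0),
        CheckResult("user_id distinctness", distinctness(rows, "user_id"), 1.0),
        CheckResult("country completeness", completeness(rows, "country"), 0.5),
    ]


# Made-up sample data: one duplicate user_id, one missing user_id,
# and one missing country.
SAMPLE_ROWS: List[Row] = [
    {"user_id": 1, "country": "PL"},
    {"user_id": 2, "country": None},
    {"user_id": 2, "country": "SE"},
    {"user_id": None, "country": "PL"},
]

if __name__ == "__main__":
    for result in run_checks(SAMPLE_ROWS):
        status = "OK" if result.passed else "FAILED"
        print(f"{result.name}: {result.value:.2f} "
              f"(min {result.threshold:.2f}) {status}")
```

In a real pipeline, checks like these would typically run after each job and feed an alerting or catalog tool, which is the part that dedicated data quality and observability technologies take care of.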

Data Quality and Data Observability at BDTWS 2021 

  • There will be a presentation on a new open-source technology called Marquez that can be used for data lineage and observability. This new tool can help to understand how data flows through a company’s systems. This makes it possible to show the dependencies between the individual teams that produce and consume data, and makes it easier to audit data pipelines.

  • Ensuring data quality is important even for small data sets, but Criteo representatives will explain how they addressed data quality challenges on their 120+ PB data lake with thousands of jobs. Their journey began two years ago, and they will now share the data and thoughts they have collected. The picture below shows data lake anomaly detection at Criteo.

  • The presentation from OLX will concern a pragmatic approach to data quality. It will focus on a review of existing frameworks and approaches to data quality. In addition, it will cover the principles behind adapting these approaches and designing data quality systems at OLX.

  • That’s not all, as there will also be presentations about building testable data pipelines at Target and about a tool called Diftong from Klarna for validating big data workflows.

These are the first three trends in Big Data that will be strongly represented in presentations at the BDTWS 2021 conference. But that's not all. Go to the next post to read about the remaining two trends and the presentations that cover them!

21 January 2021

