
5 main data-related trends to be covered at Big Data Tech Warsaw 2021. Part I.

A year is definitely long enough for new trends and technologies to gain traction. The Big Data landscape is changing ever faster thanks to innovation, competition, and the use of technologies that have become critical to almost every company on the planet. Let’s look at the 5 current trends that selected presentations will cover in detail at the upcoming edition of Big Data Tech Warsaw 2021 (February 25-26th).

  1. MLOps becomes mainstream
  2. ML/AI becomes ubiquitous in our daily life
  3. Data Quality and Data Observability become easier
  4. Larger clouds over the Big Data landscape
  5. Best practices for managing Big Data teams and projects emerge

Trend 1. Machine Learning Operations (MLOps) becomes mainstream

Google described the challenges of building machine learning systems, especially scalable ones, in a 2015 research paper ("Hidden Technical Debt in Machine Learning Systems"). At that time, many companies were already building large-scale ML systems. Few, however, had a dedicated platform or tools to support the end-to-end life-cycle of their ML models and the daily work of their ML teams.

[Image: MLOps scheme, GetInData]

Machine Learning Operations (MLOps) presentations at the Big Data Technology Warsaw Summit

Last year, BDTWS 2020 featured a number of very interesting MLOps-related presentations by speakers from companies such as Spotify, Disney+, and Synerise. These talks were part of the Data Science & ML track.

Interest in the topic keeps growing, so this year at BDTWS we have prepared a special track called MLOps. Below are examples of the Machine Learning Operations presentations you can see at the Big Data Technology Warsaw Summit 2021:

  • Keven (Qi) Wang will talk about the MLOps journey at H&M on the public cloud. In his speech he will present their entire MLOps stack, which has been adopted by multiple product teams managing hundreds of models across the entire H&M value chain. It enables data scientists to develop models in a highly interactive environment and engineers to manage large-scale model training and model serving pipelines with full traceability.

  • Maciej Pieńkosz from Sotrender, a company whose main task is to analyze huge amounts of data coming from social media, will talk about their ML use-cases and the GCP components they use (e.g. AI Platform Notebooks, AI Platform Training, Cloud Run, GitLab CI/CD). His presentation will cover the full lifecycle of an ML model - from experimentation, through deployment and training, to model monitoring.

    [Image: Google Cloud Platform in GetInData]

  • It's hard to work in the IT industry (especially on Big Data projects built with open-source technologies) and not know the Australian company Atlassian. Jiamei Du will talk about how her company uses A/B experiments to build better products. Part of her story will focus on the MLOps tools and infrastructure that make their A/B experiments as efficient as possible.

  • One cannot fail to mention the members of GetInData, who will present their experiences in building portable and reusable ML platforms in various environments (cloud, hybrid, on-premise) using a mix of open-source and cloud-based technologies for various customers. They will share their experience and best practices that come from multiple production implementations.

NoMagic robots improve iteratively and continuously thanks to the software 2.0 improvement cycle supported by an in-house data engine. Watch this short video below to see what type of robots they teach using ML/AI.

Those are only a few highlighted examples, but you will definitely learn more about Machine Learning Operations at Big Data Tech Warsaw 2021.

Trend 2. ML/AI becomes ubiquitous in our daily life

Adopting Machine Learning, Data Science, and AI algorithms and techniques has always required a lot of work, skills, and time. When done successfully, however, it brings excellent results. A favorite example is Discover Weekly, implemented by the world-renowned Swedish company Spotify. Below, you can see slides created by my ex-colleagues at Spotify. On those slides, they describe how Discover Weekly came to be, highlighting technical challenges, data-driven development, and the ML models used to power their recommendation engine. It was a complex process that did not happen overnight. Integrating all the necessary (open-source) technologies, building a scalable architecture, implementing smart algorithms, and monitoring it all was undoubtedly a big undertaking, at least five years ago.

Today, building dedicated ML platforms and using MLOps toolkits can significantly increase a company's productivity. Very often, companies also move to the public cloud, which lets them take advantage of ready-to-use libraries and hardware and, as a consequence, makes their job easier. As a result, they can experiment with, train, and deploy new models faster and more cheaply.

Clearly, more and more ML models appear in our daily life these days.

[Image: Machine Learning and AI in Big Data]

Machine Learning/Artificial Intelligence-related presentations you can watch at Big Data Tech Warsaw

During the BDTWS 2021 conference, you can count on many presentations that (a) describe use-cases, algorithms, and techniques which show how Machine Learning and Artificial Intelligence solve real-world business problems and (b) share their lessons learned from working with ML, Data Science, and advanced analytics. Let’s highlight a few interesting examples:

  • Mikio Braun (ex-Zalando) will talk about the lessons he learned building large-scale production recommender systems. He will, among other things, explain how to bridge the gap between raw mathematical models and algorithms and robust, scalable software systems. The talk will explore the union of theory and practice.

  • Boxun Zhang (ex-Spotify, currently at Unity) will talk about similar issues in his presentation, although he will focus on the aspect related to real-time and large-scale Machine Learning systems. Boxun will also share several generalizable lessons that make ML systems performant from an ML perspective and scalable from an engineering perspective.

  • It's also hard not to mention the GetInData members who will present their experiences from a year-long journey of developing a big data analytics platform for Kcell (a large Kazakh telecom) and building data-driven solutions on top of it that help to reduce costs, improve the quality of services, and better understand users' needs.

  • Machine Learning is often used for prediction, forecasting, and anomaly detection. At BDTWS 2021 we will hear the story of a near real-time ML model built by Ericsson. It predicts telecom system degradation and outages based on historical fault and performance data. The model helps the operations team to monitor systems proactively, which has cut the number of hours that support engineers spend on solving issues by 30 to 40%. It has also improved the UX in pre-paid calls and increased customer retention. Peltarion (a Swedish company that specializes in AI) will describe their state-of-the-art weather forecasting AI service. Sotrender (a Polish company that analyses data from social media) will explain how they use ML to predict and monitor the effectiveness of campaigns conducted on the Facebook platform.

  • At Big Data Technology Warsaw Summit 2021 there will also be presentations on the use of data, science and technology to generate insights for search and recommendation systems in an e-commerce platform (Etsy), to build content personalization systems in e-commerce (eBay), run A/B experiments for growth (Atlassian), analyze geophysical data from ground-penetrating radars using deep-learning techniques (SGPR.TECH), and more.

Trend 3. Data Quality and Data Observability become easier

For data-driven companies, data quality and observability have always been important, even a long time ago when tools like Hadoop and Hive were open-sourced. At the same time, they were always problematic due to the lack of simple-to-use and feature-rich technologies (especially open-source ones). For this reason, many companies haven’t addressed these problems properly.

[Image: Data Quality and Data Observability in GetInData]

Recently, however, the status quo has changed, and new tools have emerged that make data quality and data observability significantly easier to achieve. These include Apache Atlas, Amundsen from Lyft, Dataportal from Airbnb (see a picture below), DataHub from LinkedIn, Data Catalog from Google, and Deequ from Amazon, to name a few. These tools are often integrated with one another: check how Amundsen can work together with Feast for machine learning discovery or Atlas for data discovery.

[Image: Big Data tools used by GetInData in a data-driven approach]
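
To make "data quality" a bit more concrete, here is a minimal sketch in plain Python of the kind of checks (completeness, uniqueness, value constraints) that data quality tools typically automate at scale. It is an illustration only and does not use the API of Deequ or any other tool named above; the function names, the thresholds, and the sample records are all hypothetical.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def completeness(records: List[Record], column: str) -> float:
    """Fraction of records in which `column` is present and not None."""
    if not records:
        return 1.0
    non_null = sum(1 for r in records if r.get(column) is not None)
    return non_null / len(records)


def is_unique(records: List[Record], column: str) -> bool:
    """True if no non-null value of `column` appears more than once."""
    values = [r[column] for r in records if r.get(column) is not None]
    return len(values) == len(set(values))


def satisfies(records: List[Record], column: str,
              predicate: Callable[[Any], bool]) -> bool:
    """True if every non-null value of `column` passes `predicate`."""
    return all(predicate(r[column]) for r in records
               if r.get(column) is not None)


def run_checks(records: List[Record]) -> Dict[str, bool]:
    """Evaluate a small, hypothetical set of quality rules on the records."""
    return {
        "user_id is complete": completeness(records, "user_id") == 1.0,
        "user_id is unique": is_unique(records, "user_id"),
        "age is at least 90% complete": completeness(records, "age") >= 0.9,
        "age is non-negative": satisfies(records, "age", lambda v: v >= 0),
    }


# Hypothetical sample with a duplicated id, a missing age and a negative age.
sample = [
    {"user_id": 1, "age": 34},
    {"user_id": 2, "age": None},
    {"user_id": 2, "age": -5},
]

report = run_checks(sample)
for name, passed in report.items():
    print(f"{'PASS' if passed else 'FAIL'}: {name}")
```

In a real pipeline, rules like these would usually run on every new batch of data, and a failed check would raise an alert or stop the data from being published downstream.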

Data Quality and Data Observability at BDTWS 2021 

  • There will be a presentation on a new open-source technology called Marquez that can be used for data lineage and observability. This tool helps to understand how data flows through a company’s systems. It makes it possible to show the dependencies between the individual teams that produce and consume data, and makes data pipeline audits easier to carry out.

  • While ensuring data quality is important even for small data sets, Criteo representatives will explain how they addressed data quality challenges on their 120+ PB data lake with thousands of jobs. Their journey began two years ago, and they will now share the data and thoughts they have collected. The picture below shows data lake anomaly detection at Criteo (source).

    [Image: Tools for measuring data quality in a data-driven approach]

  • The presentation from OLX will cover a pragmatic approach to data quality. It will review existing frameworks and approaches to data quality, and will also present the principles behind adapting these approaches and designing data quality systems at OLX.

  • That’s not all, as there will also be presentations about building testable data pipelines at Target and about a tool called Diftong from Klarna for validating big data workflows.

These are the first three Big Data trends that will be strongly represented in presentations at the BDTWS 2021 conference. But that's not all: go to the next post to learn about the remaining two trends and the presentations that cover them!

21 January 2021
