
Five big ideas to learn at Big Data Tech Warsaw 2020

Welcome back in 2020! A new year brings the 6th edition of Big Data Tech Warsaw. Save the date: 27th of February. We have put great effort into gathering an A-team of Big Data experts from top-tier global corporations and open-source companies, who are willing to share their knowledge of the latest achievements and trends in the Big Data industry. Here are the highlights of the most interesting talks we’ve arranged for you this year.


1. Ways to make large-scale ML actually work

This will be one of the hottest topics on our agenda. For example, Josh Baer will talk about Spotify’s winding road to better ML infrastructure, built to make the lives of internal ML practitioners easier and more productive. As you might expect, even companies that have used ML in their products for many years and have cutting-edge ML capabilities are still figuring out how to scale and operate these systems and the teams of software engineers behind them.

This topic will also be discussed by several guest experts during our main conference panel, as well as in roundtable discussions.

2. Building large-scale (real-time) data analytics platforms

Many ML models require fast access to data (e.g. to detect where the nearest taxi driver is or provide personalized product recommendations while browsing a website). This is where real-time data analytics platforms come in.

Reza Shiftehfar will share his thoughts on creating a Big Data platform at Uber that handles hundreds of petabytes with real-time access. The good news is that this platform was built with a mix of open-source technology (e.g. Hadoop, Spark, Hive, Presto, Kafka) and tech developed internally at Uber and later open-sourced, such as Hudi and Marmaray. This means that you can use similar technology and techniques at your own company.

We’ll also host Yuan Jiang, who will describe interactive analytics at Alibaba. Yuan will mainly focus on their large-scale real-time data warehouse. This solution is based on Apache Flink (open-source) and is adopted internally by Search, Recommendation, and Ads products.

It’s worth noting that Apache Flink is also used by Humn.AI, a company that offers innovative car insurance calculated by real-time algorithms. Wojciech Indyk will present how their system is built, what the advantages and limitations of Apache Flink are, and how it can be applied to use cases such as real-time detection of a car trip, which might look trivial at first glance but exposes some traps.

3. Using data and ML to build personalized products

Building advanced ML and real-time data platforms is only one side of the coin. The other is the ability to use this data in a meaningful way.

Disney+ is a brand new streaming service (launched in November 2019) with impressive subscriber growth. Some of you might not yet know that a team in Warsaw is working on its recommendation system. Grzegorz Puchawski, Head of Data Science and Recommendation at Disney Streaming Services, will share the problems his team has dealt with and the lessons they have learned while working on the recommender system, which provides personalized recommendations for ESPN+ and Disney+.

Personalization will also be covered by Tomasz Burzyński and Mateusz Krawczyk, who will talk about how they personalize the user experience for millions of customers across more than 20 contact channels at Orange.

4. Migrating from on-premise to the public cloud

While many companies continue to build and expand their on-premise data platforms, we are seeing more and more companies building hybrid platforms or moving fully to the cloud.

You will hear about the exciting journey from on-premise infrastructure to the cloud, made together by Truecaller and GetInData. Juliana Araujo, Fouad Alsayadi and Tomasz Żukowski will share some of the tech choices they made in order to build a robust architecture, lower costs and make their data scientists happier by migrating to the Google Cloud Platform in a series of steps, using a mix of on-prem, hybrid and native cloud technology. Their scale is over 150M active users who generate 30B events a day.

For those who have been using the public cloud for a while, the presentation from Adam Kurowski and Kamil Szkoda (both from StepStone) can be extremely useful. They will talk about DevOps best practices in the AWS cloud and will focus on three topics: distributed data processing, cost optimization and security.

5. Organizing and discovering data in large data lakes

None of what we are going to talk about during the conference would be possible without data. You can find many analogies between data and oil or gold. As with oil and gold extraction, you need efficient tools to find data. This will be the topic of a joint presentation given by ING and GetInData. Verdan Mahmood and Marek Wiewiórka will talk about how they are building enterprise-grade data discovery and data lineage at ING. As a result, ING’s data scientists can easily discover available datasets in their large data lake and trust them, thanks to powerful features such as data lineage, data quality and data profiling.

6. BONUS

In this article we’ve mentioned only about 9 presentations, while the agenda includes 33! This means there are many other useful topics you can learn about by attending the conference (February 27th, 2020), and you can find them here.

Still not convinced? Watch the video recap from the previous edition:


This article was jointly written by Adam Kawa and Mikołaj Wiśniewski.
13 February 2020
