5 main data-related trends to be covered at Big Data Tech Warsaw 2021 Part II

Trend 4. Larger clouds over the Big Data landscape

A decade ago, only a few companies ran their Big Data infrastructure and pipelines in the public cloud (Netflix was one of them). At that time, the most popular way to build Big Data solutions was to use on-premise infrastructure and an ecosystem of open-source components. In 2012-2013, there were even examples of companies that tried public cloud solutions but quickly returned to building Big Data infrastructure in their own data centres. The main reasons were high costs, issues with elasticity, and service unavailability.

It was also common to hear the opinion that public clouds and cloud infrastructure were too expensive, regardless of how the costs were calculated.

A clear change came in 2014, when Microsoft and Google began to compete with Amazon in the public cloud market. In my opinion, however, one of the biggest milestones for the development of public cloud-based infrastructures was Spotify's move from their large on-premise & open-source data infrastructure to the public cloud. It was a sign for the Big Data community that using the public cloud brings significant opportunities, so large that companies like Spotify are willing to pay for them.

Public cloud architecture in Big Data — Spotify in public cloud

This trend has accelerated in recent years, including in 2020. We see (at least in Poland) significant adoption of public cloud solutions in companies from various sectors (e.g. banking or industry).

Public Cloud at Big Data Tech Warsaw

During the Big Data Technology Warsaw Summit 2021 conference, we will be able to listen to many presentations related to the use of the public cloud. Here are some interesting examples:

We will have the chance to hear about the experiences of Outfit7, the developer of highly popular mobile apps (e.g. My Talking Tom 2). The company collects and analyzes on average 3 TB of gaming events per day, all thanks to Google Cloud using Kubernetes, Dataflow, BigQuery, Cloud Composer, Jupyter, and Tableau. They will show what their cloud-based architecture looks like, how they implement end-to-end real-time pipelines, and how their skilled team fights downtime by using proactive monitoring and integration tests. Last but not least, you'll hear about the challenges that Outfit7 faced when the COVID-19 quarantine made the amount of data it had to handle skyrocket.

More companies will share their experiences of using Google Cloud. First, Sotrender will explain how they train and deploy their machine learning models using Google Cloud Platform (e.g. AI Platform Notebooks, AI Platform Training, Cloud Run, GitLab CI/CD), covering the full lifecycle of an ML model. Another company, TalentAlpha, will explain in their presentation how they analyze HR-related data with Google Cloud Platform. They use it for skills analysis, assessment of specialists, career guidance, psychometrics, recruitment and more. The largest Polish e-commerce platform will introduce BigFlow, an open-source Python framework for data processing on the Google Cloud Platform. BigFlow shares some aspects with Scio (developed by Spotify).

Those who are looking for information about public clouds other than Google will not be disappointed either. There will be talks on building cloud infrastructure based on AWS. For example, Simply Business (an online broker of business insurance) has created a customer data platform on top of AWS. They use it to combine data points from different services. Thanks to that data, they are able to score, personalize, and calculate critical business metrics. During their presentation, they will focus on describing their journey to implement stateful applications using Kafka Streams. They will also share the knowledge they gained from running such applications in production for 2 years.

The pay-as-you-go model always has pros and cons, as we know very well. In this model, you pay only for what you use, but without proper cost control and common optimization techniques, you can end up paying more than you really need to. The same thing will happen if you don't use cloud services efficiently. Nowa Era will describe their statistical models (ARIMA) and techniques for AWS Spot instance price prediction that help to achieve impressive cost optimization for Big Data infrastructure (up to 80% compared to on-demand instances). The presentation will be of particular interest to listeners who are looking for information on how to use the public cloud in a more cost-efficient way.
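To make the cost argument a bit more concrete, here is a minimal sketch of the arithmetic behind comparing on-demand and Spot pricing. The hourly prices and the function name `compare_costs` are purely hypothetical and chosen for illustration; they are not taken from the Nowa Era presentation or from any AWS price list, and the sketch does not attempt to predict prices the way an ARIMA model would.

```python
# Hypothetical hourly prices in USD. Real on-demand and Spot prices vary
# by instance type, region, and time, so treat these as placeholders.
ON_DEMAND_HOURLY = 1.00
SPOT_HOURLY = 0.25


def compare_costs(on_demand_hourly: float, spot_hourly: float, hours: float) -> dict:
    """Return the cost under both pricing models and the relative saving."""
    if on_demand_hourly <= 0 or hours <= 0:
        raise ValueError("on_demand_hourly and hours must be positive")
    on_demand_cost = on_demand_hourly * hours
    spot_cost = spot_hourly * hours
    # Saving expressed as a percentage of the on-demand cost.
    savings_pct = 100.0 * (on_demand_cost - spot_cost) / on_demand_cost
    return {
        "on_demand_cost": on_demand_cost,
        "spot_cost": spot_cost,
        "savings_pct": savings_pct,
    }


if __name__ == "__main__":
    # 720 hours is roughly one month of continuous usage.
    result = compare_costs(ON_DEMAND_HOURLY, SPOT_HOURLY, hours=720)
    print(f"On-demand: ${result['on_demand_cost']:.2f}")
    print(f"Spot:      ${result['spot_cost']:.2f}")
    print(f"Saving:    {result['savings_pct']:.1f}%")
```

In practice the Spot price changes over time, which is why forecasting it matters: the saving computed here holds only for as long as the assumed Spot price does.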

The last presentation I would like to highlight covers production use cases built on top of Azure. As mentioned in an earlier blog post, H&M will describe their multi-year AI/ML journey in the public cloud (Azure, Databricks) and explain how their architecture has evolved over time. The story will cover the entire MLOps stack, addressing a few common challenges in AI and machine learning products, such as development efficiency, end-to-end traceability, and speed to production.

Trend 5. Best practices for managing Big Data teams and projects emerge

Don't we too often forget about one of the most important issues in building even the most complex projects? It's not only about architecture, technologies or other technical aspects. Probably everyone agrees that one of the critical success factors in Big Data projects (if not the most critical) is the team.

Data-driven team management in big data projects — Team Management in Big Data.

Still, a large percentage of Big Data projects fail, exceed the budget, or don't meet critical deadlines. It becomes crucial to study and measure how team management can increase chances for a project to be successful. There are some common patterns and best practices that, if properly defined, may help to avoid problems that lead to the failure of Big Data projects.

Big Data Teams presentations at Big Data Technology Warsaw Summit 2021

This year at the BDTWS 2021 conference, we will have various presentations that introduce Big Data projects from the perspective of team management, often with a data-driven approach. These presentations are part of the "Data Strategy and ROI" track:

Pearson, a multinational publishing and education company, will tell us about one of their recent projects: the implementation of an AI-based learning app. The main problems in the development and implementation of the project resulted from significant constraints, such as a short timeframe (6 months), fully remote work (10 time zones, from San Francisco to Moscow), a rapidly growing number of teams and participants (up to 90 people, including software engineers, ML researchers, UX designers, teachers, screenwriters, and film editors), and many other dependencies (e.g. the deliverables of teams were strongly dependent on one another). During their talk, they will describe both the technical and organizational challenges they faced while building this complex AI-based app with a short time-to-market. They will also share their lessons learned on how to deal with the challenges listed above and, just as importantly, how not to do it.

My colleagues from the GetInData team will talk about their experience in planning and executing Big Data initiatives in organizations, focusing on good practices. Many of the projects we work on are developed in a constantly changing environment (new requirements, stakeholders, or technologies), which requires a lot of flexibility, skills, and proper team management. This presentation will be particularly interesting for project managers and other people involved in team management because it will be given by the project managers themselves.

Jesse Anderson, author of "Data Teams: A Unified Management Model for Successful Data-Focused Teams", data engineer and trainer, will talk about the importance of a solid foundation for data teams. He will also identify common problems with that foundation and explain what management should do to fix them. Jesse has several years of experience in studying the importance of data teams, and here are his slides from 2017 where he describes the five dysfunctions of a data engineering team.

The Five Dysfunctions of a Data Engineering Team from Jesse Anderson

As you might expect, data & AI can also be used to analyze teamwork! TalentAlpha's presentation will explain how the company analyzes HR-related data on top of Google Cloud Platform. They use it for skills analysis, psychometrics, assessment of specialists, recruitment, career guidance, and more. This presentation can be very useful for anyone building a strong team that is well prepared for its tasks. The data-driven approach also works well in management.

What’s next?

If you are interested in any of the presentations, we invite you to check our agenda and register before February 5th to take advantage of the Winter Promotion.

As you might expect, this year the conference will take place online. Please check my recent blog post that explains how COVID-19 changes Big Data Tech Warsaw 2021 but makes it greater at the same time.

Last updated: 22 January 2021

Written by

Adam Kawa

CEO and Founder
