
GetInData in 2022 - achievements and challenges in Big Data world

Time flies extremely fast and we are ready to summarize our achievements in 2022. Last year we continued our previous knowledge-sharing actions and launched new ones. Let’s not waste any time: dive into this summary and see what we did at GetInData last year!

  • Published 44 blog posts about Big Data, ML/AI, streaming, cloud, modern data platform, events and more.
  • Shared a lot of content about Machine Learning, Cloud and Artificial Intelligence. On our social media we promoted events and conferences. We also regularly posted Tech Facts with news from the Big Data world. Last but not least, there was plenty of content about life at GetInData. Find it on our social media channels that you can follow here.
  • Started new formats: the Radio Data podcast and the Data Pill newsletter that you can join, to create a data community with us.
  • Started Paper Talks - internal meetings, which in the end became public and now we are building a community in this area.
  • Organized 2 Big Data conferences in Poland, the Big Data Technology Warsaw Summit 2022 and the DataMass Gdańsk Summit 2022, and took part in Big Data conferences around the world.
  • Continued our tradition and met more than 10 times for our internal Lunch & Learn sessions.
  • Guilds and Labs are constantly growing. We created 4 areas that are focused on DevOps, Data Engineering, MLOps and Streaming.

If you want to know more about our achievements, below you can find a list of some of them. Enjoy the read!

Contributions

2022 was full of GetInData contributions to open-source.   

Throughout last year, we presented many solutions that our big data experts helped to create, such as:

  • Our DevOps Labs Team - Jakub Igła, Dominik Gniewek-Węgrzyn, Mariusz Wojakowski and Piotr Mossakowski, developed the Terraform module for Atlantis, which was highly acknowledged by the company and is now officially recommended as a way of installing Atlantis on Azure. Check it out here.
  • Krzysztof Chmielewski put tremendous work into the release of Delta Connectors 0.6.0, which supports the Flink/Delta Connector on Apache Flink™ 1.15.3.
  • Andrzej Dackiewicz recently worked on a new source-primetric connector for Airbyte.
  • Mariusz Strzelecki had a hand in Apache Spark and Airflow.
  • For months, the GetInData team including Maciej Obuchowski, Paweł Leszczyński, Jakub Dardziński and Tomasz Nazarewicz have been developing the OpenLineage project. We helped shape how Microsoft designed and implemented contributions to support Microsoft data sources and integrate with Azure Databricks. Also, our recent contribution supporting column level lineage has been the most anticipated feature for Microsoft. You can find an article written by Microsoft about the results of their work and our contributions here.
  • GetInData credit in Delta Lake 2.0.0 by Grzegorz Kołakowski. From our point of view, this release's most exciting change is Change Data Feed, especially when we are able to implement it in Flink for streaming.
  • Apache Flink Source Connector for Delta Lake tables by Krzysztof Chmielewski.
  • Contribution to Terraform provider for Snowflake and another one by Marek Wiewiórka 

Blog posts


During 2022, we posted regularly on our blog. One year later, you can find 44 published blog posts about big data, cloud, machine learning and more here. The top 5 most read are:

That’s obviously not all. We published posts about new technologies like

We also shared our knowledge in the field of Machine Learning (ML) and MLOps 

Also, you could read more about our solution such as the GetInData Modern Data Platform

Customer Stories

We also shared our success stories of working with clients with you:

We also started to share our content on Medium. Click if you want to follow us here

Webinars and Videos 

In 2022 we organized two live webinars. Couldn't join us on the day? That's not a problem at all! You can watch both of them here:

  • Building ML pipelines with Kedro and Vertex AI on Google Cloud Platform where Michał Bryś demonstrated the way to operationalize Machine Learning models using open-source tools, like Kedro and deploy them using cloud computing.
  • Data-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz, who explained GetInData’s Data-Driven Fast Track, the 3-step framework for data transformation. In this one you can learn how to assess how data-driven your company is, how to generate ideas for new initiatives to push your company towards better decisions and how to think about implementing these initiatives to increase your chances of success.

On our YouTube channel you can also find videos with our experts:

Radio DaTa Podcast


We are also happy to share with you another project we started in 2022: the Radio DaTa podcast! At Radio DaTa we talk about data, cloud, analytics and AI/ML/BI with different guest experts and different hosts in different segment formats. We have already started two segments:

The plan for the next year is to develop the existing formats and create new ones, so if you want to stay up to date, follow Radio DaTa on Spotify.

E-book


In 2022 we also released our eBook “MLOps: Power Up Machine Learning Process. Build Feature Stores Faster”.

What will you find there?

  • How to eliminate the risk of the ineffective use of data in Machine Learning 
  • How to reach the full potential of data-driven decision-making in real-time 
  • A step-by-step guide to building a well functioning Feature Store 
  • What MLOps is and the MLOps platform   

This eBook is divided into two parts. The first takes a business perspective on MLOps, explaining the terms and dependencies necessary for making decisions in a business context, such as what the MLOps Platform is and whether you need it or not. The second takes a technical perspective, with the advanced technical content necessary to put the eBook's knowledge into practice.

Download the eBook for free.

Our Big Data Experts at conferences and meetups


Last year we organized two conferences: the Big Data Technology Warsaw Summit and the DataMass Gdańsk Summit.

The 8th edition of the Big Data Technology Warsaw Summit was held both on-site and online. If you weren't there, you can still read the review of presentations and the review of the top 3 presentations, which will help you decide whether to join us this year on 29-30 March 2023!

There we had the pleasure of presenting:

  • Bartosz Chodnicki and Linkier Seixas talked about the Benefits of a Homemade ML Platform.
  • Mariusz Zaręba hosted a presentation called Let your analysts build data pipelines on Modern Data Platform using SQL.
  • In NetWorkS! project - real-time analytics that controls 50% of mobile networks in Poland, our Big Data Lead Maciej Bryński and our colleague Michał Maździarz from NetWorkS! described how we manage Flink jobs at scale using Ververica and Kubernetes, how we monitor the platform using ClickHouse and what problems we need to overcome in the project.

At the DataMass Gdańsk Summit, two presentations were given by our experts:

  • Marek Wiewiórka gave a presentation named From first contact to a full charge... How we built a Modern Data Platform in 4 months for a FinTech scale-up.
  • Also Adrian Dembek and Piotr Chaberski talked about From a Machine Learning competition to an enterprise analytics framework.

That's not all! Our experts had the pleasure of speaking at other interesting Big Data events, such as:

  • During the Airflow Summit 2022, Maciej Obuchowski and Paweł Leszczyński gave a presentation entitled OpenLineage & Airflow - data lineage has never been easier.

  • We were also at the Data Science Summit ML Edition 2022

    • Mariusz Strzelecki talked about 7 Jupyter architectures for 7 different organizations.
    • Adrian Dembek and Piotr Chaberski presented How NOT to win a Kaggle competition.
  • During the Data Science Summit 2022 our experts gave a few presentations:

    • Michał Rudko talked about Data Platform - what does it take to be called a modern one? A new stack with well-known best practices.
    • Piotr Menclewicz gave his presentation Data-driven fast-track - 3 steps to make your company data-driven.
    • Piotr Chaberski presented Prove your concept - faster, better, smarter.
    • Michał Stawikowski talked about Graph Neural Networks in Modern Recommendation Systems.
  • At an IT seminar organized by Veolia, Grzegorz Rycaj talked about why data likes the cloud and showed some success stories with the cloud from our portfolio.

  • Lastly, we were at Warszawskie Dni Informatyki 2022 

    • Grzegorz Rycaj hosted a presentation “Excuse me, can I see the kitchen?”.
    • Marek Drob talked about “Have you been promoted to Team Leader or do you want to become one? Practical advice on how to succeed in your new role”.

What's more, we started our meetup called Paper Talks. We met for a few months to discuss new and interesting Machine Learning projects. At the end of the year we decided to make these meetings public. The next one will be in January, so if you want to talk or just listen to us, follow us on LinkedIn to stay up to date with announcements.

Internal Knowledge Sharing

Lunch&Learn - we are continuing our meetings where our experts have the opportunity to share their knowledge with us. This is one of the most important internal initiatives at GetInData. During an online meeting, one of our specialists (or team) gives a presentation, and the rest of the group has the opportunity to ask questions and exchange experiences in this area. 

Some of the meeting subjects in 2022:

  • Flink DBT Adapter
  • Prove your concept - faster, better, smarter
  • How to become a good developer in scrum
  • Lookerstein Monster - why you shouldn’t be afraid of Looker
  • Image-based CTR prediction & Google Tag Manager Webscraping

Guilds are communities of people who are passionate about the same topic. Anyone from GetInData can join a guild via Slack, and participation is voluntary.

We have 5 Guilds working:

  • MLOps
  • DevOps
  • Streaming (Real Time Data Processing)
  • Data Engineering
  • Advanced Analytics

At GetInData we also have Labs. The mission of Labs is to research and produce innovative solutions that develop our business and people to sustain our leadership position.

We currently have 5 work streams:

  • DataOps Labs
  • ML/MLOps Labs
  • DevOps/Developer Labs
  • Streaming Analytics Labs
  • Advanced Analytics Labs

Data Pill Newsletter


In 2022 we developed new formats. You have already read about our podcast, but there is more. In June we released the first edition of our community newsletter called DATA Pill. It is a weekly newsletter sent every Friday morning with an overview of the best Big Data, Cloud, ML and AI content.

So far, we have released 33 editions of DATA Pill. We run it in two forms: as a traditional newsletter and as a newsletter on LinkedIn (on Adam Kawa’s profile).

Our community has almost 1500 people: 200 on the traditional mailing list and around 1300 on LinkedIn.

You can read all previous DATA Pill editions and sign up here.

Plans for 2023

You can be sure we have a lot of new ideas to show you, and existing ones to develop, in 2023. We are looking forward to the new experiences in the pipeline, and to new opportunities and ways to share knowledge with you all. Stay up to date with us and follow our channels: LinkedIn, Facebook, Twitter, and do not hesitate to subscribe to our channel on YouTube.

Want to stay up to date with our Machine Learning, Modern Data Platform and more content?

Join our newsletter and do not miss anything!

The administrator of your personal data is GetInData Poland Sp. z o.o. with its registered seat in Warsaw (02-508), 39/20 Pulawska St. Your data is processed for the purpose of provision of electronic services in accordance with the Terms & Conditions. For more information on personal data processing and your rights please see Privacy Policy.

By submitting this form, you agree to our Terms & Conditions and Privacy Policy
5 January 2023

