GetInData in 2022 - achievements and challenges in the Big Data world
Time flies extremely fast, and we are ready to summarize our achievements in 2022. Last year we continued our previous knowledge-sharing activities and launched new ones. Let's not waste time: dive into this summary and see what we did at GetInData last year!
Published 44 blog posts about Big Data, ML/AI, streaming, cloud, modern data platform, events and more.
Shared a lot of content about Machine Learning, Cloud and Artificial Intelligence. On our social media we promoted events and conferences. We also regularly posted Tech Facts with news from the Big Data world. Last but not least, there was plenty of content about life at GetInData. Find it on our social media channels that you can follow here.
Continued our tradition and met more than 10 times for our internal Lunch & Learn sessions.
Guilds and Labs are constantly growing. We created 4 areas that are focused on DevOps, Data Engineering, MLOps and Streaming.
If you want to know more about our achievements, below you can find a list of some of them. Enjoy the read!
2022 was full of GetInData contributions to open-source.
Throughout last year, we presented many solutions that our big data experts helped create, such as:
Our DevOps Labs Team - Jakub Igła, Dominik Gniewek-Węgrzyn, Mariusz Wojakowski and Piotr Mossakowski - developed the Terraform module for Atlantis, which was highly acknowledged by the company and is now officially recommended as a way of installing Atlantis on Azure. Check it out here.
Krzysztof Chmielewski put tremendous work into the release of Delta Connectors 0.6.0, which supports the Flink/Delta Connector on Apache Flink™ 1.15.3.
Mariusz Strzelecki had a hand in Apache Spark and Airflow.
For months, the GetInData team, including Maciej Obuchowski, Paweł Leszczyński, Jakub Dardziński and Tomasz Nazarewicz, has been developing the OpenLineage project. We helped shape how Microsoft designed and implemented contributions to support Microsoft data sources and integrate with Azure Databricks. Our recent contribution supporting column-level lineage has also been the most anticipated feature for Microsoft. You can find an article written by Microsoft about the results of their work and our contributions here.
GetInData was credited in Delta Lake 2.0.0 thanks to Grzegorz Kołakowski. From our point of view, the most exciting change in this release is Change Data Feed, especially when we are able to implement it in Flink in streaming.
During 2022, we were constantly posting on our blog. This means that one year later you can find 44 published blog posts about big data, cloud, machine learning and more here. The top 5 most read are:
Data-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz, who explained GetInData’s Data-Driven Fast Track, the 3-step framework for data transformation. In this one you can learn how to assess how data-driven your company is, how to generate ideas for new initiatives to push your company towards better decisions and how to think about implementing these initiatives to increase your chances of success.
MLOps implemented - How we combine the cloud and open-source to boost data scientists' work with Marek Wiewiórka and Krzysztof Zarzycki, who shared their experiences of the MLOps platform they built for our customers and how it boosted the way data scientists worked. They showed how customers' ML model training projects can go from zero to production in a much shorter time, while achieving superior performance, high code quality, training repeatability and governance.
7 Jupyter architectures for 7 different organizations with Mariusz Strzelecki, who showed the different possible Jupyter setups with their pros and cons and shared the lessons we learned, also covering topics like culling (stopping inactive notebooks) and running spark-on-kubernetes sessions directly from notebooks.
Radio DaTa Podcast
We are also happy to share with you another project we started in 2022 - the Radio DaTa podcast! At Radio DaTa we talk about data, cloud, analytics and AI/ML/BI with different guest experts and different hosts in different segment formats. We have already started two segments:
Data Journeys - interview episodes in which our guest experts talk about how data moves around in their company, what technologies they use and why, and the value data brings to their products. The host of the podcast is Adam Kawa. Some of the top listened-to episodes are:
The plan for the next year is to develop the existing formats and create new ones, so if you want to stay up to date, follow Radio DaTa on Spotify.
In 2022 we also released our eBook “MLOps: Power Up Machine Learning Process. Build Feature Stores Faster”.
What will you find there?
How to eliminate the risk of the ineffective use of data in Machine Learning
How to reach the full potential of data-driven decision-making in real-time
A step-by-step guide to building a well functioning Feature Store
What MLOps is and the MLOps platform
This eBook is divided into two parts. The first takes a business perspective on MLOps, explaining the terms and dependencies necessary for making decisions in a business context, such as what the MLOps Platform is and whether you need it or not. The second takes a technical perspective, with the advanced technical content necessary to put the eBook's knowledge into practice.
The 8th edition of the Big Data Technology Warsaw Summit was held both on-site and online. If you weren't there, you can still read the review of presentations and the review of the top 3 presentations, which will help you decide whether to join us this year on 29-30 March 2023!
There we had the pleasure of presenting:
Bartosz Chodnicki and Linkier Seixas talked about the Benefits of a Homemade ML Platform.
Mariusz Zaręba hosted a presentation called Let your analysts build data pipelines on Modern Data Platform using SQL.
NetWorkS! project - real-time analytics that controls 50% of mobile networks in Poland - our Big Data Lead Maciej Bryński and our colleague Michał Maździarz from NetWorkS! described how we manage Flink jobs at scale using Ververica and Kubernetes, how we monitor the platform using Clickhouse and what problems we need to overcome in the project.
At the DataMass Gdańsk Summit, two presentations were given by our experts:
Marek Wiewiórka gave a presentation named From first contact to a full charge... How we built a Modern Data Platform in 4 months for a FinTech scale-up.
Also Adrian Dembek and Piotr Chaberski talked about From a Machine Learning competition to an enterprise analytics framework.
That's not all! Our experts had the pleasure of presenting at other interesting Big Data events, such as:
During the Airflow Summit 2022, Maciej Obuchowski and Paweł Leszczyński gave a presentation entitled OpenLineage & Airflow - data lineage has never been easier.
We were also at the Data Science Summit ML Edition 2022:
Mariusz Strzelecki talked about 7 Jupyter architectures for 7 different organizations.
Adrian Dembek and Piotr Chaberski presented How NOT to win a Kaggle competition.
During the Data Science Summit 2022 our experts gave a few presentations:
Michał Rudko talked about Data Platform - what does it take to be called a modern one? A new stack with well-known best practices.
Piotr Menclewicz gave his presentation Data-driven fast-track - 3 steps to make your company data-driven.
Piotr Chaberski presented Prove your concept - faster, better, smarter.
Michał Stawikowski talked about Graph Neural Networks in Modern Recommendation Systems.
At an IT Seminar organized by Veolia, Grzegorz Rycaj talked about why data likes the cloud and showed some success stories with the cloud from our portfolio.
Lastly, we were at Warszawskie Dni Informatyki 2022:
Grzegorz Rycaj hosted a presentation “Excuse me, can I see the kitchen?”.
Marek Drob talked about “Have you been promoted to Team Leader or do you want to become one? Practical advice on how to succeed in your new role”.
What's more, we started our meetup called Paper Talks. We had been meeting for a few months to discuss new and interesting Machine Learning projects, and at the end of the year we decided to make these meetings public. The next one will be in January, so if you want to talk or just listen to us, follow us on LinkedIn to stay up to date with announcements.
Internal Knowledge Sharing
Lunch&Learn - we are continuing our meetings where our experts have the opportunity to share their knowledge with us. This is one of the most important internal initiatives at GetInData. During an online meeting, one of our specialists (or team) gives a presentation, and the rest of the group has the opportunity to ask questions and exchange experiences in this area.
Some of the meeting subjects in 2022:
Flink DBT Adapter
Prove your concept - faster, better, smarter
How to become a good developer in scrum
Lookerstein Monster - why you shouldn’t be afraid of Looker
Image-based CTR prediction & Google Tag Manager Webscraping
Guilds are communities of people who are passionate about the same topic. Anyone from GetInData can join a guild via Slack, and participation is voluntary.
We have 5 Guilds working:
Streaming (Real Time Data Processing)
At GetInData we also have Labs. The mission of Labs is to research and produce innovative solutions that develop our business and people to sustain our leadership position.
We currently have 5 work streams:
Streaming Analytics Labs
Advanced Analytics Labs
Data Pill Newsletter
During this year we developed new formats. You have already read about our podcast, but there is more. In June we released the first edition of our community newsletter called DATA Pill. It is a weekly newsletter sent every Friday morning with an overview of the best Big Data, Cloud, ML and AI content.
Our community has almost 1500 people: 200 on the traditional mailing list and around 1300 on LinkedIn.
You can read all previous DATA Pill editions and sign up here.
Plans for 2023
You can be sure that in 2023 we will have a lot of new ideas to show you, and that we will keep developing the existing ones. We are looking forward to the new experiences in the pipeline, and to new opportunities and ways to share knowledge with you all. Stay up to date with us and follow our channels: LinkedIn, Facebook, Twitter, and do not hesitate to subscribe to our channel on YouTube.
5 January 2023
Content Marketing Specialist