GetInData contributes to Apache Flink
We’re proud to say that GetInData not only uses Apache Flink in its consulting projects but also contributes back to it. 👍👍👍
Our first contributions
A few weeks ago, Dawid Wysakowicz, our colleague at GetInData, was officially added to the list of Apache Flink committers. This is an amazing success for Dawid and GetInData. We are still a small, highly specialised company without any investors, yet we are proud to donate a large chunk of off-project time to one of the most popular and innovative open-source Big Data projects. You can read about our recent contributions to Flink here.
What’s more, in June, GetInData signed the Corporate Contributor License Agreement with the Apache Software Foundation (ASF). This means that the source code written by our team members in their daily engineering work can be submitted to Apache Flink.
Real-time analytics with Flink
Even though you probably hear more often about technologies such as Hadoop, Hive, Spark, and Storm, Flink seems to have a very promising future.
With Flink, you can process large volumes of data in real time. It’s already battle-tested and used by a number of well-known companies such as Alibaba, Uber, Netflix, King, and Zalando. The most common use cases include:
- real-time analytics, e.g. creating car driver incentives at Uber to immediately bootstrap a marketplace – more,
- real-time personalization and recommendation, e.g. relevant and accurate search results for each user in real time at AliExpress – more,
- anomaly detection and matching events against a pattern in real time, e.g. a use case that GetInData will implement for one of its customers to react to several customer-specific behaviour patterns,
- ETL jobs, which many companies still implement using traditional batch technologies that run computations every day or every hour (which is too slow in most cases).
Why and when streaming is better
We have described the main advantages of stream processing with Flink in our two blog posts (part 1 and part 2). These advantages include not only sub-second latency at scale but also simplicity (fewer moving parts), correctness (the ability to handle late and out-of-order events) and more. As an illustrative example, we used the Spotify music application and the events that represent the activity of its happy users. This example is very close to our hearts because we worked with Big Data at Spotify with love ❤ ❤ ❤ .
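To make the point about late and out-of-order events more concrete, here is a minimal sketch in plain Python. It does not use the Flink API; it only illustrates the general idea of event-time windows closed by a watermark. The function name `count_per_window`, its parameters and the sample events are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def count_per_window(events, window_size=10, max_out_of_orderness=5):
    """Count events per tumbling event-time window.

    `events` is an iterable of (event_time, user) pairs in ARRIVAL order,
    which may differ from event-time order. A window is closed once the
    watermark (highest event time seen minus the allowed out-of-orderness)
    passes the end of the window. Events that arrive for an already closed
    window are reported as dropped instead of silently corrupting results.
    """
    open_windows = defaultdict(int)  # window start -> running count
    results = []                     # (window_start, count), in closing order
    dropped = []                     # events that arrived too late
    watermark = float("-inf")

    for event_time, user in events:
        window_start = (event_time // window_size) * window_size
        if window_start + window_size <= watermark:
            dropped.append((event_time, user))
        else:
            open_windows[window_start] += 1

        # Advance the watermark; it never moves backwards.
        watermark = max(watermark, event_time - max_out_of_orderness)

        # Emit every window whose end is now behind the watermark.
        for start in sorted(open_windows):
            if start + window_size <= watermark:
                results.append((start, open_windows.pop(start)))

    # End of input: flush the windows that are still open.
    for start in sorted(open_windows):
        results.append((start, open_windows[start]))
    return results, dropped


# Hypothetical "song played" events; the one at time 8 arrives after the
# one at time 12, and the one at time 3 arrives far too late.
events = [(1, "a"), (4, "b"), (12, "c"), (8, "a"), (16, "b"), (3, "c"), (21, "a")]
results, dropped = count_per_window(events)
print(results)
print(dropped)
```

In this sketch, the slightly out-of-order event at time 8 is still counted in its correct window, while the very late event at time 3 is detected and set aside. A batch job that groups by arrival time would simply put both in the wrong bucket.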
A reward for hard work
Of course, working on an exciting open-source Big Data project can be considered pure pleasure. It’s no walk in the park, however, to be added to the short list of its official committers. Apart from having great technical knowledge and strong engineering skills, you must allocate a lot of time to developing and promoting the project. This includes not only writing source code (e.g. bug fixes, new features) but also solving issues submitted by users, updating documentation, answering questions asked on the mailing lists and popularising the project by giving conference talks and writing blog posts. In short, it means helping others to be successful with the project and the technology.
For the last six months, Dawid hasn’t been engaged in any commercial project at GetInData; instead, he has spent every single day contributing to Flink. The results of Dawid’s work include many requested features for Flink CEP, a Flink library for matching continuously incoming events against a pattern. This library has great potential for real-time analytics. For instance, Flink CEP can be used by insurance companies to analyze and improve the process of buying home insurance online.
We have also received enormous help from dataArtisans (the company of the original creators of Flink). Our colleagues at dataArtisans helped us to identify the first set of patches to work on, provided guidance and mentorship, and gave frequent feedback. Thanks, guys, you rock!
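The idea of matching incoming events against a pattern can be sketched in a few lines of plain Python. This is not the Flink CEP API, only an illustration of the concept: for each user, detect a first event that is followed by a second event within a time limit, with unrelated events allowed in between. The function `match_followed_by`, the event type names and the sample data are hypothetical.

```python
def match_followed_by(events, first, second, within):
    """Find `first` followed by `second` for the same key within `within`.

    `events` is an iterable of (timestamp, key, event_type) tuples and is
    assumed to be sorted by timestamp. Other event types may occur between
    the two matched events. Returns a list of (key, t_first, t_second).
    """
    pending = {}  # key -> timestamp of the most recent unmatched `first`
    matches = []
    for timestamp, key, event_type in events:
        if event_type == first:
            pending[key] = timestamp
        elif event_type == second and key in pending:
            started = pending.pop(key)
            if timestamp - started <= within:
                matches.append((key, started, timestamp))
    return matches


# Hypothetical events from an online home insurance form.
events = [
    (0, "u1", "start_quote"),
    (5, "u2", "start_quote"),
    (20, "u1", "view_page"),
    (30, "u1", "payment_failed"),
    (200, "u2", "payment_failed"),  # too long after u2 started the quote
    (210, "u3", "payment_failed"),  # u3 never started a quote
]
print(match_followed_by(events, "start_quote", "payment_failed", within=60))
```

A real CEP library has to handle much more than this sketch does, such as out-of-order input, several partial matches per key and state that survives failures, which is why a dedicated library is useful.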
Better support for real-time analytics
GetInData was founded in 2014 by former Spotify Big Data engineers and administrators. The company architects and builds dedicated Big Data solutions based on open-source technologies. Its customers include fast-growing startups (e.g. Truecaller, GoEuro, Synerise) and global corporations from the pharmaceutical, telco, FMCG and media sectors.
With a team of European-class experts and active Flink committers, GetInData offers unique support for real-time analytics to its customers.