GetInData contributes to Apache Flink

We’re proud to say that GetInData not only uses Apache Flink in its consulting projects but also contributes back to it. 👍👍👍

Our first contributions

A few weeks ago, Dawid Wysakowicz, our colleague at GetInData, was officially added to the list of Apache Flink committers. This is an amazing success for Dawid and GetInData. We are still a small, highly specialised company without any investors, but we are proud to donate a large chunk of off-project time to contribute back to Flink, one of the most popular and innovative open-source Big Data projects. You can read about our recent contributions to Flink here.

Dawid Wysakowicz Flink Committer GetInData

What’s more, in June, GetInData signed the Corporate Contributor License Agreement with the Apache Software Foundation (ASF). This means that the source code written by our team members in their daily engineering work can be submitted to Apache Flink.

Real-time analytics with Flink

Even though you probably hear more often about technologies such as Hadoop, Hive, Spark and Storm, Flink seems to have a very promising future.

With Flink you can process large volumes of data in real time. It’s already battle-tested and used by a number of well-known companies such as Alibaba, Uber, Netflix, King and Zalando. The most common use cases include:

    • real-time analytics, e.g. creating car driver incentives at Uber to immediately bootstrap a marketplace – more,
    • real-time personalization and recommendation, e.g. relevant and accurate search results for each user in real time at AliExpress – more,
    • anomaly detection and matching events against a pattern in real time, e.g. a use case that GetInData will implement for one of its customers to react to several customer-specific behaviour patterns,
    • ETL jobs, which many companies still implement using traditional batch technologies that run computations every day or every hour (which is too slow in most cases).

Source: “Continuous Analytics: Stream Query Processing in Practice”, Michael J Franklin, Professor, UC Berkeley, Dec 2009 and http://www.slideshare.net/JoshBaer/shortening-the-feedback-loop-big-data-spain-external

Why and when streaming is better

We have described the main advantages of stream processing with Flink in our two blog posts (part 1 and part 2). These advantages include not only sub-second latency at scale, but also simplicity (fewer moving parts), correctness (the ability to handle late and out-of-order events) and more. As an illustrative example, we used the Spotify music application and the events that represent the activity of its happy users. This example is very close to our hearts because we worked with Big Data at Spotify with love ❤ ❤ ❤ .
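To make the correctness point a bit more concrete, here is a minimal sketch in plain Python of how event-time windows can tolerate out-of-order events. It is not Flink code and does not use any Flink API; the function name `count_per_window`, its parameters and the sample events are assumptions made up for this illustration. The idea is that a window is only emitted once a watermark (the largest event time seen so far minus an allowed out-of-orderness) has passed the end of that window.

```python
from collections import defaultdict


def count_per_window(events, window_size, max_out_of_orderness):
    """Count events per event-time tumbling window (illustrative only).

    events: iterable of (event_time, user) tuples in ARRIVAL order,
            with non-negative integer event times.
    Returns (emitted, late): emitted is a list of (window_start, count)
    in emission order; late holds events whose window was already emitted.
    """
    open_windows = defaultdict(int)  # window start -> running count
    emitted = []
    late = []
    max_seen = float("-inf")
    watermark = float("-inf")

    for event_time, user in events:
        start = event_time - event_time % window_size
        if start + window_size <= watermark:
            # The window for this event has already been emitted.
            late.append((event_time, user))
            continue
        open_windows[start] += 1
        max_seen = max(max_seen, event_time)
        watermark = max_seen - max_out_of_orderness
        # Emit every open window whose end the watermark has reached.
        ready = sorted(s for s in open_windows if s + window_size <= watermark)
        for s in ready:
            emitted.append((s, open_windows.pop(s)))

    # End of input: flush the windows that are still open.
    for s in sorted(open_windows):
        emitted.append((s, open_windows.pop(s)))
    return emitted, late


# Hypothetical "song played" events; note that 3, 9 and 8 arrive out of order.
plays = [(1, "ann"), (4, "bob"), (3, "ann"), (12, "cid"),
         (9, "bob"), (25, "ann"), (8, "cid")]
emitted, late = count_per_window(plays, window_size=10, max_out_of_orderness=5)
```

With these sample events, the events at times 3 and 9 are still counted in the first window even though they arrive after later events, while the event at time 8 arrives after its window has been emitted and is reported as late.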

Adam Kawa (one of the founders at GetInData) speaking about “Hadoop Adventures at Spotify” at Strata Hadoop World 2013 in New York City. Slides: https://www.slideshare.net/AdamKawa/hadoop-adventures-at-spotify-strata-conference-hadoop-world-2013

Reward for hard work

Of course, working on an exciting open-source Big Data project can be considered pure pleasure. It’s no walk in the park, however, to be added to the short list of its official committers. Apart from having great technical knowledge and strong engineering skills, you must devote a lot of time to developing and promoting the project. This includes not only writing source code (e.g. bug fixes, new features), but also solving issues submitted by users, updating documentation, answering questions asked on the mailing lists and popularising the project by giving conference talks and writing blog posts. In short, it means helping others to be successful with the project and the technology.

For the last 6 months, Dawid hasn’t been engaged in any commercial project at GetInData; instead, he has spent every single day contributing to Flink. The results of Dawid’s work include many requested features for Flink CEP, a Flink library for matching continuously incoming events against a pattern. This library has great potential for real-time analytics. For instance, Flink CEP can be used by insurance companies to analyze and improve the process of buying home insurance online.
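As a rough illustration of what “matching events against a pattern” means, the following plain-Python sketch looks for one event type followed by another for the same user within a time limit. It is not the Flink CEP API; the function `match_followed_by` and the event names `quote_started` and `quote_abandoned` are hypothetical and chosen only to echo the home-insurance example above.

```python
def match_followed_by(events, first, second, within):
    """Find `first` followed by `second` for the same user (illustrative only).

    events: iterable of (timestamp, user, event_type) in timestamp order.
    Other events may occur in between; a match is reported only if
    `second` happens at most `within` time units after the most recent
    unmatched `first` of that user.
    Returns a list of (user, first_timestamp, second_timestamp).
    """
    pending = {}  # user -> timestamp of the most recent unmatched `first`
    matches = []
    for timestamp, user, event_type in events:
        if event_type == first:
            pending[user] = timestamp
        elif event_type == second and user in pending:
            started = pending.pop(user)
            if timestamp - started <= within:
                matches.append((user, started, timestamp))
    return matches


# Hypothetical clickstream from an online home-insurance form.
clicks = [
    (0, "ann", "quote_started"),
    (5, "bob", "quote_started"),
    (10, "ann", "page_viewed"),
    (20, "ann", "quote_abandoned"),
    (100, "bob", "quote_abandoned"),
    (110, "cid", "quote_abandoned"),
]
quick_dropouts = match_followed_by(
    clicks, first="quote_started", second="quote_abandoned", within=60
)
```

Here only ann matches: bob abandons the quote too long after starting it, and cid never started one. A real CEP library also has to deal with state management, event time and far richer patterns, which this sketch deliberately ignores.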

We have also received enormous help from dataArtisans (the company founded by the original creators of Flink). Our colleagues at dataArtisans helped us to identify the first set of patches to work on, provided guidance and mentorship, and gave frequent feedback. Thanks, guys, you rock!

Dawid Wysakowicz shares his technical knowledge during the real-time stream processing workshop with Apache Flink.

Better support for real-time analytics

GetInData was founded in 2014 by former Spotify Big Data engineers and administrators. The company architects and builds dedicated Big Data solutions based on open-source technologies. Its customers include fast-growing startups (e.g. Truecaller, GoEuro, Synerise) and global corporations from the pharmaceutical, telco, FMCG and media sectors.

With a team of European-class experts and active Flink committers, GetInData offers unique support for real-time analytics to its customers.

Post by Adam Kawa

Adam became a fan of Big Data after implementing his first Hadoop job in 2010. Since then he has been working with Hadoop at Spotify (where he proudly operated one of the largest and fastest-growing Hadoop clusters in Europe for two years), Truecaller, Authorized Cloudera Training Partner and now at GetInData. He works with technologies like Hadoop, Hive, Spark, Flink, Kafka, HBase and more. He has helped a number of companies ranging from fast-growing startups to global corporations. Adam regularly blogs about Big Data and is also a frequent speaker at major Big Data conferences and meetups. He is the co-founder of Stockholm HUG and the co-organizer of Warsaw HUG.
