Running Observability Stack on Grafana
Introduction At GetInData, we understand the value of full observability across our application stacks. For our Customers, we always recommend…
Have you ever searched for something that isn't typical for you? Maybe you were looking for a gift for your grandmother on Amazon or wanted to listen to kids' music on Spotify. Even though it may not fit your usual taste or personality, you still expect a personalized approach from these platforms to help you find what, let's say, your grandmother or children might like.
What this means is that Spotify or Amazon should be able to detect in real-time, that your current behavior is different from your historical profile. They should be able to discover your needs based on your current activity and take into account the real-time context of your search. In essence, they should be able to identify the persona that you currently have and use that information to provide the best recommendations. On the other hand, the next time you search for something, but only for yourself, these platforms should mainly take into account your historical profile (e.g. your music taste) to provide the best personalized recommendations.
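To make the idea a bit more concrete, here is a minimal sketch in Python of how a platform could decide whether the current session looks like the user's historical profile. It is only an illustration under our own assumptions: the genre counts, the cosine-similarity measure and the 0.5 threshold are all hypothetical, and neither Spotify nor Amazon is claimed to work this way.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def choose_profile(historical: Counter, session: Counter, threshold: float = 0.5) -> str:
    """Return which signal should drive recommendations for this session."""
    # Hypothetical rule: low similarity means the user is acting out of
    # character, so the live session is a better guide than the old profile.
    if cosine_similarity(historical, session) < threshold:
        return "session"
    return "historical"


# Made-up data: long-term listening history vs. two different sessions.
historical = Counter({"rock": 120, "jazz": 40, "metal": 30})
kids_session = Counter({"kids": 6, "lullaby": 2})
usual_session = Counter({"rock": 5, "jazz": 1})

print(choose_profile(historical, kids_session))   # session drives the recommendations
print(choose_profile(historical, usual_session))  # historical profile drives them
```

In a real system the session vector would be updated on every event by a streaming job, so the decision can change within seconds of the user's behavior changing.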
Taking real-time context into account when users browse your website, shop on your ecommerce site, listen to your music or watch your videos can increase their engagement and your revenue. However, it's also important to understand the consequences of not considering real-time context. For instance, let's look at what happened to some Spotify users during Christmas 2012. This is a real-world story and was even presented by my ex-colleague from Spotify at one of the tech events (link to slides).
Here is the story. Some adults who used Spotify listened to a lot of romantic music, especially during evenings and weekends. The ad personalization algorithm classified them as users who could be served ads aimed at adults. Then, during Christmas, these same users listened to Christmas music during dinners with their family, their siblings' kids and others. Suddenly, they started hearing condom ads, which ruined the Christmas mood for them. The ad placement algorithm only took into account their historical profile and their recent music preferences, ignoring the real-time context of Christmas time and Christmas music.
What could Spotify do differently in 2012?
By considering real-time context, you can generate a blacklist of ads that shouldn't be placed for a given user at a specific time, as well as a whitelist of ads that are appropriate or even opportunistic. In this Christmas example, you might show products or events for small children or family events in a nearby location during the holiday season.
Of course, Spotify solved this problem and implemented this use-case using streaming pipelines that process data in real-time.
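The blacklist and whitelist idea can be sketched in a few lines of Python. This is not Spotify's implementation, just a toy example under our own assumptions: the `Ad` class, the audience labels, the track tags and the 50% rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Ad:
    ad_id: str
    audience: str  # hypothetical label: "adult", "family" or "general"


def detect_context(recent_track_tags: List[str]) -> str:
    """Guess the listening context from tags of the last few tracks played."""
    if not recent_track_tags:
        return "unknown"
    family_tags = {"christmas", "kids"}
    family_share = sum(tag in family_tags for tag in recent_track_tags) / len(recent_track_tags)
    # Assumed rule: at least half of the recent tracks are family-oriented.
    return "family" if family_share >= 0.5 else "default"


def split_ads(ads: List[Ad], context: str) -> Tuple[List[Ad], List[Ad]]:
    """Return (allowed, blocked) ads for the detected context."""
    if context == "family":
        blocked = [ad for ad in ads if ad.audience == "adult"]
    else:
        blocked = []
    allowed = [ad for ad in ads if ad not in blocked]
    return allowed, blocked


ads = [Ad("adult-product", "adult"), Ad("toy-store", "family"), Ad("headphones", "general")]
context = detect_context(["christmas", "christmas", "romantic"])
allowed, blocked = split_ads(ads, context)
print(context, [ad.ad_id for ad in allowed], [ad.ad_id for ad in blocked])
```

The important part is that `detect_context` looks only at the last few tracks, so it reacts to the Christmas dinner playlist even if the long-term profile says something very different.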
Taking real-time context into account also helps you sell more. The following example could be from any large e-commerce platform, like Amazon, Alibaba, Allegro, or Zalando, that sells numerous products online. Most often I buy products for myself there, but sometimes I search for a gift for my wife or mother. In such cases, if the search engine or product recommendation algorithm shows me products based on my historical profile, the recommendations won't be good because my preferences are different from my wife's or mother's. This is important because finding gifts is not easy for people like me.
In this case, we'd expect the e-commerce platform to quickly realize that we're searching for a product for a 33-year-old woman like someone's wife or a 65-year-old woman like someone's mother and suggest products that they might like. This would save time and increase the chances of buying a product from their platform. If the e-commerce platform can't do this in a convenient way, we would rather ask ChatGPT to recommend good gifts for our loved ones.
To be helpful in such situations, e-commerce platforms need to discover the real-time needs and suggest products that can fulfill those needs, even if they're far from the customer's historical profile and typical purchases.
The next area where understanding real-time context is necessary is emergency situations, which are atypical by nature and differ from your normal historical profile. The example comes from the banking sector. Imagine that you are on vacation with your family in Italy, and you arrive at the car-sharing company to get your car. It's late in the evening, and your wife and two kids are very tired. You try to pay for the car, but your payment is not accepted because your limits don't allow it. Unfortunately, you can't easily increase the limit using your mobile app.
In this situation, it would be great if someone from your bank noticed the problem. They could see that you are abroad, it's late in the evening, the payment for a particular amount of EUR was not accepted, and based on your real-time context and historical profile, suggest an appropriate solution. This could be increasing the payment limit on your card or even allowing you to take a loan if your historical profile shows that you can repay it easily and without risk to the bank. This is one of the interesting use-cases that I talked about recently with my colleague who works with data & analytics in a large bank.
Such emergency situations can happen in many other aspects of life, for example, exceeding the data allowance in your mobile internet plan when working remotely abroad. What do you expect from your mobile operator in such a situation? Kcell, one of the largest Kazakh telecoms, and our company, which specializes in real-time streaming, gave an interesting conference presentation on how real-time streaming can be implemented and used in telecom to assist users, provide better services at any time (taking into account real-time context), detect fraud, and support internal processes.
Having a grasp of real-time context is also critical when making intelligent decisions under strict deadlines. For instance, consider a marketing expert who is running a marketing campaign during Black Friday Weekend or Christmas, with a limited budget to optimize efficiently in a short period of time. With real-time feedback and context on the current performance of the campaign, it becomes possible to make more informed decisions on how to allocate the remaining budget for the best possible return on investment (ROI). This information can be obtained through a marketing tool or a marketplace where products or services are being advertised, such as a podcasting app or a social media platform, which should be able to provide real-time metrics.
While companies like Uber, Alibaba and Spotify use real-time streaming at a large scale, it still makes sense to use it on a smaller scale as well, especially when you sell something rather expensive and every customer counts.
Let's now take the example of buying a car. Typically, when you buy a car, there's a significant change in your life, like expecting your first child, getting a higher-paying job or unfortunately experiencing a car accident. However, navigating car manufacturers' websites can be challenging with numerous models, types, options and complex pricing.
To address this, a European car manufacturer, with help from GetInData, implemented a real-time scoring system that analyzes a user's online car search behavior and classifies them into categories. These categories include whether they're searching for a large family car or a small one, whether they're just exploring or want to buy urgently, and whether they're interested in standard or premium equipment.
These insights, calculated in real-time, allow the manufacturer's website to quickly guide the customer and suggest cars that most likely fit their needs based on size, price and availability. Without this, the website wouldn't be able to assist the customer during their first and sometimes only visit.
Thanks to such a real-time system, there are higher chances that the customer will book a test drive, configure the car to check the final price, or ask questions via chat or contact form, increasing the likelihood of a sale.
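A scoring system of this kind can be pictured as a small piece of state that is updated on every click. The Python sketch below is only an illustration and not the manufacturer's actual system: the event names, the weights, the segment labels and the threshold are all assumptions we made up for the example.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical mapping from website events to the signals they contribute.
EVENT_SIGNALS = {
    "view_7_seater": {"family_car": 2.0},
    "view_city_car": {"small_car": 2.0},
    "open_configurator": {"urgent_buyer": 1.0},
    "check_stock_availability": {"urgent_buyer": 2.0},
    "view_premium_package": {"premium_equipment": 1.5},
}


class SessionScorer:
    """Keeps running scores per session and updates them on every event."""

    def __init__(self) -> None:
        self._scores: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))

    def on_event(self, session_id: str, event_type: str) -> Dict[str, float]:
        """Add the weights of one event and return the session's current scores."""
        # Unknown event types contribute nothing.
        for signal, weight in EVENT_SIGNALS.get(event_type, {}).items():
            self._scores[session_id][signal] += weight
        return dict(self._scores[session_id])

    def segments(self, session_id: str, threshold: float = 2.0) -> List[str]:
        """Return the sorted segment labels whose score reached the threshold."""
        return sorted(s for s, v in self._scores[session_id].items() if v >= threshold)


scorer = SessionScorer()
for event in ["view_7_seater", "open_configurator", "check_stock_availability"]:
    scorer.on_event("session-1", event)

print(scorer.segments("session-1"))
```

In a streaming engine, the per-session dictionary would typically live in keyed state, so the website can ask for the current segments while the visitor is still browsing.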
I have given examples of why real-time streaming is better than traditional batch processing. When data is processed only every hour or day by batch pipelines, you may miss out on important opportunities or, even worse, provide bad service based on historical data. Real-time streaming can improve services and products by processing data as it arrives. You can watch our presentation on the comparison between real-time streaming and traditional batch processing at Big Data Spain (video) if you are interested.
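The difference in reaction time is easy to show with a toy example in Python. The sketch below is a simplification under our own assumptions: events are hypothetical (timestamp, type) pairs, the batch job runs exactly at the end of each hourly window, and processing time itself is ignored.

```python
from typing import Iterable, Optional, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, event type), made up for the example


def streaming_detect(events: Iterable[Event], target: str) -> Optional[int]:
    """Process events one by one; return the time the target is first seen."""
    for timestamp, event_type in events:
        if event_type == target:
            return timestamp  # a reaction is possible as soon as the event arrives
    return None


def batch_detect(events: Iterable[Event], target: str, interval: int = 3600) -> Optional[int]:
    """Return the end of the batch window that first contains the target."""
    for timestamp, event_type in events:
        if event_type == target:
            # The event has to wait until its batch window closes.
            return (timestamp // interval + 1) * interval
    return None


events = [(10, "login"), (125, "payment_declined"), (400, "logout")]

seen_by_streaming = streaming_detect(events, "payment_declined")
seen_by_batch = batch_detect(events, "payment_declined")
print(seen_by_streaming, seen_by_batch, seen_by_batch - seen_by_streaming)
```

With hourly batches, the declined payment from the banking example would be noticed almost an hour after it happened, long after the customer has left the counter.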
Now, let's take a closer look at some of the technologies that can be utilized to implement these use cases.
Apache Flink is currently one of the most mature and widely used stream processing engines available. It's an open-source technology that can be used freely on any cloud or on-premise platform. Flink is highly reliable and can handle large-scale data processing. It has been used by companies from all over the world such as Netflix, Uber, Comcast, eBay, Lyft, Alibaba, Zalando and ING, to name a few.
At GetInData, we have utilized Flink in approximately 10 production projects, including those in the telecommunications and banking sectors. Our recent project involved using Flink and Kafka to develop a real-time marketing automation system for a large European bank to offer more personalised products in real-time. The use of this stream processing platform is now being extended to fraud detection, business automation, online ML-based products, real-time customer-facing notifications and many more cases.
Together with Networks!, we have also implemented a real-time analytics project that controls 50% of the mobile network in Poland. The ability to analyze data in real time is crucial for diagnosing mobile networks and ensuring the quality of service for end customers. To achieve this, we built a real-time ingestion and analytics platform that processes 2.2 billion messages a day from mobile network hardware. This solution includes the calculation of more than 5000 KPIs and 1500 aggregations defined in SQL, on 750 Kafka topics. You can watch more on our YouTube presentation (video).
At GetInData we also released an open-source dbt-flink-adapter that allows running pipelines defined in SQL in a dbt project on Flink. You can find its description and a short tutorial here.
Many cloud vendors also provide their own cloud-native services for real-time streaming. For example, Google Cloud Platform (GCP) provides the Dataflow service that is very similar to Flink but runs natively on GCP and is integrated well with other GCP services such as PubSub or BigQuery. Soon, you will also be able to use the Confluent platform with a native integration with Flink, as Confluent acquired a German company called Immerok that was founded by a large group of committers and contributors to Apache Flink. You can also check Aiven.io, which provides fully-managed services to run Apache Flink, Apache Kafka and even ClickHouse in the cloud. By the way, we often use these three open-source technologies in our projects at GetInData. We encourage you to watch the video “Real time analytics that controls 50% of mobile networks in Poland” where we describe how we implemented a real-time analytics project that calculates 5000 KPIs and 1500 aggregations defined in Flink SQL, on 750 Kafka topics.
Of course, you can’t mention Flink without mentioning Ververica, the company founded by the original creators of Apache Flink. Ververica offers its own Enterprise Stream Processing & Analytics. At GetInData we have used this platform as well in our production projects and it attracts various companies thanks to its enterprise features and nice UI.
There are third-party products that make the development of real-time streaming applications easier. One of these products is Decodable, which democratizes streaming by making it accessible to anyone who knows SQL. This allows not only data engineers but also analytics engineers and data analysts to implement streaming cases using pay-as-you-go and fully cloud-managed services, and easily integrate with popular data sources and sinks (e.g. OLTP databases, caches, streaming systems, search systems).
Last but not least, GetInData provides its own Streaming Analytics Platform that is complementary to and built on top of the solutions described above. Our platform is designed to simplify the development of real-time streaming applications by providing a convenient notebook-like experience to analytics engineers. One of the unique features of our platform is the Streaming Analytics Workbench, where you can experiment and write your streaming application in SQL or Python using a notebook. The platform promotes the best DataOps principles and, for example, allows you to run your application locally for testing purposes before deploying it to the production environment, whether it be on-premise or in the cloud. Our platform is complementary to the above-mentioned solutions (e.g. Decodable, Confluent, Aiven) because its main focus is prototyping and building real-time streaming apps, e.g. marketing automation, business automation, fraud detection and ML-based apps. It can also be integrated with and/or connected to products implemented by vendors such as Ververica, Decodable, Confluent or Aiven.
If you enjoyed this blog post, please consider sharing it with your colleagues. Your feedback is valuable to us and lets us know that we should continue creating more content like this. Thank you! If you are interested in having a 32-minute free knowledge sharing session with us to talk about real-time streaming analytics, feel free to book it here.