In this episode of the RadioData Podcast, Adam Kawa talks with Alessandro Romano about FREE NOW use cases: the data, techniques, signals and KPIs used to develop the dynamic pricing ML model for a real-time mobile app. We will also talk about the feedback loop and the technology stack.
We encourage you to listen to the whole podcast or, if you prefer reading, skip to the key takeaways listed below.
___________
Host: Adam Kawa, GetInData | Part of Xebia CEO
Since 2010, Adam has been working with Big Data at Spotify (where he proudly operated one of the largest and fastest-growing Hadoop clusters in Europe), Truecaller and as a Cloudera Training Partner. Nine years ago, he co-founded GetInData | Part of Xebia – a company that helps its customers to become data-driven and builds custom Big Data solutions. Adam is also the creator of many community initiatives such as the RadioData podcast, Big Data meetups and the DATA Pill newsletter.
Guest: Alessandro Romano, Senior Data Scientist
Alessandro Romano is a Senior Data Scientist at Kuehne+Nagel and previously worked for FREE NOW. Alessandro started working as a Data Scientist 6 years ago. He studied Computer Science and Business Informatics – a mix of statistics, computer science and economics which could be described today as a Data Science profile.
________________
FREE NOW is a multi-mobility company whose service lets users request different types of transportation, such as a taxi, car-sharing, electric scooters or a private taxi, depending on the region, all in a single application.
FREE NOW is a top mobility service on the market and processes multiple data sources and data types. One of the first problems that Alessandro had to solve was dynamic pricing. The solution had to take into account the supply of drivers and the demand for them, and provide the right price for the actual situation.
_________________
The problem was solved by processing the signals that the app gets from the environment in real time. The basic solution is about preserving the balance between the supply and demand of the drivers and less about engaging as many passengers as possible or as many drivers as possible. The solution is not mainly about increasing the revenue, but about balancing the supply and demand when the difference between the two is high (e.g. high demand and low supply, or high supply and low demand).
The typical data that is collected for supply and demand is:
In this example, the demand is very high in comparison to the supply, so we raise the price so that only those passengers that really need a driver can afford it. In the other case, when the supply is high but the demand is low, we lower the price.
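The balancing idea can be illustrated with a minimal sketch. This is not FREE NOW's model: the function name, the `sensitivity`, `floor` and `cap` parameters and their default values are all assumptions made up for the example. It only shows the direction of the adjustment, with the price rising when demand exceeds supply and falling when drivers are idle.

```python
def price_multiplier(demand: int, supply: int,
                     sensitivity: float = 0.5,
                     floor: float = 0.8, cap: float = 2.0) -> float:
    """Return a multiplier applied to the base fare (illustrative only).

    demand: open ride requests in an area (hypothetical signal)
    supply: available drivers in the same area (hypothetical signal)
    """
    if demand < 0 or supply < 0:
        raise ValueError("demand and supply must be non-negative")
    # Add 1 to both counts so an area with no drivers does not divide by zero.
    ratio = (demand + 1) / (supply + 1)
    # ratio > 1: more requests than drivers, so the price goes up.
    # ratio < 1: idle drivers, so the price goes down.
    raw = 1.0 + sensitivity * (ratio - 1.0)
    # Clamp the result so the price never moves outside the agreed bounds.
    return max(floor, min(cap, raw))


if __name__ == "__main__":
    for demand, supply in [(10, 10), (50, 5), (2, 20)]:
        print(demand, supply, price_multiplier(demand, supply))
```

The clamp is a design choice of the sketch: it keeps a single extreme reading from producing an extreme price.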
It seems simple, but it’s quite complicated underneath, because we enrich the basic information about the supply and demand with, for example, weather data, which can impact the predictions.
If it is raining in London, then whoever is leaving the office is going to request a taxi, because no one wants to get wet. When there is heavy rain we can be sure that everyone is going to book a taxi no matter the price, so the demand is high and this can be predicted by using additional weather data, for example.
There are a bunch of KPIs (Key Performance Indicators) that drive the process of selecting the right model. We also use accuracy, but it’s not as important as KPIs and testing the model online. There are a lot of experiments run where the models are tested against real-time data and there is also a lot of A/B testing involved. We try to see how the model interacts with the environment and whether it meets the expectations.
There is a feedback loop between the model and the environment: the model reads the environment and sets up the price, and this event changes the environment (the demand) which is a new environment for the model. This is a very complex problem to solve. The model has to react quickly to certain events and has to be stable in a constantly changing environment.
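A toy simulation can make the loop concrete. It is a sketch under strong assumptions, not the production system: demand is assumed to fall linearly as the price rises (the `elasticity` parameter), and the "model" is a simple proportional controller (the `gain` parameter). Both names and all values are hypothetical.

```python
def simulate_feedback(base_demand: float, supply: float,
                      elasticity: float = 0.6, gain: float = 0.3,
                      steps: int = 20) -> list[tuple[float, float]]:
    """Return the (multiplier, demand) pair observed at each step."""
    if supply <= 0:
        raise ValueError("supply must be positive")
    multiplier = 1.0
    history = []
    for _ in range(steps):
        # Environment: demand reacts to the current price (toy linear response).
        demand = max(0.0, base_demand * (1.0 - elasticity * (multiplier - 1.0)))
        history.append((multiplier, demand))
        # Model: read the new environment and nudge the price in proportion
        # to the relative gap between demand and supply.
        imbalance = (demand - supply) / supply
        multiplier = max(0.5, multiplier + gain * imbalance)
    return history


if __name__ == "__main__":
    for step, (m, d) in enumerate(simulate_feedback(150.0, 100.0)):
        print(f"step={step:2d} multiplier={m:.3f} demand={d:.1f}")
```

With a small `gain` the price settles gradually. A larger `gain` reacts faster but can overshoot and oscillate, which is the trade-off between reacting quickly and staying stable.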
The feedback loop is quite fast, which enables FREE NOW to experiment with different pricing strategies and algorithms within minutes or hours. Additionally, when talking about predictions, we can achieve immediate feedback and a comparison between e.g. the expected time of arrival and the real time of arrival, which makes FREE NOW unique in comparison to, for example, Spotify or other companies that deal with a large amount of data.
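The comparison between expected and real arrival times can be sketched as follows. The function name and the choice of metrics (bias and mean absolute error, in minutes) are assumptions for illustration. The source does not say which metrics FREE NOW uses.

```python
from statistics import mean


def eta_errors(predicted_min: list[float], actual_min: list[float]) -> dict[str, float]:
    """Compare predicted and observed arrival times (both in minutes)."""
    if not predicted_min or len(predicted_min) != len(actual_min):
        raise ValueError("inputs must be non-empty and of equal length")
    # Positive error: the driver arrived later than predicted.
    errors = [actual - predicted for predicted, actual in zip(predicted_min, actual_min)]
    return {
        "bias": mean(errors),                     # systematic over- or under-estimation
        "mae": mean([abs(e) for e in errors]),    # typical size of the miss
    }


if __name__ == "__main__":
    print(eta_errors([5.0, 8.0, 12.0], [6.0, 7.0, 15.0]))
```

Because the real arrival time is known minutes after the prediction, such metrics can be recomputed continuously instead of waiting for a delayed label.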
This depends on the business, but in the case of FREE NOW it’s important to check how many of the quotes (pricing requests for a ride) are converted into bookings, how many of those bookings we send to the drivers and how many of them are accepted by the drivers.
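The funnel described above can be expressed as a few ratios. This is a minimal sketch: the class, field and key names are hypothetical, and the counts in the usage example are invented to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunnelCounts:
    quotes: int           # pricing requests for a ride
    bookings: int         # quotes converted into bookings
    sent_to_drivers: int  # bookings sent to drivers
    accepted: int         # bookings accepted by drivers


def _rate(numerator: int, denominator: int) -> float:
    # Return 0.0 for an empty stage instead of raising ZeroDivisionError.
    return numerator / denominator if denominator else 0.0


def funnel_rates(c: FunnelCounts) -> dict[str, float]:
    """Conversion rate of each funnel stage plus the end-to-end rate."""
    return {
        "quote_to_booking": _rate(c.bookings, c.quotes),
        "booking_to_dispatch": _rate(c.sent_to_drivers, c.bookings),
        "dispatch_to_acceptance": _rate(c.accepted, c.sent_to_drivers),
        "quote_to_acceptance": _rate(c.accepted, c.quotes),
    }


if __name__ == "__main__":
    print(funnel_rates(FunnelCounts(quotes=1000, bookings=400,
                                    sent_to_drivers=380, accepted=304)))
```

Tracking each stage separately shows where a pricing change helps or hurts: a higher price might lower the quote-to-booking rate while raising driver acceptance.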
It might be that the KPI interpretation changes during the use of the model, or the KPIs change from quarter to quarter.
The most common tools that we use are:
Whenever we want to execute a Databricks notebook that contains a training pipeline, we use the Databricks operator, which calls the notebook from Airflow, and we build our training pipeline from there.
The cloud stack that is used in the background is AWS, although we don’t interact with it directly most of the time.
We use MLflow, mainly because it’s available out of the box in Databricks. You can track your experiments alongside your model and all the information that is the output of your notebook in one place, which is very helpful.
We use those tools but we don’t maintain them. We use Kafka Streams for stream processing of the data on the Kafka cluster.
Having a multi-mobility app is a lot of fun. You can use it every day for many car-sharing services.
When talking about uniqueness from the data science and data engineering perspective, I can say that we have a great team with a lot of smart people who contribute every day to the whole project. This is clearly visible from the inside. Regarding the outside, the CEO would probably be a better person to answer this question.
Regarding the technologies and trends, it seems that we haven't discovered a proper way of using neural networks and AI overall. There are a lot of stories about people trying to solve problems by following the trends and failing, because they thought that AI and NN would solve everything.
Right now there is more understanding about the fact that it does not have to be a neural network. There are lots of other technologies that come from statistics, computer science, etc. that can be successfully applied to a wide range of problems which don’t need a fancy solution like a neural network. Sometimes, even a simple algorithm like regression or an implementation of a function solves the problem.
Over recent years we’ve lost track of what the correct way of solving problems is. The new technologies that we started using do not solve everything. The technologies should be applied to the right classes of problems. Overengineered solutions are always hard to maintain.
___________________
These are just snippets from the entire conversation which you can listen to here:
Subscribe to the Radio DaTa podcast to stay up-to-date with the latest technology trends and discover the most interesting data use cases!