In this episode of the RadioData Podcast, Adam Kawa talks with Arunabh Singh about Willa's use cases (FinTech): the most important ML models implemented at Willa, the ML(Ops) stack, and more about Data and ML/AI at Willa. We will also look at the trends and predictions for ML/AI for the next decades.
We encourage you to listen to the whole podcast or, if you prefer reading, skip to the key takeaways listed below.
___________
Host: Adam Kawa, GetInData | Part of Xebia CEO
Since 2010, Adam has been working with Big Data at Spotify (where he proudly operated one of the largest and fastest-growing Hadoop clusters in Europe), Truecaller and as a Cloudera Training Partner. Nine years ago, he co-founded GetInData | Part of Xebia – a company that helps its customers to become data-driven and build custom Big Data solutions. Adam is also the creator of many community initiatives like the RadioData podcast, Big Data meetups and the DATA Pill newsletter.
Guest: Arunabh Singh, Head of Data
Arunabh Singh is the Head of Data at Eigensonne, and previously was the director of the Data Science team at Willa. His main fields of education are economics, political science and computer science. He has been working for enterprises of different scales and nature, mainly focused on data science and information technology for the last 10 years. He has been working at Willa for almost 3 years, right from the beginning of the company's journey.
________________
Willa is a mature FinTech startup based in Sweden, focused on delivering its services in the US. Its main field of interest is freelancers and, in particular, the influencer market. The main service that Willa is currently developing is an intermediary payment service between Willa’s customers and those customers' clients.
Willa’s customers can register on Willa’s app at https://www.willa.com/. Then they can present their invoices to Willa. After accepting their invoice, the Willa app provides them with immediate access to their requested funds and takes the risk and responsibility of retrieving the money from their clients.
_________________
Willa takes on two types of risk when it accepts its customers' invoices:
The freelancer-side risk (or fraud risk) type answers questions such as:
The credit-side risk (or client risk) type answers questions such as:
Willa has developed various AI/ML models and algorithms to assess the risk involved on both the fraud side and the credit side. Based on the data that Willa processes, the algorithms decide whether to be more conservative or more liberal in accepting customers' invoices. If the risk rates are too high, the model calibrates to be more conservative.
Some cases are not handled well by Machine Learning models. At Willa, they are called asymmetric risks. To understand what an asymmetric risk is, it helps to look at an example:
Let’s say a Willa customer presents an invoice for 10 billion dollars addressed to Apple. On paper, everything might seem fine: the customer seems to be legitimate and the customer's client is also a very solid company. But there is a 0.0001% probability that something might go wrong. Even though the ML model would recommend accepting the invoice, Willa should not, because a failure could result in the financial ruin of Willa. Low-probability, high-impact events can be catastrophic. Cases such as asymmetric risk are handled independently, with custom common-sense gates in the algorithms.
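A common-sense gate of this kind can be sketched in a few lines of Python. This is only an illustration of the idea, not Willa's actual implementation: the `Invoice` class, the `decide` function and both threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    amount_usd: float           # face value of the invoice
    default_probability: float  # model's estimated probability of non-payment, in [0, 1]


def decide(invoice, max_probability=0.05, max_exposure_usd=50_000.0):
    """Return "accept" or "reject" for an invoice.

    Both thresholds are hypothetical values chosen for illustration.
    """
    # Gate 1: the usual model-driven check on the probability of default.
    if invoice.default_probability > max_probability:
        return "reject"
    # Gate 2: the common-sense gate. Reject any invoice whose worst-case loss
    # would be too large to absorb, no matter how unlikely that loss is.
    if invoice.amount_usd > max_exposure_usd:
        return "reject"
    return "accept"
```

With these assumed thresholds, `decide(Invoice(10e9, 0.000001))` is rejected by the second gate even though the estimated probability of default is tiny, while an ordinary invoice such as `Invoice(2_000, 0.01)` passes both gates.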
Willa collects several types of data, including business reporting, operational metrics, user activity, activity tracking over time, lifetime value calculations, app interactions in the frontend, payment requests and money withdrawals.
The main analytics and data science operations are focused on predicting the default rates and fraud rates on each particular invoice of each particular customer. Additionally, they involve more heuristic analytics like calculating limits on particular customers based on their default rates.
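As a rough illustration of what a limit heuristic based on default rates could look like, here is a minimal Python sketch. The linear scaling rule, the `credit_limit` name and the default parameter values are all assumptions made for this example. The conversation does not describe the formula Willa actually uses.

```python
def credit_limit(observed_default_rate, base_limit_usd=5_000.0, cutoff_rate=0.10):
    """Scale a customer's limit down linearly as their default rate rises.

    base_limit_usd and cutoff_rate are hypothetical illustration values.
    """
    if not 0.0 <= observed_default_rate <= 1.0:
        raise ValueError("observed_default_rate must be between 0 and 1")
    # At or above the cutoff rate the customer gets no limit at all.
    if observed_default_rate >= cutoff_rate:
        return 0.0
    # Below the cutoff, the limit shrinks in proportion to the default rate.
    return base_limit_usd * (1.0 - observed_default_rate / cutoff_rate)
```

Under these assumptions, a customer with no defaults keeps the full base limit, and a customer whose default rate reaches the cutoff gets a limit of zero.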
Willa has been fully hosted on GCP since the beginning. It uses dbt and Airflow for upstream plumbing and orchestration, BigQuery for data warehousing and Data Studio for reporting. Most of the models are built in Python, using tools such as Vertex AI and Kedro.
Normally, it takes a few weeks to put an ML model into production, mainly because the product and the field Willa is dealing with are quite new and dynamic. New features are also constantly being added to the app, which creates an ever-growing layer of integration work. Willa would rather be sure that its models are robust and sound than iterate very quickly. It focuses more on data plumbing and data engineering and takes a slower approach to data modeling.
In essence, the BigQuery console and UI are used together with Google Sheets. To create a new field in an actual model or a new variable, dbt is used. For coding the actual production-ready models, mainly Python, Kedro and Google Vertex AI are used.
The three groups of skills that are most appreciated and valued at Willa are:
The most important trends or predictions regarding Data Science that Arunabh mentioned are:
We can already see examples of this. For instance, Poland has tripled cloud adoption over the last 8 years and is catching up with other technologically advanced countries such as Sweden and Switzerland.
Furthermore, many companies offer examples where, even though AI and automation are used, human confirmation and domain knowledge prove invaluable in solving a complicated problem.
Willa is going to focus mainly on doing the same thing, but better overall. The key fields of improvement for the near future are going to be:
___________________
These are just snippets from the entire conversation which you can listen to here:
Subscribe to the RadioData podcast to stay up-to-date with the latest technology trends and discover the most interesting data use cases!