Radio DaTa Podcast

Data Journey with Arunabh Singh (Willa) – Building robust ML & Analytics capability very early with FinTech, skills & competencies for data scientists with ML/AI predictions for the next decades.

In this episode of the RadioData Podcast, Adam Kawa talks with Arunabh Singh about Willa's use cases (FinTech): the most important ML models implemented at Willa, the ML(Ops) stack and more about data and ML/AI at Willa. We also cover the trends and predictions for ML/AI for the next decades.

We encourage you to listen to the whole podcast or, if you prefer reading, skip to the key takeaways listed below.

___________

Host: Adam Kawa, CEO of GetInData | Part of Xebia

Since 2010, Adam has been working with Big Data at Spotify (where he proudly operated one of the largest and fastest-growing Hadoop clusters in Europe), Truecaller and as a Cloudera Training Partner. Nine years ago, he co-founded GetInData | Part of Xebia – a company that helps its customers to become data-driven and build custom Big Data solutions. Adam is also the creator of many community initiatives like the RadioData podcast, Big Data meetups and the DATA Pill newsletter.

Guest: Arunabh Singh, Head of Data

Arunabh Singh is the Head of Data at Eigensonne, and previously was the director of the Data Science team at Willa. His main fields of education are economics, political science and computer science. He has been working for enterprises of different scales and nature, mainly focused on data science and information technology for the last 10 years. He has been working at Willa for almost 3 years, right from the beginning of the company's journey.

________________

Willa and a Willa Use Case

Willa is a mature FinTech startup based in Sweden that delivers its services in the US. It focuses on freelancers and, in particular, the influencer market. The main service Willa is currently developing acts as a payment intermediary between Willa’s customers and those customers’ clients.

Willa’s customers can register in Willa’s app at https://www.willa.com/ and then submit their invoices. Once Willa accepts an invoice, the app gives the customer immediate access to the requested funds, and Willa takes on the risk and responsibility of collecting the money from the client.

_________________

Key takeaways:

1. What are the risks that Willa has to manage and how does it handle them? 

Willa takes on two types of risk when it accepts its customers’ invoices:

  1. Freelancer side risk
  2. Credit side risk

The freelancer side risk (or fraud risk) covers questions such as:

  • Is this freelancer legitimate?
  • Is this invoice legitimate?

The credit side risk (or client risk) covers questions such as:

  • What is the financial situation of the customer's client?
  • Does the client intend to pay Willa?
  • What is the economic environment of the client? Can a recession affect its ability to pay?

Willa has developed various AI/ML models and algorithms to assess the risk on both the fraud side and the credit side. Based on the data that Willa processes, the algorithms decide whether to be more conservative or more liberal in accepting customers’ invoices. If the risk rates are too high, the model calibrates to be more conservative.
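
The calibration described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Willa's actual logic: the function names, the risk score scale from 0 to 1, the target rate and the step size are all hypothetical.

```python
# Hypothetical sketch: none of these names or numbers come from Willa.

def calibrate_threshold(threshold, observed_rate, target_rate,
                        step=0.01, floor=0.0, ceiling=1.0):
    """Return an updated maximum acceptable risk score.

    If the observed default/fraud rate is above the target, the threshold
    is lowered (more conservative); if it is below, the threshold is raised
    (more liberal). The result is clamped to [floor, ceiling].
    """
    if observed_rate > target_rate:
        threshold -= step
    elif observed_rate < target_rate:
        threshold += step
    return min(ceiling, max(floor, threshold))


def accept_invoice(risk_score, threshold):
    """Accept an invoice only if its predicted risk score is within the threshold."""
    return risk_score <= threshold


# Observed risk (5%) is above the assumed target (3%), so the gate tightens.
new_threshold = calibrate_threshold(0.10, observed_rate=0.05, target_rate=0.03)
```

With these assumed numbers, an invoice with a risk score of 0.095 would be accepted under the old threshold of 0.10 but rejected under the tightened one.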

2. ML: Are all the cases handled by algorithms? What is asymmetric risk?

Machine Learning models do not handle every case well. At Willa, one class of such cases is called asymmetric risk. To understand what an asymmetric risk is, it helps to look at an example:

Let’s say a Willa customer presents an invoice for 10 billion dollars issued to Apple. On paper, everything might seem fine: the customer seems to be legitimate and the customer’s client is a very solid company. But there is a 0.0001% probability that something might go wrong. Even though the ML model would recommend accepting the invoice, Willa should not, because a failure could result in Willa’s financial ruin. Low-probability, high-impact events can be catastrophic. Cases of asymmetric risk are therefore handled separately, with custom common-sense gates in the algorithms.
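
A common-sense gate of this kind could look roughly like the sketch below. The invoice amount and the 0.0001% probability come from the example above; everything else (the names, the 1% model cut-off, the 50 million dollars of capital and the 5% exposure cap) is a hypothetical assumption chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    amount_usd: float
    default_probability: float  # assumed to be the ML model's output


def model_recommends(invoice, max_default_probability=0.01):
    """Probability-only check: what the ML model alone would say (assumed cut-off)."""
    return invoice.default_probability <= max_default_probability


def passes_exposure_gate(invoice, capital_usd, max_share_of_capital=0.05):
    """Common-sense gate: never risk more than a fixed share of capital on one invoice."""
    return invoice.amount_usd <= capital_usd * max_share_of_capital


def decide(invoice, capital_usd):
    """Accept only if both the model and the exposure gate agree."""
    return model_recommends(invoice) and passes_exposure_gate(invoice, capital_usd)


# The example from the text: 10 billion dollars at a 0.0001% failure probability.
big = Invoice(amount_usd=10_000_000_000, default_probability=0.000001)
expected_loss = big.amount_usd * big.default_probability  # about 10,000 dollars
accepted = decide(big, capital_usd=50_000_000)            # hypothetical capital
```

The expected loss is only about 10,000 dollars, so a check based on probability alone passes. The gate still rejects the invoice, because the amount at stake is far larger than the share of capital the company is assumed to be willing to put at risk on a single invoice.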

3. What types of data are processed at Willa and what are the data analytics and data science operations?

Willa collects several types of data, such as business reporting, operational metrics, user activity, activity tracking over time, lifetime value calculations, app interactions in the frontend, payment requests and money withdrawals.

The main analytics and data science operations focus on predicting the default rate and fraud rate for each invoice of each customer. They also involve more heuristic analytics, such as calculating limits for particular customers based on their default rates.
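
As an illustration of what a limit heuristic based on default rates might look like, here is a minimal sketch. The linear scaling rule, the function name and the 10% tolerated rate are hypothetical assumptions, not a description of Willa's method.

```python
# Hypothetical heuristic: the limit shrinks linearly as the default rate rises.

def customer_limit(base_limit_usd, default_rate, max_tolerated_rate=0.10):
    """Scale a customer's limit down linearly with their predicted default rate.

    A default rate of 0 keeps the full base limit; a rate at or above
    max_tolerated_rate (an assumed cut-off) reduces the limit to zero.
    """
    if not 0.0 <= default_rate <= 1.0:
        raise ValueError("default_rate must be between 0 and 1")
    if default_rate >= max_tolerated_rate:
        return 0.0
    return base_limit_usd * (1.0 - default_rate / max_tolerated_rate)
```

For example, under these assumptions a customer with a predicted default rate of 5% would get roughly half of the base limit.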

4. Technology stack at Willa

Willa has been fully hosted on GCP since the beginning. It uses dbt and Airflow for upstream plumbing and orchestration, BigQuery for data warehousing and Data Studio for reporting. Most of the models are built in Python, using Kedro and Vertex AI.

5. How long does it take to create and deploy a new machine learning model in production?

Normally, it takes a few weeks to put an ML model into production, mainly because the product and the field Willa operates in are quite new and dynamic. New features are also constantly being added to the app, which creates an ever-growing layer of integration work. Willa prefers to make sure that its models are robust and sound rather than to iterate very quickly. It focuses more on data plumbing and data engineering and takes a slower approach to data modeling.

6. What are the free of charge technologies that the Willa team uses on a daily basis?

In essence, the team uses the BigQuery console and UI together with Google Sheets. dbt is used to create a new field in an existing model or a new variable. The production-ready models are coded mainly with Python, Kedro and Google Vertex AI.

7. What are the most sought-after skills and competencies at Willa?

The three groups of skills that are most appreciated and valued at Willa are:

  • A general quantitative aptitude: you have to be comfortable with numbers and able to break a problem down into quantitative problems at best, or at least into analytical ones.
  • An ability to think counter-intuitively, and the curiosity to dig deeper rather than be satisfied with the first result.
  • An ability to structure an unstructured decision process and enhance it with data analysis and data science, by speaking, writing, communicating and presenting well.

8. What are the most important trends and predictions regarding Data Science, AI and the industry overall in the upcoming decade?

The most important trends or predictions regarding Data Science that Arunabh mentioned are:

  • Mass unemployment caused by machines taking over human jobs seems unlikely, partly because we already have experience of working alongside machines and automation (even very advanced automation) and using them to our advantage, and partly because not every aspect of human activity can be easily automated.
  • Self-service analytics and AI/BI are not being adopted as easily as was previously predicted. People can create good AI solutions, but they don’t fully rely on them and still seek human confirmation.
  • More companies in slightly less technologically developed countries will start to adopt and use AI and ML models.
  • A focus on "Green Tech" is going to be the next big industry movement of the next 25 years.

We can already see examples of this growing adoption: for instance, Poland has tripled its cloud adoption over the last 8 years and is catching up with technologically advanced countries like Sweden and Switzerland.

Furthermore, many companies offer examples where, even though AI and automation are used, human confirmation and domain knowledge prove invaluable in solving a complicated problem.

9. What is going to happen at Willa in the near future?

Willa is going to focus mainly on doing the same thing, but better overall. The key fields of improvement for the near future are going to be:

  • Fine-tuning the data science model:
    • better predictions of credit and invoice risks,
    • including more user data in predictions,
    • enhancing the features of the Willa product.
  • Doing longer-term predictive product analytics on the user side for the following questions, for example:
    • What kind of users are likely to stay with Willa after 2 years? 
    • Who joins, who stays, who reactivates? 
  • Revamping the data warehouse so that it can scale better for a larger number of users and a larger amount of data.

___________________

These are just snippets from the entire conversation which you can listen to here: 

Subscribe to the Radio Data podcast to stay up-to-date with the latest technology trends and discover the most interesting data use cases! 


analytics
ML
AI
27 July 2023
