Data Journey with Michał Wróbel (RenoFi) - Doing more with less with a Modern Data Platform and ML at home


In this episode of the RadioData Podcast, Adam Kawa talks with Michał Wróbel about business use cases at RenoFi (a U.S.-based FinTech), the Modern Data Platform built on top of the Google Cloud Platform, and advanced ML/AI models. We also get an insight into the specifics of startup projects.


Host: Adam Kawa, GetInData | Part of Xebia CEO

Since 2010, Adam has been working with Big Data at Spotify (where he proudly operated one of the largest and fastest-growing Hadoop clusters in Europe), Truecaller and as a Cloudera Training Partner. Nine years ago, he co-founded GetInData | Part of Xebia – a company that helps its customers to become data-driven and builds custom Big Data solutions. Adam is also the creator of many community initiatives like the RadioData podcast, Big Data meetups and the DATA Pill newsletter.

Guest: Michał Wróbel, Lead Software Engineer

Michał has worked as a data engineer since 2015. Michał has a lot of expertise in AWS, analytics engineering, AI/ML and end-to-end projects. At RenoFi, Michał was responsible for delivering several data products and for the management of RenoFi’s data platform. Now Michał works at Embedded Insurance as Lead Software Engineer.


RenoFi Use Case

RenoFi is a platform that helps people who are trying to renovate their properties to borrow more money with the lowest possible monthly repayment. What makes RenoFi different? Usually, when institutions like banks grant loans, they use the present value of the property, without taking into account the value after renovation. RenoFi tries to calculate the post-renovation value of the property, which significantly affects the terms of the loan.


Key quotes

  1. dbt 

“I think dbt triggered the biggest change in my data career, because it simplified so many things for so many people. It was revolutionary. I remember the times when we had custom SQL scripts that we ran on a schedule in Airflow, and you had to execute them in order. You would have one big file with thousands of lines of just sequential SQL. Now, with dbt, you have all the dependencies and all the docs there in one place.”
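The point about dependencies can be illustrated with a minimal Python sketch. It is not how dbt is implemented; it only assumes that models reference each other with dbt-style `{{ ref('...') }}` calls, and the model names and SQL below are hypothetical. The run order is derived from those references instead of being maintained by hand in one long script.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical models (name -> SQL) using dbt-style ref() calls.
MODELS = {
    "offer_report": "select count(*) from {{ ref('loan_offers') }}",
    "loan_offers": (
        "select * from {{ ref('stg_loans') }} l "
        "join {{ ref('stg_properties') }} p on l.property_id = p.id"
    ),
    "stg_loans": "select * from raw.loans",
    "stg_properties": "select * from raw.properties",
}

# Matches {{ ref('model_name') }} and captures the model name.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def find_refs(sql: str) -> set[str]:
    """Return the names of all models referenced in a SQL string."""
    return set(REF_PATTERN.findall(sql))


def execution_order(models: dict[str, str]) -> list[str]:
    """Order models so that every model runs after the models it references."""
    graph = {name: find_refs(sql) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


order = execution_order(MODELS)
print(order)
```

Adding a new model then only requires writing its SQL; its position in the run order follows from the references it contains.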

“Back in the day, you had to write your own custom scripts and keep the state. Hightouch does this for you: you just connect it, say what your data source is and what your destination is, and Hightouch will handle what has changed on the source and how it should change on the destination. If a record didn't change, it won't even hit the destination to update it. It has nice notifications and is really easy to integrate with dbt and any data platform.”
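The kind of state-keeping script described here can be sketched as a diff between the last synced snapshot and the current source rows. This is an illustrative assumption about the general technique, not Hightouch's actual implementation; the names `SyncPlan` and `plan_sync` and the sample rows are hypothetical.

```python
from dataclasses import dataclass, field

Row = dict[str, object]


@dataclass
class SyncPlan:
    """Keys of the records that need to be written to the destination."""
    to_insert: list[str] = field(default_factory=list)
    to_update: list[str] = field(default_factory=list)
    to_delete: list[str] = field(default_factory=list)


def plan_sync(previous: dict[str, Row], current: dict[str, Row]) -> SyncPlan:
    """Compare the last synced snapshot with the current source rows."""
    plan = SyncPlan()
    for key, row in current.items():
        if key not in previous:
            plan.to_insert.append(key)   # new on the source
        elif previous[key] != row:
            plan.to_update.append(key)   # changed since the last sync
        # Unchanged rows are skipped, so the destination is never hit for them.
    plan.to_delete = [key for key in previous if key not in current]
    return plan


# Hypothetical snapshots keyed by a primary key.
previous = {
    "c1": {"email": "a@example.com", "stage": "lead"},
    "c2": {"email": "b@example.com", "stage": "lead"},
    "c4": {"email": "d@example.com", "stage": "lead"},
}
current = {
    "c1": {"email": "a@example.com", "stage": "lead"},      # unchanged
    "c2": {"email": "b@example.com", "stage": "customer"},  # updated
    "c3": {"email": "c@example.com", "stage": "lead"},      # inserted
}

plan = plan_sync(previous, current)
print(plan)
```

After a successful sync, `current` would be stored as the new `previous` snapshot, which is the state a hand-written script has to persist.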

  2. Data

"There is this common problem with training prediction skew, where you might train your model on the data that you have available in the warehouse.

But when you try to predict, the client of this model might send one feature that is in a different shape / format than model expects.

This could be an upper case, lower case, not normalised, etc. It could be also different because in the DBT you have transformed it from the raw shape and data scientist wasn’t aware of..

So this is one thing that you need to remember. A common solution for this is a feature store, which adds a massive complexity to the system, because you need to have someone to develop the solution and maintain it and you need to change your experience."
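A lighter-weight way to picture the skew described above is a single normalisation function shared by the training and prediction paths. The sketch below is an assumption for illustration only, not RenoFi's setup and not a feature store; the toy "model" is just a per-category average, and all names and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def normalize_category(raw: str) -> str:
    """Single source of truth for cleaning a categorical feature."""
    return raw.strip().lower()


def train(rows: list[tuple[str, float]]) -> dict[str, float]:
    """Toy model: the average target value per normalised category."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for category, target in rows:
        grouped[normalize_category(category)].append(target)
    return {category: mean(values) for category, values in grouped.items()}


def predict(model: dict[str, float], raw_category: str, default: float = 0.0) -> float:
    """Apply the same normalisation at prediction time as at training time."""
    return model.get(normalize_category(raw_category), default)


# Hypothetical training data as it might look in the warehouse.
training_rows = [("kitchen", 10.0), ("kitchen", 20.0), ("bathroom", 5.0)]
model = train(training_rows)

# A client sends the feature in a different shape than the training data.
skewed = model.get(" Kitchen ", 0.0)       # raw lookup misses the category
consistent = predict(model, " Kitchen ")   # shared normalisation finds it

print(skewed, consistent)
```

Sharing one function only covers transformations that live in application code; transformations applied upstream in the warehouse still have to be reproduced or served from somewhere, which is the gap a feature store is meant to close.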

A feature store can address several machine learning problems. Our ebook explains what these problems are and how a well-designed feature store can solve them, and it includes a step-by-step tutorial.


  3. Startups & business

"In startups, you don't usually start with data teams. You need to have a product and then develop it, try to improve it, find a market and then when a startup is successful, maybe set up a data team. At RenoFi this was a little bit different because our CTO has had a data background from day one and made really good decisions. At RenoFi, the data team is pretty small, but we started quite early from the beginning."

"In startups, things change quicker than in other companies, so change is the main thing in startups." 

"Startups can do more with less, and by less, I mean fewer people. And yeah, sometimes you don't need to go for a brand new shiny solution like ML, you can just use heuristics which should be good enough and provide adequate business results to a company, because the cost of a proper ML solution is really high."

"There's a plethora of cloud services and external cloud services that we also use. So almost every startup, I would say, has internal databases and external cloud services, from the business perspective and from the CEO perspective, the management team - would like to have all that data in one place. So obviously you have different names these days, whether it's a warehouse, whether it's a data lake or data lakehouse. Whatever, it's a big database which contains all of your data."

"So having all the solutions in small teams means incurring huge costs.  And don't even get us started on maintenance costs!  So you can set it up and it's all fine. It's easier. But then you need to update it. You need to make sure it's worth it. You need to set up monitoring. I wouldn't say this is feasible for a small team, furthermore if you want to have it running on a good quality level, then you would have to employ someone that doesn't have a life outside of work."

"Thankfully, building real-time solutions or powerful ML / AI models has become simpler and cheaper, thanks to new technologies and new tools. So it's likely that in a few years it will be no problem to use them by default, because the additional costs and efforts to build them will be relatively small."

"A good recommendation is that you don't build from scratch if you don't have the expertise in-house, but hire someone at least for a few months, to set it up to give you the best practices that are currently on the market. Maybe you would still have someone available from here, and then when required in-house, so what you should focus on are the analytics engineers. And by analytics engineers, I mean people that work with dbt and know enough in order to be able to code, who can work effectively within common line standard programming practices and tests. Therefore,  RenoFi is just great."

"So it's about having a very pragmatic approach and focusing first on the most critical functionalities, because they usually bring the most value. However, there are companies that must develop real-time online machine learning solutions from day one in order to just exist, because their core business model requires them to do so. So one example is Free Now, a multi-mobility company, or Uber or Bolt or a similar app. They need to calculate the price of a ride dynamically in real-time, based on actual supply and demand, and this changes all the time. They also need to predict the estimated time of driver arrival. The same goes for the estimated duration of your ride, so that you know when you will reach your destination and so on. If the apps do this well, they will obtain drivers, customers and will earn money. But if they do this badly, they will simply lose money. So in their case, real-time and machine learning is a must-have solution which needs to be invested in and improved on constantly, especially at scale."

"Usually, if you have vendors such as dbt Labs or the Google Cloud Platform, then if they are successful, they have a very big leverage for their solutions because they can have like hundreds or even thousands of user companies. So it's cost efficient and makes sense to them to invest in their solutions, thanks to this economy of scale. So they can keep improving them by adding new functionalities, especially the ones that you won't be able to implement by yourself, as sometimes it would be simply too expensive to develop some kind of custom-made or big feature only for yourself, because you won't have this economy of scale and the same leverage as they have."




These are just snippets from the entire conversation which you can listen to here.

Subscribe to the RadioData Podcast to stay up-to-date with the latest technology trends and to discover the most interesting data use cases!

28 February 2023
