
Truecaller - armed with data analytics to control incoming calls

Building a modern analytics environment is a strategic, long-term, iterative process of continuous improvement rather than a one-off project.

The challenge

Truecaller created a mobile app that identifies who is calling even if you don’t have the number stored as a contact. It blocks unwanted calls and SMS messages, and enables instant mobile payments and VoIP calls. These features are in high demand, particularly in emerging markets, as shown by 500M app installs.

Data has always been central to Truecaller’s business. The app’s spam identification feature relies on users’ reports of numbers they consider spam. Internal sources feed the caller identification service. Users see ads tailored to their characteristics. App analytics help identify opportunities to provide genuine and meaningful value to users.


The solution

GetInData has assisted Truecaller in its data analytics evolution ever since implementing the first big data platform in 2014 in response to exploding data volumes. At that time, Kafka and dumps from relational databases fed an on-premise Cloudera data platform, with Airflow responsible for orchestration and scheduling, and Spark, Presto and Hive responsible for data processing.

As app usage expanded further, Truecaller faced constantly increasing storage needs. The company bought more hardware to get more disks even though it didn’t require more computing power. Maintaining its own data center was also challenging, and the company experienced occasional downtime.

In 2018 Truecaller decided it was once again time to rethink its approach. After carefully considering all the available options, the company chose Google Cloud Platform. It wanted to benefit from Cloud Storage and use DataProc for YARN compute clusters, thus leveraging bare metal instances, saving costs and enabling autoscaling. Cloud Storage reduced the need for capacity planning, diminished the maintenance burden, made storage access faster and turned out cheaper than on-prem HDFS.

The migration to GCP came at the cost of adjusting certain jobs to make them run in DataProc.


The next step in the cloud journey was to examine other cloud-native technologies. BigQuery turned out to be faster and cheaper than Hive on DataProc and offered such a better user experience that people dealing with data didn’t want to work with Hive anymore. BigQuery quickly became the preferred analytics tool, and Truecaller is even planning to use it for ETL processing. More complicated workloads and machine learning will run as Spark on Kubernetes.

Another advantage of GCP was the availability of cloud-native tools like Deployment Manager for infrastructure automation. It helps deliver cloud resources faster and improves their management. Keeping resource definitions in templates as Python or Jinja code makes them suitable for CI/CD pipelines, resulting in process traceability and faster delivery, with infrastructure integration tests included.
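To give a feel for what a resource definition kept as Python code can look like, here is a minimal sketch of a Deployment Manager Python template that declares a single Cloud Storage bucket. It is not taken from Truecaller's setup: the deployment name, the `suffix` and `location` properties, and the `fake_context` stand-in used to run the template locally are all assumptions made for illustration.

```python
from types import SimpleNamespace


def generate_config(context):
    """Entry point that Deployment Manager calls to expand the template."""
    # Derive the bucket name from the deployment name plus a
    # hypothetical 'suffix' property supplied by the deployment config.
    bucket_name = f"{context.env['deployment']}-{context.properties['suffix']}"
    return {
        "resources": [
            {
                "name": bucket_name,
                "type": "storage.v1.bucket",
                "properties": {
                    "location": context.properties["location"],
                    "storageClass": "STANDARD",
                },
            }
        ]
    }


# Local stand-in for the context object that Deployment Manager passes in.
# The values are illustrative only.
fake_context = SimpleNamespace(
    env={"deployment": "analytics-dev"},
    properties={"suffix": "raw-events", "location": "EU"},
)

config = generate_config(fake_context)
print(config["resources"][0]["name"])
```

Because the template is an ordinary Python function, a CI/CD pipeline can call it with a stand-in context like this one and assert on the generated resources before anything is deployed.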

Another angle to this story is the data presentation layer. Management and product owners used Tableau dashboards with analytics on users and how they use app features. With the cloud-native strategy, Data Studio became a natural choice for this purpose. It integrated seamlessly with BigQuery, was much easier to use, serverless, and available free of charge.

The results

The cloud journey of Truecaller, supported by GetInData, required an iterative reassessment of the approach, taking cloud-native and open-source technologies into account. It was full of dilemmas but eventually led to the closure of the on-premise data center and a full migration from Tableau to Data Studio.

Throughout these years Truecaller managed to achieve:

● a monthly data platform cost of $6 per 10k users

● developer costs amounting to 30% of infrastructure costs

● management of current pipelines with only one data engineer per 42M users monthly.
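As a rough illustration of what these ratios mean in absolute terms, the sketch below plugs them into a small calculation. The 42M user count is borrowed from the last bullet purely as an example input, and the sketch assumes the $6 per 10k users figure refers to infrastructure cost. The source does not state either point, and the function names are hypothetical.

```python
def monthly_platform_cost(monthly_users, usd_per_10k_users=6.0):
    """Monthly platform cost in USD at the quoted per-10k-users rate."""
    return monthly_users / 10_000 * usd_per_10k_users


def developer_cost(infrastructure_cost, ratio=0.30):
    """Developer cost as a fixed share of infrastructure cost."""
    return infrastructure_cost * ratio


# Example input only: user count taken from the 'one data engineer
# per 42M users' bullet, not a reported total user base.
users = 42_000_000
infra = monthly_platform_cost(users)
dev = developer_cost(infra)

print(f"Platform cost for {users:,} users: ${infra:,.0f} per month")
print(f"Developer cost at 30% of that: ${dev:,.0f} per month")
```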

A video presentation on Truecaller’s cloud journey is available from Big Data Technology Warsaw Summit 2020:

F. Alsadi, J. Araujo, T. Żukowski, 'How to make your Data Scientists like you and save a few bucks while migrating'

24 June 2020
