Truecaller - armed with data analytics to control incoming calls

Building a modern analytics environment is a strategic, long-term, iterative process of continuous improvement rather than a one-off project.

The challenge

Truecaller created a mobile app that helps identify who is calling even if you don’t have the number stored as a contact. It blocks unwanted calls and SMS messages and enables instant mobile payments and VoIP calls. These features are in high demand, particularly in emerging markets, as shown by 500M app installs.

Data has always been central to Truecaller’s business. The app’s spam identification feature relies on reports from users about numbers they consider spam. Internal sources feed the caller identification service. Users see ads tailored to their characteristics. App analytics help identify opportunities to provide genuine and meaningful value to users.

The solution

GetInData has assisted Truecaller in its data analytics evolution ever since implementing the first big data platform in 2014 to respond to exploding data volumes. At that time, Kafka and dumps from relational databases fed an on-premise Cloudera Data platform, with Airflow responsible for orchestration and scheduling, and Spark, Presto and Hive responsible for data processing.

App usage expanded further and Truecaller faced constantly increasing storage needs. The company bought more hardware to get more disks even though it didn’t need more computing power. Maintaining its own data center was also challenging, and the company experienced occasional downtime.

In 2018 Truecaller decided it was once again time to rethink its approach. After carefully considering all the available options, the company chose Google Cloud Platform. It wanted to benefit from Cloud Storage and use DataProc for YARN compute clusters, thus leveraging bare metal instances, saving costs and enabling autoscaling. Cloud Storage reduced the need for capacity planning, diminished the maintenance burden, made storage access faster and turned out cheaper than on-prem HDFS.

The migration to GCP came at the cost of adjusting certain jobs to make them run in DataProc.

The next step in the cloud journey was to examine other cloud-native technologies. BigQuery turned out faster and cheaper than Hive on DataProc and offered such a better user experience that people dealing with data didn’t want to work with Hive anymore. BigQuery quickly became the preferred analytics tool, and Truecaller is even planning to use it for ETL processing. More complicated workloads and machine learning will be run as Spark on Kubernetes.

Another advantage of GCP was the availability of cloud-native tools like Deployment Manager for infrastructure automation. It helps to deliver cloud resources faster and improves their management. Keeping resource definitions in templates as Python or Jinja code makes them suitable for CI/CD pipelines, resulting in process traceability and faster delivery, with infrastructure integration tests included.
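To illustrate what a resource definition kept as Python code can look like, here is a minimal sketch of a Deployment Manager Python template. It is not taken from Truecaller's setup: the bucket naming scheme, the property names and the FakeContext helper are assumptions made for this example. Deployment Manager calls the GenerateConfig entry point with a context object; the stand-in context below only exists so the template can be exercised locally, for instance in a CI test.

```python
def GenerateConfig(context):
    """Entry point that Deployment Manager calls for a Python template.

    `context.env` holds deployment metadata and `context.properties`
    holds the inputs passed to the template.
    """
    # Hypothetical naming scheme: <deployment name>-raw-events.
    bucket_name = context.env["deployment"] + "-raw-events"
    resources = [{
        "name": bucket_name,
        "type": "storage.v1.bucket",
        "properties": {
            "location": context.properties["location"],
            # Assumed default; override through template properties.
            "storageClass": context.properties.get("storageClass", "STANDARD"),
        },
    }]
    return {"resources": resources}


class FakeContext:
    """Local stand-in for the real context object (an assumption for testing)."""

    def __init__(self, env, properties):
        self.env = env
        self.properties = properties


# Render the template locally, the way a CI test might.
config = GenerateConfig(FakeContext({"deployment": "analytics"}, {"location": "EU"}))
print(config["resources"][0]["name"])
```

Because the template is an ordinary Python function, a pipeline can render and check it before anything is deployed, which is one way the traceability mentioned above can be achieved.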

Another angle to this story is the data presentation layer. Management and product owners used Tableau dashboards with analytics on users and how they approach app features. With the cloud-native strategy, Data Studio became a natural choice for this purpose. It integrated seamlessly with BigQuery, was much easier to use, serverless, and free of charge.

The results

The cloud journey of Truecaller, supported by GetInData, required an iterative reassessment of the approach, taking cloud-native and open-source technologies into account. It was full of dilemmas but eventually led to the closure of the on-premise data center and a full migration from Tableau to Data Studio.

Throughout these years Truecaller managed to achieve:

● a monthly data platform cost of $6 per 10k users

● developer costs constituting 30% of infrastructure cost

● managing current pipelines with only one data engineer per 42M users monthly.
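To put the per-user cost figure in perspective, the short calculation below scales it to the 42M users handled per data engineer. This is only an illustrative sketch: it assumes both figures describe the same monthly user base, which the source does not state, and the function name is made up for this example.

```python
# Figures quoted in the results list.
COST_PER_10K_USERS_USD = 6
USERS_PER_DATA_ENGINEER = 42_000_000


def monthly_platform_cost_usd(monthly_users):
    """Scale the quoted cost of $6 per 10k users to a given user count."""
    return monthly_users * COST_PER_10K_USERS_USD // 10_000


# Assumption: the cost figure applies to the same 42M monthly users.
platform_cost = monthly_platform_cost_usd(USERS_PER_DATA_ENGINEER)
print(platform_cost)  # 25200
```

Under that assumption, the data platform behind each 42M users would cost about $25,200 per month.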

The Truecaller cloud journey was also presented at Big Data Technology Warsaw Summit 2020 in the talk below; a video recording of the presentation is available.

F. Alsadi, J. Araujo, T. Żukowski 'How to make your Data Scientists like you and save a few bucks while migrating'

24 June 2020
