Success Stories
5 min read

Truecaller - armed with data analytics to control incoming calls

Building a modern analytics environment is a strategic, long-term, iterative process of continuous improvement rather than a one-off project.

The challenge

Truecaller created a mobile app that helps identify who is calling even if you don’t have the number stored as a contact. It blocks unwanted calls & SMS, enables instant mobile payments and VoIP calls. These features are in high demand, particularly in emerging markets, as proved by 500M app installs.

Data has always been central to Truecaller’s business. The app’s spam identification feature relies on the reports from users on numbers they consider spam.Internal sources feed caller identification service.Users see ads tailored to their characteristics. App analytics help identify opportunities to provide genuine and meaningful value to its users.

trucaller-getindata-data-volumes-big-data

The solution

GetInData has assisted Truecaller in its data analytics evolution ever since implementing the first big data platform to respond to exploding data volumes in 2014. At that time Kafka and dumps from relational databases fed on-premise Cloudera Data platform, with Airflow responsible for orchestration and scheduling as well as Spark, Presto and Hive responsible for data processing.

App usage expanded further and Truecaller faced constantly increasing storage needs. They bought more hardware to get more disks even though they didn’t require more computing power. Maintaining its own data center was also challenging and the company experienced occasional downtimes.

In 2018 Truecaller decided it’s once again time to rethink their approach. After carefully considering all the available options, they decided to go for Google Cloud Platform offering. The company wanted to benefit from Cloud Storage and use DataProc for YARN compute clusters thus leveraging bare metal instances, saving costs and enabling autoscaling. Cloud Storage reduced the need for capacity planning, diminished maintenance burden, made storage access faster and turned out cheaper in comparison to on-prem HDFS.

The migration to GCP came at the cost of adjusting certain jobs to make them run in DataProc.

trucaller-getindata-cloud-journey

The next step in the cloud journey was to examine other cloud-native technologies. BigQuery turned out faster and cheaper than Hive on DataProc and offered so much better user experience thatpeople dealing with data didn’t want to work with Hive anymore. BigQuery quickly became the preferred analytics tool and Truecaller is even planning to use it for ETL processing. More complicated workload and machine learning will be run as Spark on Kubernetes.

Another advantage of GCP was the availability of cloud-native tools like Deployment Manager for infrastructure automation. It helps to deliver cloud resources faster and improves its management. Keeping resource definitions in templates as Python or Jinja code makes it suitable for CI/CD pipelines resulting in process traceability, faster delivery with infrastructure integration tests included.

Another angle to this story is the data presentation layer. Management and product owners used Tableau dashboards with analytics on users and their ways of approaching app features. With the cloud-native strategy, Data Studio became a natural choice for this purpose. It got integrated with BigQuery seamlessly, was much easier to use, serverless, and available free of charge.

The results

The cloud journey of Truecaller, supported by GetInData, required an iterative reassessment of the approach taking cloud-native and open-source technologies into account. It was full of dilemmas but eventually led to the closure of the on-premise data center and full migration from Tableau to Data Studio.

Throughout these years Truecaller managed to achieve:

6$ per 10k users of monthly cost of the data platform

developers cost constituting 30% of infrastructure cost

● managing current pipelines with only one data engineer per 42M users monthly.

To see the video presentation on Truecaller cloud journey from Big Data Technology Warsaw Summit 2020, please go here.

How-make-Data-Scientists-like-you-and-save-few-bucks-while-migrating
F.Alsadi, J.Araujo, T.Żukowski 'How to make your Data Scientists like you and save a few bucks while migrating'

big data
analytics
google cloud platform
cloud
24 June 2020

Want more? Check our articles

1 06fVzfDygMpOGKTvnlXAJQ
Tech News

Panem et circenses — how does the Netflix’s recommendation system work.

Panem et circenses can be literally translated to “bread and circuses”. This phrase, first said by Juvenal, a once well-known Roman poet is simple but…

Read more
getindata amundsen feast machine learining notext
Tutorial

Machine Learning Features discovery with Feast and Amundsen

One of the main challenges of today's Machine Learning initiatives is the need for a centralized store of high-quality data that can be reused by Data…

Read more
whitepaper data anlytics iot albert lewandowski getindata
Whitepaper

White Paper: Data Analytics for Industrial Internet of Things

About In this White Paper, we described what is the Industrial Internet of Things and what profits you can get from Data Analytics with IIoT What you…

Read more
blog1obszar roboczy 1 3
Tutorial

Power of Big Data: Science

Welcome to the next installment of the "Big Data for Business" series, in which we deal with the growing popularity of Big Data solutions in various…

Read more
blogobszar roboczy 1
Success Stories

How we built a Modern Data Platform in 4 months for Volt.io, a FinTech scale-up

Money transfers from one account to another within one second, wherever you are? Volt.io is building the world’s first global real-time payment…

Read more
getindata cover nifi ingestion kafka poc notext
Tutorial

NiFi Ingestion Blog Series. PART V - It’s fast and easy, what could possibly go wrong - one year history of certain nifi flow

Apache NiFi, a big data processing engine with graphical WebUI, was created to give non-programmers the ability to swiftly and codelessly create data…

Read more

Contact us

Interested in our solutions?
Contact us!

Together, we will select the best Big Data solutions for your organization and build a project that will have a real impact on your organization.

The administrator of your personal data is GetInData Sp. z o.o. Sp.k with its registered seat in Warsaw (02-508), 39/20 Pulawska St. Your data is processed for the purpose of provision of electronic services in accordance with the  Terms & Conditions. For more information on personal data processing and your rights please see  Privacy Policy.

By submitting this form, you agree to our Terms & Conditions and Privacy Policy