Success Stories

9 min read

Streaming Analytics for the Digital Asset Risk Management System - Cloudwall Success Story

How to minimize data processing latency to hundreds of milliseconds when dealing with 100 mln messages per hour? How can data quality be secure and the infrastructure costs optimized? Why was Apache Flink a main technology chosen when designing the streaming analytics system architecture? Check how we built the streaming platform that delivers real-time market data and allows users to easily and quickly respond to data issues.

About customer

Cloudwall is a risk platform technology provider for digital assets based in NYC and Singapore. The flagship product, Serenity, is a cloud-based portfolio risk platform for digital asset hedge funds, prop trading firms, OTC desks and anyone managing a diverse portfolio of digital assets.

Challenge / Real-Time Trade Platform

Cloudwall built a platform for digital asset risk and analytics. The product is dedicated for institutional investors and is designed to help them make decisions based on their portfolios and market events. The client wanted to process market data feeds for token trades & prices, both spot and derivatives instruments in real-time. What they didn’t want to do was to integrate directly with crypto exchanges due to complexity and time to market, so they decided to use different trade data providers.

The challenge we faced in this project was that the client didn’t have control over the quality of the data and problems occurred when there was no data or they were not full because of some issue with the providers.

To meet the client’s expectations, we needed to build a real-time platform with low latency that would be able to handle around 100 mln messages per hour and allow them to quickly react to the data issues. One of the important requirements was that the client also wanted to reduce cloud costs as much as possible.

Solution

In the first phase of the project we designed and implemented the streaming platform architecture. The main functionality of the platform was to collect, enhance and unify data from the providers, store them in Kafka and recalculate them to the fair prices.

The solution was designed to deliver the values such as:

High throughput - the platform is able to handle more than 100 mln messages per hour and in can handle much more
Versatility - the steaming platform's main objective is to track cryptocurrency prices in real-time for seamless communication with users
Best open-source software - Apache Flink, the leading open-source framework for real-time applications, ensures cutting-edge performance and reliability.
Configurability and Scalability - it allows us to quickly add additional cryptocurrencies to the analysis without job downtime
Cost-effectiveness - thanks to the optimization we were able to cut the ingress/egress and processing cost for messaging and Kubernetes cluster
Data Enrichment - Data caching for high data performance and low latency, triggered by external signals (Redis), and dynamic job reconfiguration, minimizes external database communication and ensures data consistency in concurrent Flink jobs

The solution looks like this:

streaming-analytics-platform-getindata-cloudwall

The streaming platform we have built is based on the Apache Flink solution. One of the main roles of the platform is the integration with data providers. This integration is available thanks to the different protocols such as gRPC and WebSockets.

When building the architecture, we implemented the Flink source. Subscriptions have been efficiently partitioned, enabling simultaneous message consumption.

One of features of the system is the dynamic reconfiguration of subscriptions during job execution, allowing for flexible adaptation to changing conditions.

We optimized the performance of serialization and deserialization operations by converting them to an internal message format based on Flink Tuple. This unified format contributes to the reusability of functions, thereby enhancing code quality and facilitating code maintenance.

The orchestration and deployment of jobs have been implemented using the Flink Kubernetes Operator. This operator enables the management of both stateful and stateless jobs, as well as collaboration with the latest versions of Flink.

Additionally, the utilization of the Prometheus Operator for pulling metrics from Flink, along with built-in support for the Prometheus Reporter, ensures effective analysis and monitoring of the system's performance. Ultimately, the layer for visualizing metrics and detecting anomalies has been realized using Grafana, creating a comprehensive environment for managing and monitoring the Flink-based architecture.

Streaming Monitoring System

GetinData has offered Cloudwall the streaming monitoring system based on Flink to address data-related issues. The monitoring system presents the current state of the system with all important metrics - both technical and business ones. Flink provides technical metrics, including processed record count, data volume, job up-time, restart count, memory usage, and more. Additionally, functions can provide supplementary information, such as Kafka sink measuring the number of executed transactions, errors, retries, and others.

In this case data monitoring required adding business metrics that were dynamically labeled (metric groups in Flink). This allowed us to monitor data with different data granularities, such as the throughput of messages related to a single instrument, an entire class of data types on the exchange, or the entire exchange itself. Utilizing monitoring based on unified metrics enabled comparative monitoring of providers and automated detection of data anomalies.

The metrics we monitor are collected in the Prometheus database and are available for visualization and monitoring in Grafana. What is also important, metrics created with dynamic labels enable various functionalities, including:

Creating dynamically calculated filters based on labels (dashboard variables).
Utilizing filters to browse selected data on charts.
The ability to aggregate metric values at different data granularity levels.
Building multidimensional alert rules (one condition calculated for each value separately), enabling convenient tracking and management of such alerts in Grafana.

The Streaming Monitoring System we have created allows users to analyze and monitor the data without requiring high-level knowledge of cryptocurrency. We can react to events such as no data from the market or a significant difference in prices in real time.

Use-case: Price monitoring

The Streaming Monitoring System allows us to monitor price based on the metrics. Below you can see the prices from the suppliers, which are quite similar. The charts show minor discrepancies, which may result from low metric resolution, delivery delays, or data processing.

minor-discrepancies-getindata-1

When we go to the Price ratio, we are able to see a single discrepancy reaching approximately to a 6% difference in price. In the picture below you can see the peak extending beyond the green line. In this case, it is not an alarming situation. Maintaining such a discrepancy over an extended period would indicate a difference in prices between suppliers. A single change could result from factors such as data processing delays by suppliers or relatively low metric refresh rates (several tens of seconds).

minor-discrepancies-getindata-2

This use-case shows that monitoring only one aspect, for example, the instrument's price, does not provide a complete picture of potential issues. For this reason, it is necessary to expose multiple metrics, including data latency, refresh frequencies (market update rate), a summary of active subscriptions, and more. Building a comprehensive system has enabled, among other things, the detection of data errors or gaps (at the instrument level, data class level, or across the entire exchange).

Infrastructure cost optimization

In the case of the infrastructure capabilities, the most important factor wasn’t the volume of data it could process but the cost it generated. We developed a fully scalable streaming platform, but we had to take care of the cost. We optimized the platform’s cost through three key elements:

Load Balancer ingress
Data providers offer data subscriptions at various levels. Using data at the lowest level provides great flexibility in data processing but comes with significant data transfer costs, affecting the Load Balancer expenses. To reduce these costs, we decided to use the data from a higher level, striking a balance between flexibility, data quality, and size. Of course, when transitioning to a higher level, it becomes crucial to conduct comparative analysis and collaborate with data providers to eliminate any potential issues.
Event Hub (throughput unit)
The amount of data being written is directly proportional to the data acquired from the data provider. Reducing the consumption of throughput units became possible through preliminary data aggregation within Flink. However, this comes at the expense of data latency.
Kubernetes optimization
We reduced the resource consumption on Kubernetes by optimizing and limiting data serialization in terms of CPU usage. We also merged jobs by grouping them per data providers with multiple independent pipelines. This consolidation reduced the overall resources utilization and lowered the Kubernetes cluster costs.

quote cloudwall getindata

Results

The project aimed to build an architecture that would provide greater control and faster response to changes or anomalies in incoming data. As a result, the solution we created:

Reconfigured job subscriptions without downtime. The reconfiguration process is triggered by a message published on (Redis) channel.
Enables a rapid response when incoming data is incorrect or missing, thanks to alerts.
Optimizes costs associated with data flow and cluster maintenance.
Allows for the inclusion of business metrics
Is easy to maintain
Has simple architecture is fast and scalable

Do you want to build a Streaming Analytics Platform? Do not hesitate to contact us or sign up for a free consultation to discuss your data streaming needs.

streaming

apache flink

streaming monitoring

streaming analytics

Last updated: 19 February 2024

Written by

Klaudia Wachnio

Head of Marketing and Employer Branding

Maciej Maciejko

Senior Data Engineer

Like this post?
Spread the word

Want more? Check our articles

Big Data Event

How we evaluate the CfP submissions and build the conference agenda at Big Data Technology Warsaw Summit

Big Data Technology Warsaw Summit 2021 is fast approaching. Please save the date - February 25th, 2021. This time the conference will be organized as…

4 pragmatic enablers data driven decision making notext

Use-cases/Project

4 pragmatic enablers of data-driven decision making

You could talk about what makes companies data-driven for hours. Fortunately, as a single picture is worth a thousand words, we can also use an…

5 main data-related trends to be covered at Big Data Tech Warsaw 2021 Part II

Trend 4. Larger clouds over the Big Data landscape A decade ago, only a few companies ran their Big Data infrastructure and pipelines in the public…

getindator beautiful magi lake with data visualization under th 04d517e5 6cb7 49b2 af1a 77884a44a1eb

Tutorial

Data lakehouse with Snowflake Iceberg tables - introduction

Snowflake has officially entered the world of Data Lakehouses! What is a data lakehouse, where would such solutions be a perfect fit and how could…

heroes 3 how to build real company culture

Heroes 3: Office Wars - How to build real company culture

Communication. Respect. Integrity. Innovation. Strive for greatness. Passion. People. Stirring words, right? Let me share a tale with you about how we…

getindator design a vibrant and engaging scene showcasing real 76ab8269 a013 4120 b722 f95e879d333c

Tutorial

Stream enrichment with Flink SQL

In today's world, real-time data processing is essential for businesses that want to remain competitive and responsive. The ability to obtain results…

Check All

Contact us

Interested in our solutions?
Contact us!

Together, we will select the best Big Data solutions for your organization and build a project that will have a real impact on your organization.

What did you find most impressive about GetInData?

They did a very good job in finding people that fitted in Acast both technically as well as culturally.

Type the form or send a e-mail: hello@getindata.com