
Extracting Flink Flame Graph data for offline analysis

Introduction - what are Flame Graphs?

In every developer's life there comes a moment when the application we have built does not perform as efficiently as expected. In such cases, we reach for tools that let us profile the application and examine its memory and CPU consumption, such as JVisualVM, Java Mission Control and YourKit, to name a few. Flame Graphs are a great addition to that toolbox. To quote the official Flame Graph documentation: “Flame graphs are a visualization of hierarchical data, created to visualize stack traces of profiled software so that the most frequent code-paths can be identified quickly and accurately.”

A Flame Graph won't show you the absolute time spent executing a method. Instead, it shows what percentage of the total execution time, measured as the share of collected stack samples, was spent in that particular method. Equipped with this information, you can easily identify bottlenecks in your application.
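To make the idea of a "share of total execution time" concrete, here is a minimal Python sketch. It assumes a flame graph is a tree in which every node carries a name and a value (the number of stack samples in which that frame appeared); the tree and the method names below are made up purely for illustration.

```python
def frame_shares(node, total=None, path=()):
    """Yield (stack path, share of all samples) for every frame in the tree."""
    total = node["value"] if total is None else total
    here = path + (node["name"],)
    yield " > ".join(here), node["value"] / total
    for child in node.get("children") or []:
        yield from frame_shares(child, total, here)


# Hypothetical sample counts, not taken from a real job.
toy = {"name": "root", "value": 100, "children": [
    {"name": "process", "value": 80, "children": [
        {"name": "anonymize", "value": 60, "children": []},
        {"name": "assignSession", "value": 20, "children": []},
    ]},
    {"name": "print", "value": 20, "children": []},
]}

shares = dict(frame_shares(toy))
for stack, share in shares.items():
    print(f"{share:6.1%}  {stack}")
```

In this toy tree the widest bar under "process" would be "anonymize", which appears in 60 of the 100 samples, so that is where you would look first.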

It so happens that as of Flink 1.13, Flame Graphs are natively supported in Flink. However, the small problem with Flame Graphs produced by Flink is that you can't take a snapshot of Flame Graph data and analyze it later or offline in detail. This is because a Flame Graph is periodically refreshed during job execution and is no longer available after a job has finished, which creates a problem for batch jobs. In this article I will show how you can extract Flame Graph data for offline analysis from the Flink UI.

Enabling Flink Flame Graphs

To enable Flame Graphs, the rest.flamegraph.enabled option in conf/flink-conf.yaml on the Job Manager node has to be set to true. After enabling this option, a new tab titled “FlameGraph” will be available in the Flink UI. To produce a Flame Graph, navigate to the job graph of a running job, select an operator of interest and click the “FlameGraph” tab in the menu on the right. For our example, we will be looking at On-CPU mode only.
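As a minimal sketch, the configuration entry described above would look like this in conf/flink-conf.yaml. Flink reads this file at startup, so the cluster has to be restarted for the change to take effect.

```yaml
# conf/flink-conf.yaml on the Job Manager node
rest.flamegraph.enabled: true
```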

[Figure: Flame Graph for a selected operator in the Flink UI]
Source: https://nightlies.apache.org/flink/flink-docs-release-1.17/fig/flame_graph_operator.png

After clicking the “FlameGraph” tab, the system will start to collect stack samples. After some time, the Flame Graph image will appear on the screen.

Acquiring Flame Graph data

As a test job, we will use a simple Flink streaming job that processes a stream of events expressed as an “Order” entity. The job anonymizes party information and then either creates a session ID or assigns an already existing one. Finally, the orders are printed to the cluster logs. The source code of this job can be found here.

This is the same project that we used in our previous tech blog titled Writing Flink jobs using Spring dependency injection framework. If you haven't read that one, make sure you do. It demonstrates how the Spring framework can be used for building your Flink Jobs.

Now, let's get back to Flink Flame Graphs. After the Flink job is submitted to the Flink cluster and successfully initialized, navigate to the job graph and select the “Process → Sink” box, then select the “FlameGraph” tab.

[Screenshot: the “Process → Sink” box selected in the job graph]

After some time, measurement results should appear on the screen.

Now the hacky part. Once the measurement data has been captured, make sure that you see the full Flame Graph, with the root bar at the bottom.
Then:

  • Open the browser dev tools (F12 in Google Chrome and MS Edge)
  • Navigate to the “Network” tab and look for the “flamegraph” REST request:

[Screenshot: the “flamegraph” request in the browser's Network tab]

After finding the request, double click on it. The request response should be presented in the browser.

[Screenshot: the response of the “flamegraph” request displayed in the browser]

Select the entire content, then save it as a JSON file. What we have just saved is the Flame Graph data gathered by the Flink backend, preprocessed and sent to the Flink UI.
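If you prefer not to copy the response by hand, the same request can be repeated from a script. The sketch below is a hedged example rather than an official client: the URL layout (/jobs/<job id>/vertices/<vertex id>/flamegraph with a type query parameter) is an assumption based on what the Network tab typically shows, and the function names, host and IDs are hypothetical. If the URL in your browser looks different, pass that exact URL to download_flamegraph instead.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def flamegraph_url(base_url, job_id, vertex_id, graph_type="on_cpu"):
    """Build the request URL; the path and the "type" parameter are assumptions."""
    path = "/jobs/{}/vertices/{}/flamegraph".format(
        quote(job_id, safe=""), quote(vertex_id, safe="")
    )
    query = urlencode({"type": graph_type})
    return base_url.rstrip("/") + path + "?" + query


def download_flamegraph(url, out_path, timeout=10):
    """Fetch the JSON response and save it to out_path for offline analysis."""
    with urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)
    return payload


# Example usage (placeholders, replace with values from your own cluster):
# url = flamegraph_url("http://localhost:8081", "<job id>", "<vertex id>")
# download_flamegraph(url, "flamegraph.json")
```

The request has to be made while the job is still running, for the same reason the graph in the UI disappears once the job has finished.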

Creating the Flame Graph from Flink

Now that we have extracted the raw Flame Graph data from the Flink backend, we can use some JavaScript magic to recreate the graph. To this end, we will use the d3-flame-graph library. Using this library and its examples, I’ve created a very simple page that allows you to browse the exported data. To get the code, visit the PlotFlinkFlameGraphs repository, where you will find index.html.

When you open index.html in your web browser, you will see a simple UI. Click the “Choose File” button at the bottom left and select the JSON file created in the previous section. You should then see something similar to this:

[Screenshot: the exported Flame Graph rendered by index.html]

You can click on individual tiles to expand them. Click the “Reset zoom” button to return to the original view. If you wish to load new data, simply click again on the “Choose File” button at the bottom left.

Voilà, now you have a Flame Graph from Flink data that you can analyze offline.
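Once the data is in a file, it can also be inspected without a browser. The following Python sketch lists the frames with the most "self" samples, that is, samples in which the frame was at the top of the stack. It rests on assumptions about the export format that you should check against your own file: a top-level "data" object holding the root node, and nodes with "name", "value" and "children" fields where "value" also counts the samples of all children. The function names are illustrative only.

```python
import json
from collections import Counter


def root_of(payload):
    """Return the root node; assumes it sits under "data" (or is the payload itself)."""
    root = payload.get("data", payload)
    if not isinstance(root, dict) or "name" not in root or "value" not in root:
        raise ValueError("unexpected format: no root node with 'name' and 'value'")
    return root


def load_root(path):
    """Read an exported JSON file and return its root node."""
    with open(path, encoding="utf-8") as handle:
        return root_of(json.load(handle))


def self_samples(node, totals=None):
    """Sum, per frame name, the samples not attributed to any child frame."""
    totals = Counter() if totals is None else totals
    children = node.get("children") or []
    own = node["value"] - sum(child["value"] for child in children)
    if own > 0:
        totals[node["name"]] += own
    for child in children:
        self_samples(child, totals)
    return totals


def hottest(root, n=5):
    """Return the n frames with the largest share of self samples."""
    total = root["value"]
    return [(name, count / total) for name, count in self_samples(root).most_common(n)]


# Example usage with a file saved earlier (the file name is a placeholder):
# for name, share in hottest(load_root("flamegraph.json")):
#     print(f"{share:6.1%}  {name}")
```

Frames that rank high here are the ones burning CPU themselves, which complements the visual view where wide bars also include time spent in callees.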

You can play with this brilliant tool using sample data uploaded to the repository.
Can you spot what might be causing an issue in this one? Let us know in the comments ;).

Conclusion

Since Flink 1.13, we have had access to a great tool for debugging bottlenecks in Flink jobs: Flame Graphs. In this blog post I have shown how you can extract the data from the Flink UI and plot a Flame Graph from it for offline analysis. This works around the problem of the Flink Flame Graph being refreshed during job execution and no longer being available after the job terminates. With our nifty JavaScript code, based on the d3-flame-graph library, you can analyze this data whenever you want.

12 June 2023
