
Alert backoff with Flink CEP

Flink complex event processing (CEP) provides an amazing API for matching patterns within streams. It was introduced in 2016 with an interesting blog post presenting CEP usage scenarios for monitoring and alert detection. When implementing it in real life, one may find an important feature missing: backoff. Once an alert is triggered, we do not want more of the same alerts. For example, if you check for low disk space every minute, then once an alert is raised the subsequent checks should not trigger another alert for a specified interval, such as an hour or a day.

In this post, we present how to implement backoff in Flink CEP. The same functionality can be achieved with the Flink DataStream API or with Pattern Recognition SQL using the MATCH_RECOGNIZE clause. Please refer to our other Flink post for more information on the available Flink APIs and a comparison between them. The scenario described in this post is a real case study from one of our customers. We decided to use CEP so that we can easily extend the solution when the pattern matching requirements get more complex.

We start by creating simple Flink CEP logic that matches a pattern we consider to be an alert. In our scenario this was as simple as filtering some specific events. Testing is our friend from the first line of code, so we start with a test that checks whether the filtering logic works as expected.

Later on we extend our test base with the following scenarios: 

  • Create multiple events that trigger an alert within the backoff window and check that only a single alert is triggered.
  • Create multiple alert-triggering events that span more than the backoff window and assert that more than one alert is triggered.

We are going to use CEP’s SKIP_PAST_LAST_EVENT after-match skip strategy, which controls how many matches a single event can be assigned to. According to the CEP documentation, for a pattern b+ c and a data stream b1 b2 b3 c, only the match b1 b2 b3 c is returned.

In other words, this ensures that if an event belongs to a match, it cannot belong to any other match until this one ends. That is exactly what we are looking for: we just need a single pattern match that lasts for the whole backoff time (let’s say 24 hours).
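To make the skip strategy more tangible, here is a minimal plain-Java toy model of the b+ c example. It is not Flink code: the class `SkipStrategyDemo`, its `Match` type and the strict "b directly followed by c" matching are assumptions made for illustration only. It first lists every possible match (one per starting event) and then keeps only those that start after the previously kept match has ended.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of after-match skipping for the pattern "b+ c"; not Flink code. */
class SkipStrategyDemo {

    /** A match is identified by the indices of its first and last event. */
    static class Match {
        final int start;
        final int end;

        Match(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ".." + end + "]";
        }
    }

    /** All matches of "one or more b, directly followed by c", one per possible start. */
    static List<Match> allMatches(String[] events) {
        List<Match> matches = new ArrayList<>();
        for (int start = 0; start < events.length; start++) {
            int i = start;
            while (i < events.length && events[i].startsWith("b")) {
                i++;
            }
            if (i > start && i < events.length && events[i].startsWith("c")) {
                matches.add(new Match(start, i));
            }
        }
        return matches;
    }

    /** Keeps a match only if it starts after the last event of the previously kept match. */
    static List<Match> skipPastLastEvent(List<Match> matches) {
        List<Match> kept = new ArrayList<>();
        int lastEnd = -1;
        for (Match m : matches) {
            if (m.start > lastEnd) {
                kept.add(m);
                lastEnd = m.end;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] stream = {"b1", "b2", "b3", "c"};
        List<Match> all = allMatches(stream);
        System.out.println("without skipping: " + all);
        System.out.println("skip past last event: " + skipPastLastEvent(all));
    }
}
```

Without skipping, the toy model finds three matches (starting at b1, b2 and b3). With skipping, only the match starting at b1 survives, which mirrors the behaviour described in the CEP documentation.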

In order to do that, let us create a Flink stream (a Flink abstraction, not a real Kafka topic) that, for each event that should trigger an alert, contains an additional event marked as PATTERN_END with its event time shifted 24 hours forward. This can be done with the code below:

[Code screenshot: adding the Flink stream of PATTERN_END events]
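As a rough, Flink-free sketch of this step, the plain-Java snippet below filters the alert-worthy events and adds a PATTERN_END marker 24 hours after each of them. The `Event` shape, the `DISK_LOW` type and the helper names are hypothetical and not taken from the original code. In Flink the same steps would roughly correspond to a filter, a map and a union of the two resulting streams.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Plain-Java sketch of adding delayed PATTERN_END markers; not Flink code. */
class PatternEndDemo {
    static final Duration BACKOFF = Duration.ofHours(24);

    /** Hypothetical event shape: a type label plus an event-time timestamp in milliseconds. */
    static class Event {
        final String type;
        final long timestampMillis;

        Event(String type, long timestampMillis) {
            this.type = type;
            this.timestampMillis = timestampMillis;
        }
    }

    /** Assumption for this sketch: only DISK_LOW events should raise an alert. */
    static boolean isAlertCandidate(Event e) {
        return "DISK_LOW".equals(e.type);
    }

    /** Keeps every alert candidate and adds a PATTERN_END marker 24 hours after it. */
    static List<Event> withPatternEnds(List<Event> events) {
        List<Event> out = new ArrayList<>();
        for (Event e : events) {
            if (!isAlertCandidate(e)) {
                continue; // filter first, so only a fraction of the stream is duplicated
            }
            out.add(e);
            out.add(new Event("PATTERN_END", e.timestampMillis + BACKOFF.toMillis()));
        }
        // Order by event time, as an event-time stream would eventually deliver them.
        out.sort(Comparator.comparingLong(ev -> ev.timestampMillis));
        return out;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event("DISK_OK", 0L),
                new Event("DISK_LOW", 1_000L),
                new Event("DISK_LOW", 2_000L));
        for (Event e : withPatternEnds(events)) {
            System.out.println(e.type + " @ " + e.timestampMillis);
        }
    }
}
```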

Please note that we try to avoid duplicating the whole stream. That is why we do the filtering first, so that only a fraction of the stream is duplicated. Another possible approach, when events are of a significant size, is to create a new stream with events containing only the fields necessary to trigger the alert.

Now we need to create a pattern that:

  • uses the SKIP_PAST_LAST_EVENT strategy,
  • starts with an event that is not PATTERN_END,
  • follows the starting event with a PATTERN_END event,
  • has an event time difference of exactly 24 hours between its first and last event.

The event time difference is important when multiple alert events occur, because otherwise it is hard to match a PATTERN_END event to its corresponding starting event, referred to as PATTERN_BEGIN (see the picture below: the red PATTERN_BEGIN event and the yellow PATTERN_END event).

[Figure: CEP with Flink patterns]

The requirements defined above can be achieved with the following code:

[Code screenshot: Flink CEP pattern definition]
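The four requirements can also be modelled outside of Flink. The plain-Java sketch below is only an approximation of what the CEP engine does, with hypothetical names (`BackoffPatternDemo`, `Event`, `matchStarts`): every non-PATTERN_END event opens a partial match, a PATTERN_END event closes the partial match that started exactly 24 hours earlier, and a completed match discards all partial matches opened in the meantime.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of the backoff pattern; an approximation, not Flink code. */
class BackoffPatternDemo {
    static final long BACKOFF_MILLIS = Duration.ofHours(24).toMillis();

    /** Hypothetical event shape: a type label plus an event-time timestamp. */
    static class Event {
        final String type;
        final long timestampMillis;

        Event(String type, long timestampMillis) {
            this.type = type;
            this.timestampMillis = timestampMillis;
        }

        boolean isPatternEnd() {
            return "PATTERN_END".equals(type);
        }
    }

    /** Returns the starting events of all matches, given events ordered by event time. */
    static List<Event> matchStarts(List<Event> orderedEvents) {
        List<Event> alerts = new ArrayList<>();
        List<Event> openStarts = new ArrayList<>(); // partial matches waiting for their end
        for (Event e : orderedEvents) {
            if (!e.isPatternEnd()) {
                openStarts.add(e);
                continue;
            }
            Event matched = null;
            for (Event start : openStarts) {
                // Only the start that is exactly 24 hours older belongs to this end marker.
                if (e.timestampMillis - start.timestampMillis == BACKOFF_MILLIS) {
                    matched = start;
                    break;
                }
            }
            if (matched != null) {
                alerts.add(matched);
                openStarts.clear(); // skip past last event: drop matches opened meanwhile
            }
        }
        return alerts;
    }

    public static void main(String[] args) {
        long hour = Duration.ofHours(1).toMillis();
        List<Event> events = Arrays.asList(
                new Event("ALERT", 0L),
                new Event("ALERT", hour),
                new Event("PATTERN_END", 24 * hour),
                new Event("ALERT", 24 * hour + hour / 2),
                new Event("PATTERN_END", 25 * hour),
                new Event("PATTERN_END", 48 * hour + hour / 2));
        for (Event alert : matchStarts(events)) {
            System.out.println("alert for event at " + alert.timestampMillis);
        }
    }
}
```

In the sample stream, the second alert event falls into the backoff window of the first one and is dropped. Its PATTERN_END marker then arrives only half an hour after the third alert event, and the exact 24-hour check is what prevents it from closing that new match too early.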

In the example above, event filtering is done within the DataStream API and not within the pattern definition. This allows us to keep the duplicated stream as small as possible. In other scenarios it may be useful to include the filtering within the Pattern definition, as it offers a richer API for conditions that span multiple events in a pattern.

In many cases, it is desirable to work with multiple alert types. Imagine our event has a customer field and we want to get separate alerts for each customer. In this case, we need to apply the keyBy function on the stream. The code below puts all the pieces together.

[Code screenshot: complete Flink CEP job with keyBy]
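What keying changes is that the backoff is tracked separately for every customer. The plain-Java sketch below illustrates only that idea and deliberately swaps the technique: instead of running a pattern matcher per key, it remembers the time of the last alert per customer in a map. All names are hypothetical, and treating an event exactly 24 hours after the previous alert as a new alert is an assumption of this sketch.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch of per-customer backoff; not Flink code and not a pattern matcher. */
class KeyedBackoffDemo {
    static final long BACKOFF_MILLIS = Duration.ofHours(24).toMillis();

    /** Hypothetical alert-worthy event carrying the customer it belongs to. */
    static class Event {
        final String customer;
        final long timestampMillis;

        Event(String customer, long timestampMillis) {
            this.customer = customer;
            this.timestampMillis = timestampMillis;
        }
    }

    /** Emits an event as an alert unless the same customer alerted less than 24 hours earlier. */
    static List<Event> alertsPerCustomer(List<Event> orderedEvents) {
        Map<String, Long> lastAlertByCustomer = new HashMap<>(); // state kept per key
        List<Event> alerts = new ArrayList<>();
        for (Event e : orderedEvents) {
            Long last = lastAlertByCustomer.get(e.customer);
            if (last == null || e.timestampMillis - last >= BACKOFF_MILLIS) {
                alerts.add(e);
                lastAlertByCustomer.put(e.customer, e.timestampMillis);
            }
        }
        return alerts;
    }

    public static void main(String[] args) {
        long hour = Duration.ofHours(1).toMillis();
        List<Event> events = Arrays.asList(
                new Event("acme", 0L),
                new Event("globex", hour),
                new Event("acme", 2 * hour),
                new Event("globex", 3 * hour),
                new Event("acme", 25 * hour));
        for (Event alert : alertsPerCustomer(events)) {
            System.out.println(alert.customer + " @ " + alert.timestampMillis);
        }
    }
}
```

An alert for one customer does not silence another customer: in the sample data both customers alert once within the first hours, and the first customer alerts again only after its own 24-hour window has passed.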

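Please remember to make the Flink streams rely on event time, since both the 24-hour shift of the PATTERN_END events and the time comparison in the pattern are expressed in event time.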

During development, extra filtering was added to make sure that the time interval between the first and last pattern element is 24 hours. This was introduced after a failing test. On the one hand, Flink CEP is a great API for solving complex problems with a minimum amount of code. On the other hand, it can be error-prone. That is why tests and test-driven development should be your best friends when working with Complex Event Processing.

28 July 2021
