
Different generations of CICD tools

What is CICD? It is an acronym for Continuous Integration and Continuous Delivery/Deployment.

CICD can also be described as a methodology focused on automating tasks such as running tests, building applications, deploying to development or production environments and even carrying out A/B tests, all in a fully automated fashion. It has become a must in a rapidly changing world where changes need to be deployed frequently. Maintaining such a system the old-fashioned, manual way would be tough, especially for a complex data streaming platform, and CICD tools are the only practical way to keep it manageable.

CI covers the automation process for developers (integrating, building and testing code changes), while CD is about delivering new releases to the target environment.

Figure: Continuous Integration, Delivery and Deployment

Everything started with Martin Fowler's article on Continuous Integration (first published in 2000 and revised in 2006), which made the case for integrating continuously in order to reduce the number of bugs and detect them quickly. The first two important players were CruiseControl (released in 2001) and Hudson CI (later Jenkins), released in 2005. By 2008 Hudson was becoming more and more important. In 2010, Sun Microsystems, where Hudson's author Kohsuke Kawaguchi worked, was acquired by Oracle, which wanted to commercialize the open-source Hudson. Jenkins, the community fork of Hudson, has been with us in its open-source form since 2011. Many other competitors have appeared since then, so we can choose whichever tool best fits our needs.

The later, the simpler

Figure: CI/CD waves

Currently, we can see a trend of more and more developers learning how to deploy their applications and run tests, adopting GitOps and following best practices. At the beginning of the CICD story, setting up the environment and writing scripts or deployment files used to be rather complex, but now it has become pretty easy, and GitLab CI, the latest version of Jenkins and GitHub Actions are the best examples of tools that achieve this. Due to the popularity of DevOps principles (which, in fact, bring many operations concerns into developers' daily work), CICD tools have had to become easier and faster to use day to day.

Jenkins needs to be installed on dedicated infrastructure, and new pipelines are written in, for example, Groovy, which can be quite complex to manage. Jenkins is still one of the tools used to implement CICD, but it has been a player since the first generation of CICD tools.

The second generation includes GitLab CI, CircleCI and Travis CI, which deliver Pipeline as Code. You still need to write some simple scripts, but thanks to managed runners, building CICD pipelines doesn't take much time, and these tools are great examples of progress in this market.

The latest generation is about building pipelines from ready-made blocks, as in GitHub Actions. Here we have modules maintained by the community (which some developers may see as a disadvantage) that we can easily combine to build a CICD pipeline even faster and more simply. We can predict that the next step will be to popularize more drag-and-drop solutions (with a web UI, or in a way similar to the current GitHub Actions).

What is important in a CICD tool?

When it comes to a real-time data streaming platform processing millions of events per second, a smoothly working CICD platform is a must.

Rule management is a must for CICD tools, and fortunately all of them support it. This means that we can set up a rule such as: when merging from staging to prod, run tests, build the app or model, run integration tests and deploy to production.
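Such a rule can be sketched in Python as a lookup from a merge (source branch into target branch) to an ordered list of stages. The `Rule` type, the branch names and the stage names are hypothetical and not the API of any CICD tool; real tools express the same idea declaratively, for example with `rules:` in `.gitlab-ci.yml` or `on:` triggers in a GitHub Actions workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: which stages run when `source` is merged into `target`."""
    source: str
    target: str
    stages: tuple[str, ...]


# Hypothetical rule set mirroring the staging -> prod example above.
RULES = [
    Rule("staging", "prod", ("unit_tests", "build", "integration_tests", "deploy_prod")),
    Rule("dev", "staging", ("unit_tests", "build", "deploy_staging")),
]


def stages_for_merge(source: str, target: str, rules=RULES) -> tuple[str, ...]:
    """Return the stages of the first matching rule, or an empty tuple if none match."""
    for rule in rules:
        if rule.source == source and rule.target == target:
            return rule.stages
    return ()


print(stages_for_merge("staging", "prod"))
# ('unit_tests', 'build', 'integration_tests', 'deploy_prod')
```

Returning an empty tuple for unmatched merges means "run nothing", which is a conservative default for branches no rule covers.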

For some use cases, it is important to be able to install the CICD tool in "local" infrastructure, either on-premises or in the public cloud. Here Jenkins and GitLab (with its CI) offer extensive capabilities. Even during deployment, and when adding all the components around each service, you can see the difference between the old Jenkins and the much newer, robust GitLab. Ease of deployment also needs to be mentioned here. With GitLab CI and GitHub Actions, we just create a repository and start writing the CICD pipeline in files (for example `.gitlab-ci.yml`, or YAML files in `.github/workflows`), while with Jenkins we would need to set up the whole platform, which is obviously quite time-consuming.

When it comes to a mixed deployment (a cloud-hosted repository with your own runners executing the CICD pipeline), GitLab CI supports multiple runner setups (bare-metal servers, virtual machines, Docker or Kubernetes), while GitHub Actions offers somewhat more limited options. Why does this matter? First, self-hosted runners may be more cost-effective than buying additional minutes for managed runners. Second, it's more secure to run tests and build and push images on your own infrastructure, with network security groups applied directly to the runners.
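The cost argument comes down to a break-even calculation. The sketch below assumes a flat monthly cost for a self-hosted runner and a per-minute price for managed runners; both figures are hypothetical placeholders, not real pricing, and in practice the fixed cost should also include the effort of operating the runner.

```python
def break_even_minutes(self_hosted_monthly_cost: float, managed_price_per_minute: float) -> float:
    """Monthly pipeline minutes at which managed minutes cost as much as a self-hosted runner."""
    return self_hosted_monthly_cost / managed_price_per_minute


def cheaper_option(minutes_per_month: float, self_hosted_monthly_cost: float,
                   managed_price_per_minute: float) -> str:
    """Compare the managed bill for this usage with the flat self-hosted cost."""
    managed_cost = minutes_per_month * managed_price_per_minute
    return "self-hosted" if managed_cost > self_hosted_monthly_cost else "managed"


# Hypothetical figures: 200 per month for our own runner, 0.008 per managed minute.
print(break_even_minutes(200.0, 0.008))      # roughly 25,000 minutes
print(cheaper_option(30_000, 200.0, 0.008))  # self-hosted
```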

The next thing to consider is the size of the operations team that will handle the CICD pipelines. It's also worth checking how the CICD tool integrates with the target environment: tools like Flux CD are useful nowadays when we need to manage multiple platforms.

When building complex pipelines, we should also check how the tool lets us manage all the secrets, variables and tokens that our pipelines require.
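A common pattern inside a pipeline job is to read secrets from environment variables, fail fast when one is missing and keep values out of the logs. GitLab CI exposes CI/CD variables to jobs as environment variables, and GitHub Actions secrets can be mapped into a job's `env:`; both tools can also mask secret values in job logs. The secret names and the `load_secrets`/`redact` helpers below are hypothetical, a minimal sketch of the idea.

```python
import os
from typing import Mapping, Optional

# Hypothetical secret names that a deploy job might need.
REQUIRED_SECRETS = ("REGISTRY_TOKEN", "DEPLOY_TOKEN")


def load_secrets(names=REQUIRED_SECRETS,
                 env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Read required secrets from the environment; fail fast if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        # Report only the names, never the values.
        raise RuntimeError(f"missing pipeline secrets: {', '.join(missing)}")
    return {name: env[name] for name in names}


def redact(line: str, secrets: Mapping[str, str]) -> str:
    """Replace any secret value that leaks into a log line with a placeholder."""
    for value in secrets.values():
        if value:
            line = line.replace(value, "[MASKED]")
    return line


secrets = load_secrets(env={"REGISTRY_TOKEN": "s3cr3t", "DEPLOY_TOKEN": "t0k3n"})
print(redact("docker login -p s3cr3t", secrets))  # docker login -p [MASKED]
```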

How many steps should/can we automate?

When talking about CICD tools, it's worth discussing the CICD pipelines themselves, which typically consist of the following parts.

The first part is about defining separate stages of the pipeline. Imagine we have a Spark application written in Scala that we want to deploy to our Kubernetes cluster, with three environments: development, QA and production.

The second part is about automatically running unit tests or validation tests (for YAML files, for example).

The third part is the build process: where we store the artifacts produced by the job and how we want to build them (once per environment, or once for all environments?).

The fourth part focuses on deployment to development, staging, QA and other environments used to test the code and verify that it works as expected.

Deployment to the production environment is the fifth part. Here it's important to define A/B testing scenarios, run Flink jobs in incubation mode (and verify the results once again), perform a blue-green deployment and decide whether to make it fully automated. In some cases, it makes sense to update the release in the production environment only through a manual trigger.
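A blue-green deployment can be sketched as two slots behind a router: the new release goes to the idle slot, traffic is switched only after a health check passes, and a rollback is simply switching back. The `BlueGreen` class and the health-check callback are hypothetical stand-ins for whatever router, Kubernetes Service or load balancer the platform actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class BlueGreen:
    """Hypothetical router state: two slots, and traffic goes to the `live` one."""
    versions: dict[str, Optional[str]] = field(
        default_factory=lambda: {"blue": None, "green": None})
    live: str = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def release(self, version: str, healthy: Callable[[str], bool]) -> bool:
        """Deploy to the idle slot; switch traffic only if the health check passes."""
        slot = self.idle
        self.versions[slot] = version
        if not healthy(version):
            return False  # live slot untouched, users never saw the bad version
        self.live = slot
        return True

    def rollback(self) -> None:
        """Send traffic back to the previous slot, which still runs the old version."""
        self.live = self.idle


bg = BlueGreen(versions={"blue": "1.0", "green": None})
bg.release("1.1", healthy=lambda v: True)
print(bg.live, bg.versions[bg.live])  # green 1.1
```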

The last part is about managing all secrets used within the pipeline.
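The parts above can be sketched as an ordered list of stages in which the artifact is built once and reused by every environment, and the production stage waits for a manual trigger. `Stage`, `run_pipeline`, the image tag and the stub stages are hypothetical; in GitLab CI the manual gate would typically be `when: manual` on the production job.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]  # returns True on success
    manual: bool = False     # True: wait for a human trigger (e.g. production)


def run_pipeline(stages: list[Stage], approved: set[str]) -> list[str]:
    """Run stages in order; stop at the first failure or at a manual stage
    nobody has approved yet. Returns one log line per visited stage."""
    log = []
    for stage in stages:
        if stage.manual and stage.name not in approved:
            log.append(f"{stage.name}: waiting for manual trigger")
            break
        ok = stage.run()
        log.append(f"{stage.name}: {'ok' if ok else 'failed'}")
        if not ok:
            break
    return log


# Build once for all environments: every deploy stage reuses the same artifact.
artifact: dict[str, str] = {}


def build() -> bool:
    artifact["image"] = "spark-app:1.0.0"  # hypothetical image tag
    return True


def deploy(env: str) -> Callable[[], bool]:
    def _deploy() -> bool:
        image = artifact.get("image")
        if image is None:
            return False  # nothing was built, so there is nothing to deploy
        print(f"deploying {image} to {env}")  # placeholder for the real rollout
        return True
    return _deploy


pipeline = [
    Stage("unit_tests", lambda: True),  # stub standing in for the real test job
    Stage("build", build),
    Stage("deploy_dev", deploy("dev")),
    Stage("deploy_qa", deploy("qa")),
    Stage("deploy_prod", deploy("prod"), manual=True),
]

print(run_pipeline(pipeline, approved=set()))
```

Without approval, the run stops with `deploy_prod: waiting for manual trigger`; passing `approved={"deploy_prod"}` lets the same artifact reach production.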

KPIs for CICD

  • It's always good to have a fully observable environment and application, and the same rule applies to CICD pipelines. We can define KPIs for them to monitor their effectiveness.

Cycle time

  • This metric is best described by the question: how long does it take to deliver the results of a job? It measures how much time each pipeline stage takes to complete.
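Cycle time can be computed from stage start and end timestamps. The sketch below assumes a hypothetical mapping from stage name to `(start, end)` datetimes, for example exported from the CICD tool; the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta

StageTimes = dict[str, tuple[datetime, datetime]]  # stage name -> (start, end)


def stage_durations(stages: StageTimes) -> dict[str, timedelta]:
    """How long each stage took to complete."""
    return {name: end - start for name, (start, end) in stages.items()}


def cycle_time(stages: StageTimes) -> timedelta:
    """From the first stage start to the last stage end, including waiting time."""
    starts = [start for start, _ in stages.values()]
    ends = [end for _, end in stages.values()]
    return max(ends) - min(starts)


t0 = datetime(2021, 7, 21, 10, 0)
pipeline_run = {
    "tests": (t0, t0 + timedelta(minutes=4)),
    "build": (t0 + timedelta(minutes=4), t0 + timedelta(minutes=10)),
    "deploy": (t0 + timedelta(minutes=11), t0 + timedelta(minutes=14)),
}
print(stage_durations(pipeline_run)["build"])  # 0:06:00
print(cycle_time(pipeline_run))                # 0:14:00
```

Measuring from the first start to the last end, rather than summing stage durations, also captures time spent queuing between stages: here 14 minutes of cycle time versus 13 minutes of actual work.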

Mean Time Between Failures and Mean Time To Recover

  • Together, these show how robust our solutions are (MTBF) and how quickly we can recover after a failure, for example by rolling back a deployment (MTTR).
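Both metrics can be derived from a list of incidents, each recorded as a `(failed_at, recovered_at)` pair. This is a minimal sketch, assuming MTBF is defined as operating (up) time within an observation window divided by the number of failures; the incident data is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

Incident = tuple[datetime, datetime]  # (failed_at, recovered_at)


def mttr(incidents: list[Incident]) -> Optional[timedelta]:
    """Mean Time To Recover: average downtime per incident."""
    if not incidents:
        return None
    downtime = sum((rec - fail for fail, rec in incidents), timedelta())
    return downtime / len(incidents)


def mtbf(incidents: list[Incident], window: timedelta) -> Optional[timedelta]:
    """Mean Time Between Failures: uptime in the window divided by the failure count."""
    if not incidents:
        return None  # no failures observed: MTBF is undefined for this window
    downtime = sum((rec - fail for fail, rec in incidents), timedelta())
    return (window - downtime) / len(incidents)


start = datetime(2021, 7, 1)
incidents = [
    (start + timedelta(days=3), start + timedelta(days=3, hours=1)),
    (start + timedelta(days=20), start + timedelta(days=20, minutes=30)),
]
print(mttr(incidents))                      # 0:45:00
print(mtbf(incidents, timedelta(days=30)))  # 14 days, 23:15:00
```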

Deploy Ready Builds

  • It describes how many pipeline runs finish successfully. It's a useful metric for verifying that all the added tests and verification steps work as expected.
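The metric is the share of pipeline runs that ended in success. The status strings and the list of runs below are hypothetical; the optional `ignore` parameter is a design choice for excluding runs, such as cancelled ones, that say nothing about the quality of the tests.

```python
def deploy_ready_rate(statuses: list[str], ignore: frozenset[str] = frozenset()) -> float:
    """Fraction of counted pipeline runs whose final status is 'success'."""
    counted = [s for s in statuses if s not in ignore]
    if not counted:
        return 0.0
    return counted.count("success") / len(counted)


runs = ["success", "success", "failed", "success", "canceled"]  # hypothetical history
print(deploy_ready_rate(runs))                                  # 0.6
print(deploy_ready_rate(runs, ignore=frozenset({"canceled"})))  # 0.75
```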

Infrastructure Costs and Uptime

  • The first measurement covers the cost of the underlying CICD infrastructure: running tests and verifying whether deployments completed successfully. It can also be linked to the uptime of the platform. Based on both, we can plan enhancements that improve performance, stability and deployment speed.
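One way to connect the two measurements is to track uptime as a percentage of an observation window alongside the infrastructure cost per successful deployment. The functions and figures below are a hypothetical sketch (cost in any currency, made-up numbers), not a standard formula from a specific tool.

```python
from datetime import timedelta
from typing import Optional


def uptime_pct(window: timedelta, downtime: timedelta) -> float:
    """Share of the observation window during which the platform was up, in percent."""
    return (window - downtime) / window * 100


def cost_per_successful_deployment(infra_cost: float,
                                   successful_deployments: int) -> Optional[float]:
    """Infrastructure spend divided by the deployments that completed successfully."""
    if successful_deployments == 0:
        return None  # undefined when nothing was deployed
    return infra_cost / successful_deployments


print(round(uptime_pct(timedelta(days=30), timedelta(hours=1, minutes=30)), 2))  # 99.79
print(cost_per_successful_deployment(450.0, 90))                                # 5.0
```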

Comparison between Jenkins, GitLab CI and GitHub Actions

  • As I have used these three tools the most, I can present the main differences between them, along with their limitations and unique advantages that may make one of them the right choice for you.

Jenkins is the oldest one, and it's not worth choosing if we want something modern and don't already carry technical debt in the form of Jenkins pipelines used across the company.

GitLab with its CI is a reasonable choice when we want an easy-to-use tool with many advanced features and good integration with both public cloud services and on-premises infrastructure.

GitHub Actions is the right choice if you don't have much time to spend writing and defining CICD pipelines but still want a complex pipeline that can be put together in a few moments. Here GitHub Actions is the hero, especially for open source projects hosted on GitHub.

CICD for a modern MLOps platform

Figure: Complex Event Processing Platform and MLOps Platform with CICD

Modern data streaming and/or AI platforms contain multiple microservices, and delivering new releases smoothly and efficiently requires the right CICD tools and well-built CICD pipelines. Based on our experience in GetInData projects, we highly recommend GitLab CI as a flexible solution for both on-premises and SaaS infrastructure.

21 July 2021
