Different generations of CICD tools

What is CICD? It is an acronym for Continuous Integration and Continuous Delivery / Deployment.

CICD can also be described as a methodology focused on automating tasks such as running tests, building artifacts, deploying to development or production, and even carrying out A/B tests. It has become a must in a rapidly changing world in which we need to deploy changes frequently. Manual, old-fashioned maintenance would be tough, especially for a complex data streaming platform, and CICD tools are the only realistic way to keep this manageable.

CI covers the automation of the developers' workflow, while CD is about delivering new releases to the target environment.

Complex Event Processing with CI/CD — Continuous Integration, Delivery and Deployment

The idea was popularized by Martin Fowler's article on Continuous Integration, which argued that integrating frequently reduces the number of bugs and helps to detect them quickly. Its widely cited revision dates from 2006, while the first version appeared in 2000. The first two important players were CruiseControl (released in 2001) and Hudson (later Jenkins), released in 2005. From 2008 onwards Hudson became more and more important. In 2010 Sun Microsystems, where Hudson’s author Kohsuke Kawaguchi worked, was purchased by Oracle, which wanted to commercialize the open source Hudson. The project was then forked, and Jenkins has been with us in its open source form since 2011. There have been a lot of other competitors over the years, so teams could choose whatever suited them best.

The later, the simpler

Currently, we can see a trend in which more and more developers learn how to deploy their applications, run tests, apply GitOps and follow best practices. At the beginning of the CICD story, setting up the environment and writing scripts or deployment files used to be a bit complex, but it has now become pretty easy; GitLab CI, the latest versions of Jenkins and GitHub Actions are the best examples of tools that achieve this. Due to the popularity of DevOps principles (which, in fact, include a lot of Ops-related practices), CICD tools must become easier and faster to use on a daily basis.

Jenkins needs to be installed on dedicated infrastructure, and new pipelines are written in, for example, Groovy, which can be quite complex to manage. Jenkins is still one of the tools used to implement CICD, but it has been a player since the first generation of CICD tools.

The next generation includes GitLab CI, CircleCI and Travis CI, which deliver Pipeline as Code. You still need to write some simple scripts, but thanks to managed runners, building CICD pipelines doesn’t take too much time. These tools are great examples of progress in this market.

The latest generation is about building pipelines from prepared blocks, as in GitHub Actions. Here we have modules maintained by the community (which can also be a disadvantage for some developers) that we can easily combine to build a CICD pipeline even faster and more simply. We can predict that the next step will be to popularize more drag-and-drop solutions (with a Web UI, or in a way similar to the current GitHub Actions).

What is important in the CICD tool?

When talking about a real-time data streaming platform processing millions of events per second, it is a must to have a smoothly working CICD platform.

Support for rules is a must in CICD tools and, fortunately, all of them provide it. This means that we can set up the following rule: when merging from staging to prod, run tests, build the app or model, run integration tests and deploy to production.
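
To make the idea of a rule more concrete, here is a minimal Python sketch of how a merge event could be mapped to an ordered list of jobs. It is not how any particular CICD tool implements rules; the `Rule` class, the `RULES` list and all branch and job names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Maps a merge from one branch into another to an ordered list of jobs."""
    source_branch: str
    target_branch: str
    jobs: tuple[str, ...]


# Hypothetical rule set; branch and job names are illustrative only.
RULES = [
    Rule("staging", "prod",
         ("unit-tests", "build", "integration-tests", "deploy-production")),
    Rule("feature", "staging",
         ("unit-tests", "build", "deploy-staging")),
]


def jobs_for_merge(source_branch: str, target_branch: str) -> tuple[str, ...]:
    """Return the jobs of the first matching rule, or no jobs if nothing matches."""
    for rule in RULES:
        if (rule.source_branch, rule.target_branch) == (source_branch, target_branch):
            return rule.jobs
    return ()


print(jobs_for_merge("staging", "prod"))
```

In a real tool the same intent is usually expressed declaratively in the pipeline file rather than in application code.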

For some use cases, it is important to be able to install the CICD tool in the “local” infrastructure, either on-premise or in the public cloud. Here Jenkins and GitLab (with its CI) offer extensive capabilities. Even during deployment, when adding all the components around each service, you can see the difference between the old Jenkins and the much newer, robust GitLab. We also need to mention how easy it is to get started. With GitLab CI and GitHub Actions, we only need to create a repo and start writing the CICD pipeline by creating the files (.gitlab-ci.yml, or a YAML file within .github/workflows, for example), while with Jenkins we would need to set up the whole platform first, which is obviously quite time consuming.

When talking about mixed deployment (a cloud-hosted repository with our own runners executing the CICD pipeline), we have multiple configurations for GitLab CI (we can use a bare-metal server, a virtual machine, Docker or Kubernetes) and slightly more limited options for GitHub Actions. Why is it important to mention this? First of all, it may be more cost-effective than buying more minutes for managed runners. Secondly, it is more secure to use our own infrastructure to run tests and to build and push images, with network security groups applied directly to the runners.

The next thing is the size of the operations team that will handle the CICD pipelines. It’s also worth checking how we can integrate the CICD tool into our target environment; tools like Flux CD are useful nowadays when we need to manage multiple platforms.

When building complex pipelines, we should also check how we can manage all the secrets, variables and tokens that our pipelines require.
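
As a rough illustration of secret handling inside a pipeline script, the sketch below assumes that the CICD tool injects secrets as environment variables, which is a common setup but still an assumption here. The helper names `require_secret` and `mask` are hypothetical.

```python
import os


def require_secret(name: str) -> str:
    """Read a secret from the environment and fail fast if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required pipeline secret: {name}")
    return value


def mask(value: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the value is safe to log."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]
```

Failing at the very start of the job, instead of halfway through a deployment, keeps a missing token from leaving an environment in a half-updated state.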

How many steps should/can we automate?

When talking about CICD tools, it’s worth mentioning the CICD pipelines themselves.

The first is about creating separate stages of the pipeline. Imagine we have a Spark application written in Scala that we want to deploy to our Kubernetes cluster, and we have three environments: development, QA and production.

The second one is about automating running unit tests or validation tests (in case of YAML scripts, for example).

The third one is about the build process: where we store the artifacts from the job and how we want to build them (once per environment, or once for all environments?).

The fourth is focused on deployment to development, staging, QA and the different environments for testing the code and verifying if it works as expected.

Deployment to the production environment comes as the fifth component and here it’s important to define some A/B testing scenarios, running Flink jobs in the incubation mode (and here verify results once again), performing blue-green deployment and making it fully automated or not. In some cases, it makes sense to only use manual triggers to update the release in the production environment.

The last part is about managing all secrets used within the pipeline.
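
The stages described above can be sketched as an ordered list that stops at the first failure and pauses before production until somebody approves the release. This is a simplified model, not the behaviour of a specific tool; the `Stage` class, the stage names and the always-successful placeholder actions are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stage:
    name: str
    action: Callable[[], bool]  # returns True on success
    manual: bool = False        # True means a human must approve first


def run_pipeline(stages: list[Stage], approved: set[str]) -> list[str]:
    """Run stages in order; stop at the first failure or unapproved manual stage."""
    log = []
    for stage in stages:
        if stage.manual and stage.name not in approved:
            log.append(f"{stage.name}: waiting for manual approval")
            break
        ok = stage.action()
        log.append(f"{stage.name}: {'ok' if ok else 'failed'}")
        if not ok:
            break
    return log


# Placeholder actions; a real pipeline would run the build tool, tests
# and deployment commands here instead of simply returning True.
PIPELINE = [
    Stage("validate-and-unit-test", lambda: True),
    Stage("build-artifact", lambda: True),
    Stage("deploy-development", lambda: True),
    Stage("deploy-qa", lambda: True),
    Stage("deploy-production", lambda: True, manual=True),
]

for line in run_pipeline(PIPELINE, approved=set()):
    print(line)
```

Passing `approved={"deploy-production"}` models the manual trigger for the production release.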

KPIs for CICD

It’s always good to have a fully observable environment and application, and the same rule applies to CICD pipelines. We can define some KPIs to monitor their effectiveness.

Cycle time

This metric is best described by the question: how long does it take to deliver the results of the job? It also covers how much time each pipeline stage takes to complete.
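
A minimal sketch of how cycle time could be computed from stage timestamps is shown below. It assumes we can export a start and finish time for every stage; the `StageRun` shape and the sample timestamps are made up for the example.

```python
from datetime import datetime, timedelta

# (stage name, started at, finished at); the timestamps below are made up.
StageRun = tuple[str, datetime, datetime]


def stage_durations(runs: list[StageRun]) -> dict[str, timedelta]:
    """How long each individual stage took."""
    return {name: finished - started for name, started, finished in runs}


def cycle_time(runs: list[StageRun]) -> timedelta:
    """Wall-clock time from the first stage start to the last stage finish."""
    first_start = min(started for _, started, _ in runs)
    last_finish = max(finished for _, _, finished in runs)
    return last_finish - first_start


RUNS = [
    ("test", datetime(2021, 7, 21, 10, 0), datetime(2021, 7, 21, 10, 4)),
    ("build", datetime(2021, 7, 21, 10, 4), datetime(2021, 7, 21, 10, 10)),
    ("deploy", datetime(2021, 7, 21, 10, 11), datetime(2021, 7, 21, 10, 14)),
]

print(cycle_time(RUNS))  # 0:14:00
```

Note that the total includes the one-minute gap in which the deploy stage waited for a runner, so it can be larger than the sum of the stage durations.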

Mean Time Between Failures and Mean Time To Recover

Mean Time To Recover shows how fast we can recover from a failure, for example by rolling back a deployment, while Mean Time Between Failures shows how robust our solutions are.
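
Both metrics can be derived from a list of incidents. The sketch below assumes one particular convention: time to recover is measured from failure to recovery, and time between failures is measured from one recovery to the next failure. Other definitions exist, and the `Incident` shape and sample data are hypothetical.

```python
from datetime import datetime, timedelta

Incident = tuple[datetime, datetime]  # (failed_at, recovered_at)


def mean_time_to_recover(incidents: list[Incident]) -> timedelta:
    """Average time from a failure to its recovery."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((recovered - failed for failed, recovered in incidents), timedelta())
    return total / len(incidents)


def mean_time_between_failures(incidents: list[Incident]) -> timedelta:
    """Average healthy time between a recovery and the next failure."""
    if len(incidents) < 2:
        raise ValueError("need at least two incidents")
    ordered = sorted(incidents)
    gaps = [nxt[0] - prev[1] for prev, nxt in zip(ordered, ordered[1:])]
    return sum(gaps, timedelta()) / len(gaps)


# Made-up incidents for illustration.
INCIDENTS = [
    (datetime(2021, 7, 1, 10, 0), datetime(2021, 7, 1, 10, 30)),
    (datetime(2021, 7, 3, 10, 30), datetime(2021, 7, 3, 11, 30)),
    (datetime(2021, 7, 6, 11, 30), datetime(2021, 7, 6, 12, 0)),
]

print(mean_time_to_recover(INCIDENTS))        # 0:40:00
print(mean_time_between_failures(INCIDENTS))  # 2 days, 12:00:00
```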

Deploy Ready Builds

It describes how many pipelines finish with success - it's a useful metric to verify if all added tests and verification steps work as expected.
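
As a small sketch, this metric can be computed as the share of pipeline runs that ended successfully. The status strings used here are an assumption; each CICD tool reports its own set of statuses.

```python
def deploy_ready_ratio(statuses: list[str]) -> float:
    """Fraction of pipeline runs that finished with the status 'success'."""
    if not statuses:
        return 0.0
    successful = sum(1 for status in statuses if status == "success")
    return successful / len(statuses)


print(deploy_ready_ratio(["success", "failed", "success", "success"]))  # 0.75
```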

Infrastructure Costs and Uptime

The first measurement is about the cost of the underlying infrastructure for CICD: running the pipelines, the tests and the verification of whether the deployment was completed successfully. It can also be connected to the uptime of the platform. Based on these, we can plan enhancements to improve the performance, stability and speed of deployment.
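
One possible way to turn these two measurements into numbers is sketched below. The function names and all figures are hypothetical; they only show the arithmetic, not real prices or availability data.

```python
def cost_per_successful_deployment(monthly_cost: float, successful_deployments: int) -> float:
    """Infrastructure cost divided by the number of successful deployments."""
    if successful_deployments <= 0:
        raise ValueError("need at least one successful deployment")
    return monthly_cost / successful_deployments


def uptime_percent(total_minutes: int, downtime_minutes: int) -> float:
    """Share of the period in which the CICD platform was available."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return 100 * (total_minutes - downtime_minutes) / total_minutes


# Hypothetical month: 1200 currency units, 48 deployments,
# 216 minutes of downtime in a 30-day period (43200 minutes).
print(cost_per_successful_deployment(1200.0, 48))
print(uptime_percent(43_200, 216))
```

Tracking both values over time shows whether cheaper infrastructure is being paid for with worse availability.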

Comparison between Jenkins, GitLab CI and GitHub Actions

As these are the three tools I have used the most, I can present the differences between them, along with the limitations and unique advantages that can make one of them the right choice for you.

Jenkins is the oldest one. It’s not worth choosing if we want something modern, unless we already have a technological debt with Jenkins used in all pipelines in the company.

GitLab with its CI is a reasonable solution when we want an easy-to-use tool with a lot of advanced features and great integration with both public cloud services and on-premise infrastructure.

GitHub Actions is the right choice if you don’t have much time to spend on writing and defining CICD pipelines but still want a complex CICD pipeline that can be written in a few moments. Here GitHub Actions appears to be the hero, especially for open source projects hosted there.

CICD for Modern MLOps platform

Complex Event Processing Platform and MLOps Platform with CICD: CICD tools comparison

Modern data streaming and/or AI platforms consist of multiple microservices, and delivering new releases in a smooth, efficient way requires the right CICD tools and well-built CICD pipelines. Based on our experience in GetInData projects, we can highly recommend GitLab CI as a flexible solution for both on-premise and SaaS infrastructure.

Last updated: 21 July 2021

Written by

Albert Lewandowski

Big Data DevOps Engineer
