Tutorial
9 min read

Different generations of CICD tools

What is CICD? It is an acronym for Continuous Integration Continuous Delivery / Deployment.

CICD can be also described as the methodology focused on automating tasks related to running tests, build phases, deploying to development or production and also carrying out some A/B tests in a fully automated fashion. It became a must in the rapidly changing world in which we need to deploy changes frequently. The old-fashioned maintenance means would be tough, especially for a complex data streaming platform, and CICD tools are the only way to make this manageable. 

CI regards  the automation process for developers while the CD is about delivering new releases to the target environment

Complex Event Processing with CI/CD
Continuous Integration, Delivery and Deployment

Source

Everything started in 2006 when Martin Fowler wrote an article about the need of having Continuous Integration to reduce the number of bugs and detecting them quickly. The first two important players were CruiseControl (released in 2001) and Hudson CI (later Jenkins) released in 2005. 2008 was the year when Hudson became more and more important, in 2010 Sun Microsystems, in which Hudson’s author, Kohsuke Kawaguchi worked, was purchased by Oracle which wanted to commercialize the open source Hudson. Jenkins has been with us in its open source form since 2011. During this time there were a lot of other competitors but we can chose what we felt was best for us. 

The later, the simpler

CI/CD tools comparision
CI/CD waves

Source

Currently, we can see a trend causing more and more developers to learn how to deploy their applications, run tests using GitOps  and follow the best practices. At the beginning of the CICD store, setting up the environment, writing scripts or deployment files used to be be a bit complex, but now it has become pretty easy and GitLab CI, the latest version of Jenkins or GitHub Action are the best examples of tools used to achieve this. Due to the popularity of DevOps principles (which, in fact, contain a lot of definitions related to Ops phrases), CICD tools must become easier to use  and faster in daily use.

Jenkins needs to be installed on the dedicated infrastructure, writing new pipelines in, for example, Groovywhich in fact can be quite complex to manage. . Jenkins is one of the authors used to  implement CICD, but it’s been a player since the first generation of CICD tools, talking about the  first CICD phase.

The next ones are GitLab CI, Circle CI and Travis CI,which deliver Pipeline as Code. It is required to write some simple scripts but thanks to managed runners, building CICD pipelines doesn’t take too much time and they are great examples of progress in this market.

The last generation is about building pipelines from prepared blocks the same way as  in GitHub Actions. Here we have modules managed by the community (that can be also a disadvantage for some developers) which we can easily mix with each other and build a CICD pipeline even faster and in a simpler way. We can predict that the next step is topopularize more drag&drop solutions (with Web UI or in a similar way to the current GitHub Actions).

What is important in the CICD tool?

When talking  about a real-time data streaming platform processing millions of events per second, it is a must to have a smoothly working CICD platform. 

Managing rules are a must in the case of CICD tools and fortunately all of them support it. Thismeans that we can set up the following rule: when merging from staging to prod, run tests, build app or build model, run integration tests and deploy to production. 

For some use cases, it is important to have the chance to install the CICD tool in the “local” infrastructure within on-premise, or in the public cloud. Here Jenkins and GitLab (with its CI) exhibit extensive capabilities. Even during deployment and when adding all components around each service, you can discover the difference between the old Jenkins and much newer, robust GitLab. Here we also need to mention  ease of deployment. In the case of GitLab CI and GitHub Actions, we need to create a repo and start writing the CICD pipeline by creating the files (.gitab-ci.yml or YAML within .github/workflows, for example) while in the case of Jenkins we would need to set up the whole platform which would obviously be quite time consuming.

When talking about mixed deployment (cloud repository with own runners used for running the CICD pipeline), we have multiple configurations for GitLab CI (we can use bare-metal server, Virtual Machine, Docker or Kubernetes) and a bit more limited options for GitHub Actions. Why is it important to mention this ? First of all, it may be more cost-effective  in comparison to buying more minutes for managed runners. Secondly, it’s more secure to use its own infrastructure to run tests, build and push images with applied network security groups directly to the runners.

The next thing is the size of the Operational team that will handle CICD pipelines. It’s also worth checking how we can integrate the CICD tool into our target environment - tools like Flux CD are useful nowadays when we need to handle managing multiple platforms.

In the case of making complex pipelines, we should also check how we can manage all secrets, variables, and tokens which are required within our pipelines.

How many steps should/can we automate?

When talking about CICD tools, it’s worth mentioning about the CICD pipelines themselves.

The first is about creating separate stages of the pipeline. Imagine we have a Spark application written in Scala, that we want to deploy to our Kubernetes cluster and we have three environments: development, QA and production.

The second one is about automating running unit tests or validation tests (in case of YAML scripts, for example).

The third one is about the building process, where we can store artifacts from the job and how we want to build it (once per environment or once for all environments?).

The fourth is focused on deployment  to development, staging, QA and the different environments for testing the code and verifying if it works as expected.

Deployment to the production environment comes as the fifth component and here it’s important to define some A/B testing scenarios, running Flink jobs in the incubation mode (and here verify results once again), performing  blue-green deployment and making it fully automated or not. In some cases, it makes sense to only use manual triggers to update the release in the production environment.

The last part is about managing all secrets used within the pipeline.

KPIs for CICD

  • It’s always good to have a fully observable environment, application and the same rule applies with CICD pipelines. We can create some KPIs for them to monitor their effectiveness.

Cycle time

  • The best description of this metric is How long does it take to deliver the results of the job? It describes how much time each pipeline stage takes to complete.

Mean Time Between Failures and Mean Time To Recover

  • This shows how fast we can perform a rollback of deployment in the case of any failure and how robust our solutions are.

Deploy Ready Builds

  • It describes how many pipelines finish with success - it's a useful metric to verify if all added tests and verification steps work as expected.

Infrastructure Costs and Uptime

  • The first measurement is about the costs of the underlying infrastructure for the CICD, running tests and verification of whether the deployment was completed successfully. It can also be connected to the uptime of the platform. Based on this, we can create new enhancements to improve the performance, stability and the speed of deployment.

Comparison between Jenkins, GitLab CI and GitHub Actions

  • As I used these three tools the most, I can present the multiple differences between them to you, along with their limitations and unique advantages that can make one of them the right choice for you.

Jenkins is the oldest one and it’s not worth using if we want to use something modern and we do not have a technological debt with Jenkins used in all pipelines in the company.

GitLab with its CI is a reasonable solution when we want to have a tool that's easy to use with a lot of advanced features and great integration with all that public cloud provides and also on-premises.

GitHub Actions is the right choice if you don’t have too much time to spend on  writing and defining CICD pipelines. You want to have a complex CICD pipeline which can be written in a few moments. Here GitHub Actions appears to be the  hero, especially for some open source projects hosted there.

CICD for Modern MLOps platform

Complex Event Processing Platform and MLOps Platform with CICD
CICD tools comaprision

Modern Data Streaming and/or AI platforms contain multiple microservices and delivering new releases in a smooth, efficient  way requires using the right CICD tools and well built CICD pipelines. Based on our experience in GetInData projects, we can highly recommend using GitLab CI as the flexible solution for on-premise and SaaS infrastructure.

big data
CEP
stream processing
Gitlab
CI/CD
21 July 2021

Want more? Check our articles

big data technology warsaw summit 2021
Big Data Event

COVID-19 changes Big Data Tech Warsaw 2021 but makes it greater at the same time.

Happy New Year 2021! Exactly a year ago nobody could expect how bad for our health, society, and economy the year 2020 will be. COVID-19 infected all…

Read more
highly available airflow cluster aws notext
Tutorial

Highly available Airflow cluster in Amazon AWS

These days, companies getting into Big Data are granted to compose their set of technologies from a huge variety of available solutions. Even though…

Read more
getindata nifi blog post
Tutorial

NiFi Ingestion Blog Series. PART III - No coding, just drag and drop what you need, but if it’s not there… - custom processors, scripts, external services

Apache NiFI, a big data processing engine with graphical WebUI, was created to give non-programmers the ability to swiftly and codelessly create data…

Read more
getindata flink kafka audio spectrum analyzer smalltext
Use-cases/Project

Puzzles in the time of plague: truly over-engineered audio spectrum analyzer

Quarantaine project Staying at home is not my particular strong point. But tough times have arrived and everybody needs to change their habits and re…

Read more
1YkseCzHNQ9Sxsi4BHnoCOQ
Use-cases/Project

Enabling Hive on Spark on CDH 5.14 — a few problems (and solutions)

Recently I’ve had an opportunity to configure CDH 5.14 Hadoop cluster of one of GetInData’s customers to make it possible to use Hive on Spark…

Read more
acast anomali detection
Use-cases/Project

Anomaly detection implemented in podcasting company

Being a Data Engineer is not only about moving the data but also about extracting value from it. Read an article on how we implemented anomalies…

Read more

Contact us

Interested in our solutions?
Contact us!

Together, we will select the best Big Data solutions for your organization and build a project that will have a real impact on your organization.

By submitting this form, you agree to our  Terms & Conditions