What is CICD?
It is an acronym for Continuous Integration and Continuous Delivery / Deployment.
CICD can also be described as a methodology focused on automating tasks related to running tests, build phases, deploying to development or production, and carrying out A/B tests in a fully automated fashion. It became a must in a rapidly changing world in which we need to deploy changes frequently. Old-fashioned manual maintenance would be tough, especially for a complex data streaming platform, and CICD tools are the only way to make this manageable.
CI covers the automation process for developers, while CD is about delivering new releases to the target environment.
The idea gained wide attention in 2006, when Martin Fowler wrote an article about the need for Continuous Integration to reduce the number of bugs and detect them quickly. The first two important players were CruiseControl (released in 2001) and Hudson CI (later Jenkins), released in 2005. From 2008 Hudson became more and more important. In 2010 Sun Microsystems, where Hudson’s author Kohsuke Kawaguchi worked, was purchased by Oracle, which wanted to commercialize the open source Hudson. Jenkins has been with us in its open source form since 2011. During this time a lot of competitors appeared, so we could choose whatever we felt was best for us.
Currently, we can see a trend in which more and more developers learn how to deploy their applications, run tests using GitOps and follow best practices. At the beginning of the CICD story, setting up the environment and writing scripts or deployment files used to be a bit complex, but now it has become pretty easy; GitLab CI, the latest version of Jenkins and GitHub Actions are the best examples of tools used to achieve this. Due to the popularity of DevOps principles (which, in fact, contain a lot of definitions related to Ops phrases), CICD tools must become easier to use and faster in daily use.
Jenkins belongs to the first generation of CICD tools and has been a player ever since. It needs to be installed on dedicated infrastructure, and new pipelines are written in, for example, Groovy, which can be quite complex to manage.
The next generation includes GitLab CI, Circle CI and Travis CI, which deliver Pipeline as Code. We still need to write some simple scripts, but thanks to managed runners, building CICD pipelines doesn’t take too much time. These tools are great examples of progress in this market.
The latest generation is about building pipelines from prepared blocks, as in GitHub Actions. Here we have modules managed by the community (which can also be a disadvantage for some developers) that we can easily mix with each other to build a CICD pipeline even faster and in a simpler way. We can predict that the next step is to popularize more drag-and-drop solutions (with a Web UI or in a similar way to the current GitHub Actions).
What is important in the CICD tool?
When talking about a real-time data streaming platform processing millions of events per second, it is a must to have a smoothly working CICD platform.
Rule management is a must in the case of CICD tools and fortunately all of them support it. This means that we can set up the following rule: when merging from staging to prod, run tests, build the app or model, run integration tests and deploy to production.
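The rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the syntax of any particular CICD tool: the `Rule` class, the `RULES` list and the `steps_for_merge` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: which steps run when `source` is merged into `target`."""
    source: str
    target: str
    steps: Tuple[str, ...]


# The single rule from the text: merging staging into prod.
RULES = [
    Rule(
        source="staging",
        target="prod",
        steps=(
            "run tests",
            "build app or model",
            "run integration tests",
            "deploy to production",
        ),
    ),
]


def steps_for_merge(source: str, target: str, rules: List[Rule] = RULES) -> List[str]:
    """Return the ordered steps for a merge, or an empty list if no rule matches."""
    for rule in rules:
        if rule.source == source and rule.target == target:
            return list(rule.steps)
    return []


if __name__ == "__main__":
    for step in steps_for_merge("staging", "prod"):
        print(step)
```

In a real tool the same intent is expressed declaratively in the pipeline definition file, but the logic is the same: a merge event is matched against rules and the matching rule decides which steps run.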
For some use cases, it is important to be able to install the CICD tool in the “local” on-premises infrastructure or in the public cloud. Here Jenkins and GitLab (with its CI) exhibit extensive capabilities. Even during deployment and when adding all components around each service, you can discover the difference between the old Jenkins and the much newer, robust GitLab. Here we also need to mention ease of deployment. In the case of GitLab CI and GitHub Actions, we need to create a repo and start writing the CICD pipeline by creating the files (.gitlab-ci.yml or YAML within .github/workflows, for example), while in the case of Jenkins we would need to set up the whole platform, which would obviously be quite time consuming.
When talking about mixed deployment (a cloud repository with our own runners used for running the CICD pipeline), we have multiple configurations for GitLab CI (we can use a bare-metal server, a Virtual Machine, Docker or Kubernetes) and somewhat more limited options for GitHub Actions. Why is it important to mention this? First of all, it may be more cost-effective in comparison to buying more minutes for managed runners. Secondly, it’s more secure to use our own infrastructure to run tests and to build and push images, with network security groups applied directly to the runners.
The next thing is the size of the operations team that will handle CICD pipelines. It’s also worth checking how we can integrate the CICD tool into our target environment - tools like Flux CD are useful nowadays when we need to manage multiple platforms.
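The cost argument can be checked with simple arithmetic. The sketch below compares the monthly cost of extra managed-runner minutes with a flat monthly cost of our own runners. All numbers and names (`managed_cost`, `cheaper_option`, the prices in cents) are assumptions made up for illustration; real pricing depends on the provider and plan.

```python
def managed_cost(minutes_used: int, included_minutes: int, cents_per_extra_minute: int) -> int:
    """Monthly cost (in cents) of managed-runner minutes beyond the included quota."""
    extra_minutes = max(0, minutes_used - included_minutes)
    return extra_minutes * cents_per_extra_minute


def cheaper_option(
    minutes_used: int,
    included_minutes: int,
    cents_per_extra_minute: int,
    self_hosted_monthly_cents: int,
) -> str:
    """Return which option costs less for the given monthly usage."""
    managed = managed_cost(minutes_used, included_minutes, cents_per_extra_minute)
    return "self-hosted" if self_hosted_monthly_cents < managed else "managed"


if __name__ == "__main__":
    # Hypothetical figures: 2000 included minutes, 1 cent per extra minute,
    # and our own runner costing 6000 cents (60.00) per month.
    print(cheaper_option(12000, 2000, 1, 6000))
    print(cheaper_option(2500, 2000, 1, 6000))
```

With heavy usage the flat cost of our own runner wins, while with light usage the managed minutes stay cheaper, so the break-even point is worth estimating before choosing a setup.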
In the case of making complex pipelines, we should also check how we can manage all secrets, variables, and tokens which are required within our pipelines.
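One common pattern is to let the CICD tool inject secrets as environment variables and to have the job fail fast when any of them is missing. The following is a minimal sketch under that assumption; the function name `load_secrets` and the variable names in the usage example are hypothetical.

```python
import os
from typing import Dict, Iterable, Mapping, Optional


def load_secrets(names: Iterable[str], env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read the required secrets from the environment.

    Raises RuntimeError listing every missing (or empty) secret, so the job
    stops before any step runs with incomplete credentials.
    """
    env = os.environ if env is None else env
    names = list(names)
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing pipeline secrets: " + ", ".join(missing))
    return {name: env[name] for name in names}


if __name__ == "__main__":
    # REGISTRY_TOKEN and KUBE_CONFIG are example names, not required by any tool.
    try:
        secrets = load_secrets(["REGISTRY_TOKEN", "KUBE_CONFIG"])
        print("All secrets present:", sorted(secrets))
    except RuntimeError as error:
        print(error)
```

Only the names of the secrets are ever printed here, never their values, which keeps them out of the job logs.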
How many steps should/can we automate?
When talking about CICD tools, it’s worth mentioning the CICD pipelines themselves.
The first component is about creating separate stages of the pipeline. Imagine we have a Spark application written in Scala that we want to deploy to our Kubernetes cluster, and we have three environments: development, QA and production.
The second one is about automating running unit tests or validation tests (in case of YAML scripts, for example).
The third one is about the building process: where we store the artifacts from the job and how we want to build them (once per environment or once for all environments?).
The fourth is focused on deployment to development, staging, QA and the different environments for testing the code and verifying if it works as expected.
Deployment to the production environment comes as the fifth component and here it’s important to define some A/B testing scenarios, running Flink jobs in the incubation mode (and here verify results once again), performing blue-green deployment and making it fully automated or not. In some cases, it makes sense to only use manual triggers to update the release in the production environment.
The last part is about managing all secrets used within the pipeline.
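The components listed above can be sketched as an ordered list of stages in which production deployment waits for a manual trigger. This is a toy model under stated assumptions, not how any specific CICD tool executes jobs: the `Stage` class, `run_pipeline` function and the stage actions (which simply return `True`) are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str
    action: Callable[[], bool]  # returns True when the stage succeeds
    manual: bool = False        # True when a person must trigger the stage


def run_pipeline(stages: List[Stage], approved: bool = False) -> List[str]:
    """Run stages in order; stop at the first failure or unapproved manual stage."""
    log = []
    for stage in stages:
        if stage.manual and not approved:
            log.append(f"{stage.name}: waiting for manual trigger")
            break
        succeeded = stage.action()
        log.append(f"{stage.name}: {'ok' if succeeded else 'failed'}")
        if not succeeded:
            break
    return log


# Placeholder actions; a real pipeline would call the test runner, build tool, etc.
PIPELINE = [
    Stage("unit tests", lambda: True),
    Stage("build", lambda: True),
    Stage("deploy to QA", lambda: True),
    Stage("deploy to production", lambda: True, manual=True),
]

if __name__ == "__main__":
    for line in run_pipeline(PIPELINE):
        print(line)
```

Calling `run_pipeline(PIPELINE, approved=True)` models the case where the release to production has been triggered by hand, while the default models the pipeline pausing before production.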
KPIs for CICD
It’s always good to have a fully observable environment and application, and the same rule applies to CICD pipelines. We can create some KPIs for them to monitor their effectiveness.
The best description of this metric is the question “How long does it take to deliver the results of the job?” It describes how much time each pipeline stage takes to complete.
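As a small illustration of this metric, the sketch below derives per-stage and total durations from start and end timestamps. The input format (a dictionary of stage name to start/end seconds) and the function names are assumptions for the example; real tools expose this data through their own APIs or dashboards.

```python
from typing import Dict, Tuple

# Hypothetical input: stage name -> (start, end), in seconds since the pipeline started.
Timings = Dict[str, Tuple[int, int]]


def stage_durations(timings: Timings) -> Dict[str, int]:
    """Duration of each stage in seconds."""
    return {stage: end - start for stage, (start, end) in timings.items()}


def total_duration(timings: Timings) -> int:
    """Wall-clock time from the first stage start to the last stage end."""
    starts = [start for start, _ in timings.values()]
    ends = [end for _, end in timings.values()]
    return max(ends) - min(starts)


def slowest_stage(timings: Timings) -> str:
    """Name of the stage that took the longest."""
    durations = stage_durations(timings)
    return max(durations, key=durations.get)


if __name__ == "__main__":
    example = {"test": (0, 120), "build": (120, 420), "deploy": (420, 480)}
    print(stage_durations(example))
    print(total_duration(example), slowest_stage(example))
```

Tracking the slowest stage over time shows where optimization effort pays off most.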
Mean Time Between Failures and Mean Time To Recover
Mean Time To Recover shows how fast we can perform a rollback of a deployment in the case of any failure, while Mean Time Between Failures shows how robust our solutions are.
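Both values can be computed from a list of incidents. The sketch below uses one common definition: recovery time is averaged over incidents, and time between failures is the total uptime in the observation window divided by the number of failures. The input format and function names are assumptions for this example.

```python
from typing import List, Tuple

# Hypothetical input: (failed_at, recovered_at) in hours since the window started.
Incident = Tuple[float, float]


def mean_time_to_recover(incidents: List[Incident]) -> float:
    """Average time from a failure to its recovery."""
    if not incidents:
        raise ValueError("no incidents recorded")
    return sum(recovered - failed for failed, recovered in incidents) / len(incidents)


def mean_time_between_failures(incidents: List[Incident], window_hours: float) -> float:
    """Total uptime within the window divided by the number of failures."""
    if not incidents:
        raise ValueError("no incidents recorded")
    downtime = sum(recovered - failed for failed, recovered in incidents)
    return (window_hours - downtime) / len(incidents)


if __name__ == "__main__":
    incidents = [(10.0, 12.0), (50.0, 54.0)]
    print(mean_time_to_recover(incidents))
    print(mean_time_between_failures(incidents, window_hours=106.0))
```

A falling recovery time together with a rising time between failures suggests that rollbacks are getting faster and deployments more reliable.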
Deploy Ready Builds
It describes how many pipelines finish with success - it's a useful metric to verify if all added tests and verification steps work as expected.
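A minimal sketch of this metric is the share of pipeline runs that finished successfully. The status strings and the function name are assumptions for the example; each CICD tool reports run statuses under its own naming.

```python
from typing import List


def deploy_ready_ratio(statuses: List[str], success_label: str = "success") -> float:
    """Fraction of pipeline runs that finished with the success status."""
    if not statuses:
        raise ValueError("no pipeline runs recorded")
    successful = sum(1 for status in statuses if status == success_label)
    return successful / len(statuses)


if __name__ == "__main__":
    runs = ["success", "failed", "success", "success"]
    print(f"{deploy_ready_ratio(runs):.0%} of pipelines produced a deploy-ready build")
```

A ratio that stays at 100% for a long time can also be a warning sign that the tests and verification steps never catch anything.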
Infrastructure Costs and Uptime
The first measurement is about the costs of the underlying infrastructure for the CICD, running tests and verification of whether the deployment was completed successfully. It can also be connected to the uptime of the platform. Based on this, we can create new enhancements to improve the performance, stability and the speed of deployment.
Comparison between Jenkins, GitLab CI and GitHub Actions
As I used these three tools the most, I can present the multiple differences between them to you, along with their limitations and unique advantages that can make one of them the right choice for you.
Jenkins is the oldest one, and it’s not worth choosing if we want something modern, unless we carry technical debt with Jenkins already used in all pipelines in the company.
GitLab with its CI is a reasonable solution when we want a tool that's easy to use, with a lot of advanced features and great integration with everything the public cloud provides, as well as with on-premises infrastructure.
GitHub Actions is the right choice if you don’t have much time to spend on writing and defining CICD pipelines but still want a complex CICD pipeline that can be written in a few moments. Here GitHub Actions appears to be the hero, especially for open source projects hosted there.
CICD for Modern MLOps platform
Modern Data Streaming and/or AI platforms contain multiple microservices, and delivering new releases in a smooth, efficient way requires the right CICD tools and well-built CICD pipelines. Based on our experience in GetInData projects, we can highly recommend GitLab CI as a flexible solution for both on-premises and SaaS infrastructure.
21 July 2021
Big Data DevOps Engineer