How we helped our client migrate a legacy pipeline to a modern one using GitLab CI/CD - Part 3

Welcome to the third part of a blog series based on a project we delivered for one of our clients. Read Part I and Part II for the previous installments.

PART I

Problem description
General description of the solution
Problem 1: Limited job output size in GitLab
Problem 2: Limited duration of jobs running on shared runners

PART II

Problem 3: Building a container image in the job
Problem 4: The GitLab Registry token expires too quickly
Problem 5: The paid GitLab.com plan limits shared runner usage time
Problem 6: User names with national characters in GitLab

PART III

Problem 7: Passing on artifacts between CI/CD jobs
Problem 8: Starting docker build manually
Problem 9: We cannot rely on the error code returned by Puppet
Summary

Problem 7: Passing on artifacts between CI/CD jobs

In the second phase of the CI/CD process, artifacts containing the services' executable files are built and saved to the container file system. In the third phase, a job uploads them to the artifact registry.

Each job is launched in a new container. For the phase-three job to read these files, you must use the job artifacts mechanism provided by GitLab. Otherwise, the phase-three job won't have access to the artifacts created by the phase-two jobs.

An example job definition in the .gitlab-ci.yml file:

build-service1:
  stage: build
  image:
    name: registry.gitlab.com/foobar/deadbeef/docker-base:latest
  # Upload artifacts produced by each job to GitLab server.
  artifacts:
    paths:
      - bin/
    when: on_success
    # Tell GitLab to remove artifacts automatically after 2 weeks.
    expire_in: 2 weeks
  script:
    - make build

The artifacts section specifies the directories (paths) whose contents should be saved, that is, uploaded to the GitLab server. With when: on_success, the upload happens only if the job succeeds. The expire_in keyword defines how long GitLab keeps the artifacts. Setting an expiry time is good practice because it keeps artifacts from consuming too much disk space on the server.

When the job finishes, the files from the specified directory are uploaded to the GitLab server, where you can also download them with a web browser.

Before it runs, each job in a later stage automatically downloads from the GitLab server the artifacts produced by jobs in earlier stages. You don't need to change the definitions of these jobs in the .gitlab-ci.yml file for this to happen.
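GitLab restores these artifacts without any explicit step, so if a build job produced no files, the deploy-stage job only notices when it tries to use them. A minimal sketch of a guard for the start of a deploy-stage script, assuming the build jobs save their output under bin/ as in the example above; the function name require_artifacts is hypothetical and not part of GitLab or of our scripts:

```shell
#!/usr/bin/env bash
# Hypothetical guard for the start of a deploy-stage script.
# Assumption: the build jobs saved their output under bin/ and GitLab
# restored it into the working directory before this job started.
require_artifacts() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "ERROR! Artifact directory '$dir' does not exist." >&2
    return 1
  fi
  # Print at most one regular file; an empty result means nothing to upload.
  if [ -z "$(find "$dir" -type f | head -n 1)" ]; then
    echo "ERROR! Artifact directory '$dir' contains no files." >&2
    return 1
  fi
  echo "Found artifacts in '$dir'."
}

# Example usage at the top of the deploy script:
# require_artifacts bin/ || exit 1
```

Failing early with an explicit message makes a missing artifact easier to diagnose than an upload step that silently has nothing to send.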

Problem 8: Starting the docker build manually

To speed up building and developing our pipeline, we wanted to be able to run individual CI/CD jobs manually on a local machine. This lets us catch potential errors sooner, because running a job through CI/CD takes longer than running it locally.

GitLab Runner sets specific environment variables when it starts a job. One of them, CI_PROJECT_DIR, is used in our scripts. When we run a job manually, this variable is not set, so you have to set it yourself. In the Dockerfile, we added a check that ensures the variable is set; if it is not, an error message is displayed.

An example Dockerfile:

FROM centos:7

ARG CI_PROJECT_DIR=INVALID-ONE-ONE-ONE
RUN if [ "${CI_PROJECT_DIR}" = "INVALID-ONE-ONE-ONE" ]; then echo "ERROR! Set CI_PROJECT_DIR variable. If you are invoking 'docker build', then use '--build-arg CI_PROJECT_DIR=/SoMe/PaTh'."; exit 1; fi
ENV CI_PROJECT_DIR=${CI_PROJECT_DIR}

RUN bash -xe "${CI_PROJECT_DIR}/foobar/prepare-docker-base-image.sh"

The Dockerfile above is used by the job in the first stage of the pipeline to build the container image.
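The Dockerfile check only protects docker build. Shell scripts that read CI_PROJECT_DIR can guard against a missing value in the same way. A minimal sketch with a hypothetical function name, require_ci_project_dir; the extra directory-existence check is an assumption about what such a script needs, not part of the original setup:

```shell
#!/usr/bin/env bash
# Hypothetical helper: the same guard as in the Dockerfile, for scripts that
# a developer runs by hand outside a GitLab runner.
require_ci_project_dir() {
  if [ -z "${CI_PROJECT_DIR:-}" ]; then
    echo "ERROR! Set CI_PROJECT_DIR variable, e.g. 'export CI_PROJECT_DIR=/SoMe/PaTh'." >&2
    return 1
  fi
  # Assumption: the script reads files from this directory, so it must exist.
  if [ ! -d "$CI_PROJECT_DIR" ]; then
    echo "ERROR! CI_PROJECT_DIR='$CI_PROJECT_DIR' is not a directory." >&2
    return 1
  fi
}

# Usage at the top of a script:
# require_ci_project_dir || exit 1
#
# One-line alternative using ${parameter:?word}: if the variable is unset or
# empty, the shell prints the message and a non-interactive shell exits.
# : "${CI_PROJECT_DIR:?Set CI_PROJECT_DIR variable}"
```

The commented one-liner terminates the whole non-interactive shell, which suits the top of a CI script; the function form returns an error instead, so it can also be reused inside other functions.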

An example of running docker build with the required build argument set:

docker build -t cdocker-base-local-$(date +%Y%m%d-%H%M%S) \
  --build-arg CI_PROJECT_DIR=/builds/foobar/something/project1 \
  -f dir1/Dockerfile .

If we forget to set the required variable with --build-arg, we will see the following error message:

% docker build -t cdocker-base-local-$(date +%Y%m%d-%H%M%S) -f Dockerfile .
Sending build context to Docker daemon 90.11 kB
Step 1/5 : FROM centos:7
 ---> 5e35e350aded
Step 2/5 : ARG CI_PROJECT_DIR=INVALID-ONE-ONE-ONE
 ---> Using cache
 ---> 0e858c9a7a5f
Step 3/5 : RUN if [ "${CI_PROJECT_DIR}" = "INVALID-ONE-ONE-ONE" ]; then echo "ERROR! Set CI_PROJECT_DIR variable. If you are invoking 'docker build', then use '--build-arg CI_PROJECT_DIR=/SoMe/PaTh'."; exit 1; fi
 ---> Running in 2ccf334c80b1

ERROR! Set CI_PROJECT_DIR variable. If you are invoking 'docker build', then use '--build-arg CI_PROJECT_DIR=/SoMe/PaTh'.
The command '/bin/sh -c if [ "${CI_PROJECT_DIR}" = "INVALID-ONE-ONE-ONE" ]; then echo "ERROR! Set CI_PROJECT_DIR variable. If you are invoking 'docker build', then use '--build-arg CI_PROJECT_DIR=/SoMe/PaTh'."; exit 1; fi' returned a non-zero code: 1

Problem 9: We cannot rely on the error code returned by Puppet

In CI/CD jobs and scripts, we rely on the exit code returned by an application to detect errors. The first stage of our pipeline uses, among other tools, Puppet to install the necessary software. However, even if errors occur during puppet apply (e.g. a package cannot be installed or a file cannot be downloaded from the Internet), the command still exits with code 0, indicating success. This is well known, and Puppet's developers have their reasons for this behavior. Fortunately, puppet apply provides an additional command-line option: --detailed-exitcodes.

Example:

puppet apply --detailed-exitcodes --modulepath="${CI_PROJECT_DIR}" \
  -e "include foobar::prepare"

With this option, puppet apply calculates its exit code differently from the default. Code 0 means that there were no changes and no errors. Code 2 means that changes were made. The other codes listed in the documentation (1, 4 and 6) indicate that errors occurred.
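The meaning of each code can be captured in a small lookup, which is handy for log messages. A minimal sketch of a hypothetical describe_puppet_exitcode helper, based on the exit codes documented for puppet apply --detailed-exitcodes (0, 1, 2, 4 and 6):

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a 'puppet apply --detailed-exitcodes' exit code
# to a human-readable message, following Puppet's documented codes.
describe_puppet_exitcode() {
  case "$1" in
    0) echo "success: no changes" ;;
    1) echo "failure: the run failed" ;;
    2) echo "success: changes applied" ;;
    4) echo "failure: some resources failed" ;;
    6) echo "failure: changes applied, but some resources failed" ;;
    *) echo "failure: unexpected exit code $1" ;;
  esac
}

# Example:
# describe_puppet_exitcode 2   # prints "success: changes applied"
```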

The following code runs puppet apply and checks its result. It is used in the first job of our pipeline:

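# Disable errexit so that a non-zero exit code from Puppet can be captured.
set +e
puppet apply --detailed-exitcodes --modulepath="${CI_PROJECT_DIR}" -e "include foobar::prepare"
retcode="$?"
set -e

echo "Puppet finished with return code '${retcode}'."

# With --detailed-exitcodes, 0 (no changes) and 2 (changes applied) mean success.
if [[ "${retcode}" != "0" ]] && [[ "${retcode}" != "2" ]]; then
	echo "Exiting because of Puppet's return code."
	exit 1
fi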
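The set +e / set -e toggle works, but the same logic can be expressed without disabling errexit, because a command on the left-hand side of || does not make set -e abort the script. A sketch of a hypothetical run_accepting_codes helper that generalizes the check to any command and any list of accepted codes:

```shell
#!/usr/bin/env bash
# Hypothetical generalization of the Puppet check: run a command and treat
# any exit code from a space-separated allow-list as success.
run_accepting_codes() {
  local allowed="$1"
  shift
  local rc=0
  # '|| rc=$?' records the failure code without triggering 'set -e'.
  "$@" || rc=$?
  echo "Command finished with return code '${rc}'."
  case " ${allowed} " in
    *" ${rc} "*) return 0 ;;
    *) echo "Exiting because of the return code." >&2; return 1 ;;
  esac
}

# Usage in the first pipeline job (codes 0 and 2 mean success):
# run_accepting_codes "0 2" puppet apply --detailed-exitcodes \
#   --modulepath="${CI_PROJECT_DIR}" -e "include foobar::prepare" || exit 1
```

Keeping errexit enabled for the whole script means a later, unrelated failure cannot slip through because someone forgot to restore set -e.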

Summary

Here is a sample .gitlab-ci.yml file:

image: centos:7

stages:
  - prepare
  - build
  - deploy

before_script:
  # Workaround for "Could not set the value of environment variable 'GITLAB_USER_NAME': could not convert string to current locale" problem.
  # https://gitlab.com/gitlab-org/gitlab-foss/issues/38698
  - export GITLAB_USER_NAME=$(echo $GITLAB_USER_LOGIN)

build-docker-base-image:
  stage: prepare
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    - echo "{\"auths\":{\"$CI_REGISTRY\":{\"username\":\"$CI_REGISTRY_USER\",\"password\":\"$CI_REGISTRY_PASSWORD\"}}}" > /kaniko/.docker/config.json
    - /kaniko/executor --context $CI_PROJECT_DIR --dockerfile $CI_PROJECT_DIR/cicd/Dockerfile-base-image --destination ${CI_REGISTRY_IMAGE}:${CI_COMMIT_TAG} --build-arg CI_PROJECT_DIR=${CI_PROJECT_DIR}

build-albatross:
  stage: build
  image:
    name: registry.gitlab.com/acmecorp1/foobarproject/docker-base-image:latest
  # Upload artifacts produced by each job to GitLab server.
  artifacts:
    paths:
      - artifacts/
    when: on_success
    # Tell GitLab to remove artifacts automatically after 2 weeks.
    expire_in: 2 weeks
  script:
    - bash -xe ./cicd/build-albatross.sh

build-alien:
  stage: build
  image:
    name: registry.gitlab.com/acmecorp1/foobarproject/docker-base-image:latest
  # Upload artifacts produced by each job to GitLab server.
  artifacts:
    paths:
      - artifacts/
    when: on_success
    # Tell GitLab to remove artifacts automatically after 2 weeks.
    expire_in: 2 weeks
  script:
    - bash -xe ./cicd/build-alien.sh

# [...]

build-waterbird:
  stage: build
  image:
    name: registry.gitlab.com/acmecorp1/foobarproject/docker-base-image:latest
  # Upload artifacts produced by each job to GitLab server.
  artifacts:
    paths:
      - artifacts/
    when: on_success
    # Tell GitLab to remove artifacts automatically after 2 weeks.
    expire_in: 2 weeks
  script:
    - bash -xe ./cicd/build-waterbird.sh

upload-artifacts:
  stage: deploy
  image:
    name: registry.gitlab.com/acmecorp1/foobarproject/docker-base-image:latest
  script:
    - bash -xe ./cicd/upload-artifacts.sh
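The echo command in build-docker-base-image builds the registry credentials into JSON by hand, which would produce an invalid config.json if the credentials ever contained a double quote or a backslash. A sketch of an alternative, assuming the Docker config.json format in which the auth field holds base64 of "user:password", read by kaniko from /kaniko/.docker/config.json; write_registry_auth is a hypothetical name, and the code sticks to POSIX sh because the kaniko debug image provides a BusyBox shell rather than bash:

```shell
#!/bin/sh
# Hypothetical helper for the kaniko job: write a Docker config.json whose
# "auth" field is base64("user:password"), so special characters in the
# credentials cannot break the JSON.
write_registry_auth() {
  local registry="$1" user="$2" password="$3" config="$4"
  local auth
  # base64 may wrap long output; tr removes the line breaks.
  auth="$(printf '%s:%s' "$user" "$password" | base64 | tr -d '\n')"
  mkdir -p "$(dirname "$config")"
  printf '{"auths":{"%s":{"auth":"%s"}}}\n' "$registry" "$auth" > "$config"
}

# Example (inside the kaniko job, before calling /kaniko/executor):
# write_registry_auth "$CI_REGISTRY" "$CI_REGISTRY_USER" \
#   "$CI_REGISTRY_PASSWORD" /kaniko/.docker/config.json
```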

What has been achieved?

Thanks to automated CI/CD based on GitLab and Kubernetes, we accelerated the client's process of developing, testing and deploying applications. Developers now receive feedback much faster (daily instead of once a week). We have also reduced operating costs by using cloud services: the Kubernetes cluster scales automatically, so when it is not in use, its running cost is low.

Plans for the future

Our client is very satisfied with the work we've done so far. The next stage of the project is automated end-to-end testing of the application, for which we will use Terraform, Ansible and, of course, Google Cloud services.

Last updated: 17 August 2020

Written by

Maciej Korzeń

DevOps
