Docker & CircleCI: Speed up your builds and avoid paying for Docker Layer Caching

This article explains how to benefit from fast Docker builds on CircleCI without breaking the bank.

I have been using CircleCI for years, and it is my favourite CI/CD tool. However, it sometimes rebuilds Docker images from scratch for no apparent reason. Over the years, I came across a few tips to speed up builds, use the cache effectively, and reduce costs. Most of these tips are probably applicable to other CI/CD providers.

What is the Docker cache?

Docker provides the commands docker build and docker push to create and save Docker images.

Over time, projects grow and builds become slower (dependencies take longer to install, code takes longer to build, ...). Luckily, Docker has a built-in cache system to keep builds fast: if you build the same project twice, Docker will only re-run the commands that have changed or that use files that have changed.

Here is an example of a Dockerfile that takes advantage of the cache:

FROM python:3.9.13

WORKDIR /srv

# First, only add the file containing the dependencies
ADD requirements.txt .

# Then, install the dependencies. This command will be re-run
# only if the content of requirements.txt has changed.
RUN pip install -r requirements.txt

# Add the rest of your code
ADD . .

If you build the project twice, you will see the following output the second time:

 => CACHED [3/6] ADD requirements.txt .
 => CACHED [4/6] RUN pip install -r requirements.txt

The dependencies haven't changed. So, as expected, the command is not re-run.
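
To see when the cache is invalidated, you can try the opposite experiment (a quick sketch, not from the article; app.py stands in for any application file in the project built by the Dockerfile above):

# Changing a dependency invalidates the cache from "ADD requirements.txt ." onward
echo "requests==2.31.0" >> requirements.txt   # hypothetical new dependency
docker build -t test .                        # pip install is re-run

# Changing only application code keeps the pip install layer cached
touch app.py
docker build -t test .                        # only the final "ADD . ." layer is rebuilt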

How much does it cost to use the Docker cache on CircleCI?

CircleCI offers a feature called Docker Layer Caching (DLC). It is described as follows in the documentation:

DLC caches the individual layers of any Docker images built during your CircleCI jobs, and then reuses unchanged image layers on subsequent CircleCI runs, rather than rebuilding the entire image every time. In short, the less your Dockerfiles change from commit to commit, the faster your image-building steps will run.

That seems close to what we would expect the Docker cache to do, right? So will this save you time and money? Let's look at the numbers.

DLC costs "200 credits per job run in a pipeline (equivalent to $0.12/job run)". If, for example, you push new code to your repository 10 times a day, it will cost you $0.12/job*10 jobs/day*30 days/month = $36/month. Ouch!

So, how can we do better? How can we get fast builds without spending money on Docker Layer Caching?

Tip #1: Use the same cache for Docker and Docker Compose

If you build a project on CircleCI using docker build then docker-compose build, you will notice that the Docker cache is not used by the second command:

$ docker build -t test . \
    && docker-compose build

# First time
Step 4/6 : RUN pip install -r requirements.txt
 ---> Running in 7e0474254a8f

# Second time
Step 4/6 : RUN pip install -r requirements.txt
 ---> Running in 13791a2438fe

This can be resolved by setting COMPOSE_DOCKER_CLI_BUILD=true, which tells Docker Compose to build images through the Docker CLI so that both commands share the same cache:

$ export COMPOSE_DOCKER_CLI_BUILD=true \
    && docker build -t test . \
    && docker-compose build

# First time
Step 4/6 : RUN pip install -r requirements.txt
 ---> Running in 8eefae853814

# Second time
Step 3/6 : ADD requirements.txt .
 ---> Using cache
 ---> 01738077c4dd
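
On CircleCI, you can set this variable once and have it apply to every later step of the job by appending it to $BASH_ENV, which the default bash shell sources at the start of each step (a minimal sketch):

# Run this in an early step; later steps pick up the variable automatically
echo 'export COMPOSE_DOCKER_CLI_BUILD=true' >> "$BASH_ENV"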

Tip #2: Pull before building

You should also pull the previous version of your image from the registry before running your build command.

export COMMIT_TAG=${CIRCLE_SHA1}
export BRANCH_TAG=${CIRCLE_BRANCH}

docker pull ${IMAGE_NAME}:${COMMIT_TAG} \
    || docker pull ${IMAGE_NAME}:${BRANCH_TAG} \
    || docker pull ${IMAGE_NAME}:main \
    || true

This will try to pull the Docker image of the current commit. If it does not exist, it will pull the Docker image of the current branch. If it does not exist, it will pull the Docker image of the main branch. And if nothing works, it will do nothing and let the build start from scratch.

Later in your build script, you should push the images using the same tags:

docker build -t ${IMAGE_NAME}:${COMMIT_TAG} .
docker tag ${IMAGE_NAME}:${COMMIT_TAG} ${IMAGE_NAME}:${BRANCH_TAG}

docker push ${IMAGE_NAME}:${COMMIT_TAG}
docker push ${IMAGE_NAME}:${BRANCH_TAG}

Note: When deploying your code, use the image tagged with ${CIRCLE_SHA1} rather than ${CIRCLE_BRANCH}. Deploying mutable images is a bad practice that will lead to issues in production.
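
For example, in a deployment script (a sketch; IMAGE_NAME is the same variable as above):

# Good: an immutable reference to the exact image that was built and tested
docker pull ${IMAGE_NAME}:${CIRCLE_SHA1}

# Risky: the branch tag is mutable and may point to a newer image by the time you deploy
docker pull ${IMAGE_NAME}:${CIRCLE_BRANCH}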

Tip #3: Specify --cache-from

Sometimes, CircleCI does not use the image you just pulled. As far as I can tell, it happens when you use ADD in your Dockerfile. The easiest way to fix it is to set the --cache-from parameter when running the docker build command:

# Get the --cache-from parameter
DOCKER_BUILD_EXTRA_PARAMS=$(docker images \
  | grep ${IMAGE_NAME} \
  | head -n 1 \
  | awk '{ print "--cache-from=" $1 ":" $2 }' || true)

# Use it
docker build ${DOCKER_BUILD_EXTRA_PARAMS} [...]
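
Putting Tips #2 and #3 together, the full build command might look like this (a sketch; the variables are the ones defined earlier):

docker build ${DOCKER_BUILD_EXTRA_PARAMS} -t ${IMAGE_NAME}:${COMMIT_TAG} .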

Tip #4: Skip unnecessary builds without pulling

If you have one Dockerfile per repository, you will almost always want to re-run docker build.

But if you have a monorepo containing many Dockerfiles, how many will you work on at a time? Probably only a few. Couldn't you save even more build time by avoiding rebuilding the images that are unrelated to your changes?

It can be done by tagging your Docker images with the hash of the docker context used to build them:

  1. Hash the context: By default, docker build uses the directory containing the Dockerfile as context. By hashing it, you can create an identifier that uniquely represents the result of the build (without needing to run the build...).
  2. Tag and push: At the end of your build process, tag and push your Docker image using that identifier.
  3. Conditional build: Before pulling or building anything, check whether an image with that identifier already exists in your registry. Its existence means you already ran a build with the exact same context and Dockerfile. Therefore, you can skip the build and maybe even your tests (see the sketch after the script below).

# 1. Create an ID from the build context
export CONTENT_HASH=$(\
    find ${CONTEXT_FOLDER} \( -type l -o -type f \) -print0 \
    | sort -z \
    | xargs -0 sha1sum \
    | sha1sum \
    | awk '{ print $1 }' \
)

# 2. Conditional build: only build if no image with this ID exists in the registry
if ! docker manifest inspect ${IMAGE_NAME}:${CONTENT_HASH} > /dev/null 2>&1
then
    # 3. Build and push using the commands described above,
    #    this time tagging the image with the content hash
    docker build -t ${IMAGE_NAME}:${CONTENT_HASH} ${CONTEXT_FOLDER}
    docker push ${IMAGE_NAME}:${CONTENT_HASH}
fi

# 4. Tag the images directly in the registry
docker buildx imagetools create ${IMAGE_NAME}:${CONTENT_HASH} \
    --tag ${IMAGE_NAME}:${COMMIT_TAG}
docker buildx imagetools create ${IMAGE_NAME}:${CONTENT_HASH} \
    --tag ${IMAGE_NAME}:${BRANCH_TAG}
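
If you also want to skip your tests when nothing has changed, the same check can guard the whole build-and-test phase (a sketch; build.sh and run_tests.sh are hypothetical helper scripts):

if docker manifest inspect ${IMAGE_NAME}:${CONTENT_HASH} > /dev/null 2>&1
then
    echo "Context unchanged (${CONTENT_HASH}): skipping build and tests"
else
    ./build.sh       # docker pull/build/push, as described in Tips #2 and #3
    ./run_tests.sh   # hypothetical test runner; only needed when the image changed
fi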

Conclusion

CircleCI is an amazing CI/CD tool. I have been using it for years and will continue using it for future projects. It is ahead of its competition in many ways.

However, I am not a big fan of Docker Layer Caching. I am sure some people have faced issues that DLC solves elegantly. In my case, I prefer to write a few extra lines of bash to get faster and cheaper builds.

Bonus - I uploaded my build script to GitHub in case you want to use it...