Optimising Dockerfiles
Posted on October 27, 2023 • 8 minutes • 1619 words
In this article we’ll explore several techniques for optimising Dockerfiles to reduce the size of the final image and improve build efficiency.
First, if you’re interested in learning more about the relationship between Dockerfiles and Docker images, take a look at my previous article on the subject.
Thinking holistically
First of all, it’s important to consider that every layer in a Dockerfile adds to the size of the final image. A layer is created by any line which starts with a Dockerfile instruction, such as RUN or COPY. Fortunately, there’s a built-in Docker command to help us understand where the data is: docker history.
For instance:
$ docker history test-image | head -n 15
Which results in something like the following (note the SIZE column):
IMAGE CREATED CREATED BY SIZE COMMENT
132df553de45 2 months ago CMD ["/ops/bin/uvicorn" "ma… 0B buildkit.dockerfile.v0
<missing> 2 months ago COPY main.py logging.yaml ./ # buildkit 982B buildkit.dockerfile.v0
<missing> 2 months ago RUN /bin/sh -c poetry run pip3 install -r $A… 480MB buildkit.dockerfile.v0
<missing> 2 months ago COPY /atcloud/artifacts/* /usr/local/applica… 24.2MB buildkit.dockerfile.v0
<missing> 2 months ago ENV AT_ML_MODEL_PATH=/ops/a… 0B buildkit.dockerfile.v0
<missing> 2 months ago RUN /bin/sh -c poetry install --no-interacti… 978MB buildkit.dockerfile.v0
<missing> 2 months ago COPY pyproject.toml poetry.lock snyk-ignore.… 199kB buildkit.dockerfile.v0
<missing> 2 months ago ENTRYPOINT ["/ops/bin/pytho… 0B buildkit.dockerfile.v0
<missing> 2 months ago WORKDIR /ops/app 0B buildkit.dockerfile.v0
<missing> 2 months ago RUN |4 PYTHON_VERSION_MINOR=8 PYTHON_VERSION… 8.63MB buildkit.dockerfile.v0
<missing> 2 months ago RUN |4 PYTHON_VERSION_MINOR=8 PYTHON_VERSION… 13.5MB buildkit.dockerfile.v0
<missing> 2 months ago COPY ./scripts /ops/scripts… 636B buildkit.dockerfile.v0
<missing> 2 months ago RUN |4 PYTHON_VERSION_MINOR=8 PYTHON_VERSION… 82MB buildkit.dockerfile.v0
<missing> 2 months ago COPY install-build-tools.sh /usr/local/appli… 220B buildkit.dockerfile.v0
It’s important to understand that your image will be the sum of these layers, not just the size of the final layer. So how can we reduce the sizes of our images?
Optimising Dockerfiles
There’s a number of things we can do to make our Dockerfile and the image that it produces as optimal as possible, not just in size, but efficiency too. The following tips will help you can achieve the most optimal Dockerfile possible.
Use a Supported Minimal/Slim Base Image
Start with a well-trusted and up-to-date base image, such as Rocky Linux ‘minimal’ (around 130 MB), Debian ‘slim’ (around 100 MB), or, for one of the smallest around, Alpine (~8 MB!). There’s a trade-off when selecting a base image here concerning the number of dependencies you’ll require. Sometimes it’ll be more work maintaining those dependencies manually than it’s worth, so choose wisely. I’d suggest experimenting with a tiny image like Alpine and seeing how you get on, especially if you’re working on something that you know has a small number of dependencies. An added bonus of using these slender images is their reduced attack surface, increasing security somewhat. Less stuff installed is less stuff to attack.
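As a rough sketch, a minimal Alpine-based image for a small Python tool might look like the following. The package names, file paths, and pinned tag here are illustrative, not prescriptive:

```dockerfile
# Hypothetical example: a small tool on an Alpine base.
# Pinning the tag keeps rebuilds reproducible.
FROM alpine:3.18

# --no-cache avoids leaving the apk package index behind in the layer
RUN apk add --no-cache python3 py3-pip

COPY app.py /app/app.py
CMD ["python3", "/app/app.py"]
```

Note the use of apk’s --no-cache flag, which is Alpine’s equivalent of the clean-up techniques discussed later in this article.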
Combining Commands
Minimise the number of layers by combining multiple commands into a single RUN instruction. This reduces the number of intermediate layers created during the build process. It also allows you to remove any junk which comes along with commands by adding a clean-up operation at the end. Remember, once the layer is successfully created, the filesystem in that layer becomes immutable, along with any files it created along the way.
Good example:
RUN dnf update && \
    dnf install -y -q package1 package2 && \
    dnf clean all
Notice the use of the logical AND operator, &&. Using this technique, we’re able to string together multiple commands into one Docker instruction! The \ character lets us span multiple lines, keeping the Dockerfile easily readable.
Bad example: This will result in a larger image than necessary:
RUN dnf update && \
    dnf install -y -q package1 package2
RUN dnf clean all
In this bad example, the dnf clean all command runs after the previous layer became immutable. The files are already written at that point, and size savings are no longer possible for that layer.
Make good use of layer caching
Intelligent use of layering isn’t just about having smaller images. It’s necessary to consider holistically what happens when your application builds too. Using Node.js as an example, if you rm -rf node_modules in your Dockerfile, your Docker image will indeed be smaller, but the next time your CI/CD pipeline builds, it’s going to have to run npm install again. The pattern for build dependencies should be to install dependencies that don’t change frequently as one layer, and then copy code as another.
Again taking Node.js as an example:
Good:
This will cache the npm install step unless your package.json ever changes. Whilst your build image will be slightly larger as a result, the trade-off is a cached npm install, which is worth it.
COPY .npmrc package.json $APP_DIR/
RUN npm install

COPY . $APP_DIR/
RUN npm run build
Bad:
This will cause an npm install every time any code in your app changes:
COPY . $APP_DIR/
RUN npm install && \
    npm run build && \
    rm -rf node_modules
Use Multi-Stage Builds
If your application requires extra build tools, libraries, or dependencies that are not needed in the final image, use multi-stage builds to create a smaller final image. This involves building in one container and copying only the necessary artefacts to the final container.
# Build stage
FROM alpine:3.14 AS build
WORKDIR /usr/local/bin/build
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build

# Final stage
FROM alpine:3.14
# Copy only the build output (the exact path depends on your build tooling)
COPY --from=build /usr/local/bin/build/dist /usr/local/bin/app
Multistage builds in Docker offer several advantages, such as:
- Smaller and more lightweight final images
- Reduced attack surface and security vulnerabilities of your production container
- Build tool isolation prevents build processes from affecting the final runtime environment
- Minimal production runtime images
- The Docker build cache is used more effectively (Only the stages that have changes since the last build will be rebuilt)
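As a side note, a named stage can also be built on its own, which is handy for CI jobs that only need the build artefacts. A sketch, assuming the stage name and tag from the example above:

```shell
# Build and tag only the "build" stage; later stages are skipped
docker build --target build -t myapp:build .
```

This pairs nicely with the caching behaviour mentioned above, since the runtime stage never needs to be rebuilt just to inspect or test the build stage.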
Clean Up Unnecessary Files
Remove unnecessary or temporary files, such as package caches, build artefacts, and log files, in the same RUN command to keep the layer size small. If you delete these temporary files in a different RUN command, Docker will still keep them in an intermediate layer, which bloats the size of the image.
Good:
RUN npm run lint:styles && \
    npm run lint:all && \
    npm run test:all && \
    npm run pact:all && \
    rm -rf .angular && \
    rm -rf node_modules/.cache && \
    rm -rf ~/.npm
Bad:
RUN npm run lint:styles && \
    npm run lint:all && \
    npm run test:all && \
    npm run pact:all

RUN rm -rf .angular && \
    rm -rf node_modules/.cache && \
    rm -rf ~/.npm
To reiterate the point again, the layer that created the .angular, node_modules/.cache, and ~/.npm files and directories is immutable by now, and this clean-up will have no effect on the image layer or final image size.
Use .dockerignore
Create a .dockerignore file to exclude unnecessary files and directories from being added to the image. This prevents adding large files, build artefacts, and unneeded data to the image.
Example .dockerignore file:
.git
node_modules
.vscode
*.log
Utilise Layer Caching by Ordering Commands Logically
Arrange your Dockerfile commands/layers to ensure that frequently changing parts of your application are placed towards the end. This way, changes in these layers don’t trigger a rebuild of the entire image. Instead, they use cached layers from previous builds.
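A sketch of this ordering for a hypothetical Node.js app, with the most stable instructions first and the most frequently changing last:

```dockerfile
FROM node:18-alpine        # changes rarely
WORKDIR /app
COPY package*.json ./      # changes occasionally
RUN npm install            # cached unless the files above change
COPY . .                   # changes on every commit
RUN npm run build          # only layers from here down are rebuilt
```

With this layout, a code-only change invalidates just the final two layers, and the expensive npm install is served from cache.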
Use COPY/ADD Wisely
Use caution when using the COPY or ADD instructions. Include only necessary files and directories, as these can quickly increase the layer/image size. It’s easy to fall into the trap of running a COPY . . to get all files into the current working directory. However, it’s well worth figuring out the exact files required and copying only those, to keep layer and image sizes optimal.
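For instance, rather than a blanket COPY . ., listing what you need explicitly keeps the layer minimal. The file and directory names here are illustrative:

```dockerfile
# Copy only what the application actually needs
COPY package.json package-lock.json ./
COPY src/ ./src/
COPY config/production.yaml ./config/
```

Combined with a good .dockerignore, this makes it much harder for stray build artefacts or secrets to end up baked into a layer.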
Layer Squashing
Docker layer squashing refers to the process of reducing the number of layers in a Docker image. Combining multiple layers into a single layer may help conserve space and offer better performance during operations such as pulling, pushing and running containers.
In this article and the last, we’ve already covered that when building a Docker image, each instruction in the Dockerfile creates a new layer. While this layering system has advantages, such as caching and reusability, it can also result in larger image sizes due to the accumulation of those layers. You’ll notice that when you pull a well-known base image such as Rocky Linux or Debian, there only appears to be one layer. This is because the image has been squashed.
Various tools and techniques can be used to squash Docker layers. You can pass the --squash argument to the docker build command in order to produce a squashed image (note that this is an experimental feature). You can read more about that in the official documentation.
Additionally, some build tools and CI/CD pipelines have built-in features or options for automatic layer squashing. There are also external tools, like docker-squash, that can be used to manually squash layers.
It’s worth knowing however that squashing does come at a cost. You lose the layer sharing ability of Docker which can actually increase build times of multiple images.
Wrap Up
By following these best practices, you can significantly reduce the size of your Docker image while still maintaining all the required dependencies and components for your application to run properly and successfully. We’ve covered lots in this article, so hopefully one or two of these methods can help. Please let me know in the comments if you’ve discovered any other techniques of your own!