Optimising Dockerfiles
Posted on October 27, 2023 • 8 minutes • 1619 words
In this article we’ll explore several techniques for optimising Dockerfiles to reduce the size of the final image and improve build efficiency.
First, if you’re interested in learning more about the relationship between Dockerfiles and Docker images, take a look at my previous article on the subject.
Thinking holistically
First of all, it’s important to consider that every layer in a Dockerfile adds to the size of the final image. A layer is created by any line which starts with a Dockerfile instruction, such as RUN or COPY. Fortunately, there’s a built-in Docker command to help us understand where the data is: docker history.
For instance:
$ docker history test-image | head -n 15
Which results in something like the following (note the SIZE column):
IMAGE CREATED CREATED BY SIZE COMMENT
132df553de45 2 months ago CMD ["/ops/bin/uvicorn" "ma… 0B buildkit.dockerfile.v0
<missing> 2 months ago COPY main.py logging.yaml ./ # buildkit 982B buildkit.dockerfile.v0
<missing> 2 months ago RUN /bin/sh -c poetry run pip3 install -r $A… 480MB buildkit.dockerfile.v0
<missing> 2 months ago COPY /atcloud/artifacts/* /usr/local/applica… 24.2MB buildkit.dockerfile.v0
<missing> 2 months ago ENV AT_ML_MODEL_PATH=/ops/a… 0B buildkit.dockerfile.v0
<missing> 2 months ago RUN /bin/sh -c poetry install --no-interacti… 978MB buildkit.dockerfile.v0
<missing> 2 months ago COPY pyproject.toml poetry.lock snyk-ignore.… 199kB buildkit.dockerfile.v0
<missing> 2 months ago ENTRYPOINT ["/ops/bin/pytho… 0B buildkit.dockerfile.v0
<missing> 2 months ago WORKDIR /ops/app 0B buildkit.dockerfile.v0
<missing> 2 months ago RUN |4 PYTHON_VERSION_MINOR=8 PYTHON_VERSION… 8.63MB buildkit.dockerfile.v0
<missing> 2 months ago RUN |4 PYTHON_VERSION_MINOR=8 PYTHON_VERSION… 13.5MB buildkit.dockerfile.v0
<missing> 2 months ago COPY ./scripts /ops/scripts… 636B buildkit.dockerfile.v0
<missing> 2 months ago RUN |4 PYTHON_VERSION_MINOR=8 PYTHON_VERSION… 82MB buildkit.dockerfile.v0
<missing> 2 months ago COPY install-build-tools.sh /usr/local/appli… 220B buildkit.dockerfile.v0
It’s important to understand that your image will be the sum of these layers, not just the size of the final layer. So how can we reduce the sizes of our images?
Optimising Dockerfiles
There’s a number of things we can do to make our Dockerfile and the image that it produces as optimal as possible, not just in size, but efficiency too. The following tips will help you can achieve the most optimal Dockerfile possible.
Use a Supported Minimal/Slim Base Image
Start with a well-trusted and up-to-date base image, such as Rocky Linux ‘minimal’ (around 130 MB), Debian ‘slim’ (around 100 MB), or, for one of the smallest around, Alpine (~8 MB!). There’s a trade-off when selecting a base image here concerning the number of dependencies you’ll require. Sometimes it’ll be more work maintaining those dependencies manually than it’s worth, so choose wisely. I’d suggest experimenting with a tiny image like Alpine and seeing how you get on, especially if you’re working on something that you know has a small number of dependencies. An added bonus of using these slender images is their reduced attack surface, increasing security somewhat. Less stuff installed is less stuff to attack.
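As a rough sketch, a minimal Alpine-based image for a small Python tool might look like the following. The package names, file paths, and pinned tag here are illustrative, not prescriptive:

```dockerfile
# Hypothetical example: a small tool on an Alpine base.
# Pinning the tag keeps rebuilds reproducible.
FROM alpine:3.18

# --no-cache avoids leaving the apk package index behind in the layer
RUN apk add --no-cache python3 py3-pip

COPY app.py /app/app.py
CMD ["python3", "/app/app.py"]
```

Note the use of apk’s --no-cache flag, which is Alpine’s equivalent of the clean-up techniques discussed later in this article.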
Combining Commands
Minimise the number of layers by combining multiple commands into a single RUN instruction. This reduces the number of intermediate layers created during the build process. It also allows you to remove any junk which comes along with commands by adding a clean-up operation at the end. Remember, once the layer is successfully created, the filesystem in that layer becomes immutable, along with any files it created along the way.
Good example:
RUN dnf update && \
    dnf install -y -q package1 package2 && \
    dnf clean all
Notice the use of the logical AND operator, &&. Using this technique, we’re able to string together multiple commands into one Docker instruction! The \ character lets us span multiple lines, keeping the Dockerfile easily readable.
Bad example: This will result in a larger image than necessary:
RUN dnf update && \
    dnf install -y -q package1 package2
RUN dnf clean all
In this bad example, the dnf clean all command runs after the previous layer became immutable. The files are already written at that point, and size savings are no longer possible for that layer.
Make good use of layer caching
Intelligent use of layering isn’t just about having smaller images. It’s necessary to consider holistically what happens when your application builds too. Using Node.js as an example, if you rm -rf node_modules in your Dockerfile, your Docker image will indeed be smaller, but the next time your CI/CD pipeline builds, it’s going to have to run npm install again. The pattern for build dependencies should be to install dependencies that don’t change frequently as one layer, and then copy code as another.
Again taking Node.js as an example:
Good:
This will cache the npm install step unless your package.json ever changes. Whilst your build image will be slightly larger as a result, the trade-off is a cached npm install, which is worth it.
COPY .npmrc package.json $APP_DIR/
RUN npm install

COPY . $APP_DIR/
RUN npm run build
Bad:
This will cause an npm install every time any code in your app changes:
COPY . $APP_DIR/
RUN npm install && \
    npm run build && \
    rm -rf node_modules
Use Multi-Stage Builds
If your application requires extra build tools, libraries, or dependencies that are not needed in the final image, use multi-stage builds to create a smaller final image. This involves building in one container and copying only the necessary artefacts to the final container.
# Build stage
FROM alpine:3.14 AS build
WORKDIR /usr/local/bin/build
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build

# Final stage
FROM alpine:3.14
# Copy only the build output (the exact path depends on your build tooling)
COPY --from=build /usr/local/bin/build/dist /usr/local/bin/app
Multistage builds in Docker offer several advantages, such as:
- Smaller and more lightweight final images
- Reduced attack surface and security vulnerabilities of your production container
- Build tool isolation prevents build processes from affecting the final runtime environment
- Minimal production runtime images
- The Docker build cache is used more effectively (Only the stages that have changes since the last build will be rebuilt)
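As a side note, a named stage can also be built on its own, which is handy for CI jobs that only need the build artefacts. A sketch, assuming the stage name and tag from the example above:

```shell
# Build and tag only the "build" stage; later stages are skipped
docker build --target build -t myapp:build .
```

This pairs nicely with the caching behaviour mentioned above, since the runtime stage never needs to be rebuilt just to inspect or test the build stage.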
Clean Up Unnecessary Files
Remove unnecessary or temporary files, such as package caches, build artefacts, and log files, in the same RUN command to keep the layer size small. If you delete these temporary files in a different RUN command, Docker will still keep them in an intermediate layer, which bloats the size of the image.
Good:
RUN npm run lint:styles && \
    npm run lint:all && \
    npm run test:all && \
    npm run pact:all && \
    rm -rf .angular && \
    rm -rf node_modules/.cache && \
    rm -rf ~/.npm
Bad:
RUN npm run lint:styles && \
    npm run lint:all && \
    npm run test:all && \
    npm run pact:all

RUN rm -rf .angular && \
    rm -rf node_modules/.cache && \
    rm -rf ~/.npm
To reiterate the point again, the layer that created the .angular, node_modules/.cache, and ~/.npm files and directories is immutable by now, and this clean-up will have no effect on the image layer or final image size.
Use .dockerignore
Create a .dockerignore file to exclude unnecessary files and directories from being added to the image. This prevents adding large files, build artefacts, and unneeded data to the image.
Example .dockerignore file:
.git
node_modules
.vscode
*.log
Utilise Layer Caching by Ordering Commands Logically
Arrange your Dockerfile commands/layers to ensure that frequently changing parts of your application are placed towards the end. This way, changes in these layers don’t trigger a rebuild of the entire image. Instead, they use cached layers from previous builds.
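A sketch of this ordering for a hypothetical Node.js app, with the most stable instructions first and the most frequently changing last:

```dockerfile
FROM node:18-alpine        # changes rarely
WORKDIR /app
COPY package*.json ./      # changes occasionally
RUN npm install            # cached unless the files above change
COPY . .                   # changes on every commit
RUN npm run build          # only layers from here down are rebuilt
```

With this layout, a code-only change invalidates just the final two layers, and the expensive npm install is served from cache.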
Use COPY/ADD Wisely
Use caution when using the COPY or ADD instructions. Include only necessary files and directories, as these can quickly increase the layer/image size. It’s easy to fall into the trap of running a COPY . . to get all files into the current working directory. However, it’s well worth figuring out the exact files required and copying only those, to keep layer and image sizes optimal.
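For instance, rather than a blanket COPY . ., listing what you need explicitly keeps the layer minimal. The file and directory names here are illustrative:

```dockerfile
# Copy only what the application actually needs
COPY package.json package-lock.json ./
COPY src/ ./src/
COPY config/production.yaml ./config/
```

Combined with a good .dockerignore, this makes it much harder for stray build artefacts or secrets to end up baked into a layer.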
Layer Squashing
Docker layer squashing refers to the process of reducing the number of layers in a Docker image. Combining multiple layers into a single layer may help conserve space and offer better performance during operations such as pulling, pushing and running containers.
In this article and the last, we’ve already covered that when building a Docker image, each instruction in the Dockerfile creates a new layer. While this layering system has advantages, such as caching and reusability, it can also result in larger image sizes due to the accumulation of those layers. You’ll notice that when you pull a well-known base image such as Rocky Linux or Debian, there only appears to be one layer. This is because the image has been squashed.
Various tools and techniques can be used to squash Docker layers. You can pass the --squash argument to the docker build command in order to produce a squashed image (note that this is an experimental feature). You can read more about that in the official documentation.
Additionally, some build tools and CI/CD pipelines have built-in features or options for automatic layer squashing. There are also external tools, like docker-squash, that can be used to manually squash layers.
It’s worth knowing however that squashing does come at a cost. You lose the layer sharing ability of Docker which can actually increase build times of multiple images.
Wrap Up
By following these best practices, you can significantly reduce the size of your Docker image while still maintaining all the required dependencies and components for your application to run properly and successfully. We’ve covered lots in this article, so hopefully one or two of these methods can help. Please let me know in the comments if you’ve discovered any other techniques of your own!