Accelerate Docker builds with cache
Speed and efficiency are paramount during the build process. If you use a Dockerfile to build your container images from source code, you want to know about build cache.
In this blog post, we’ll talk about what happens when you create a Docker image using a Dockerfile, how caching works with Docker, and how to optimize your Dockerfiles to maximize the benefits of build cache with Docker and on Koyeb.
Build from cache
Let's take this Dockerfile example:
FROM python
COPY . /app
RUN pip install /app
RUN python /app/manage.py collectstatic
CMD ["python", "/app/manage.py", "runserver"]
To create an image from this Dockerfile that Docker can run, you would run the command docker build . in the Dockerfile's directory.
You might notice the command takes a few minutes to finish. However, if you run the command again, the image will be built nearly instantly. This is because Docker has a cache.
What is a cache?
A cache is a temporary storage area for data that is optimized for quick retrieval. Whenever a result is expensive to compute (for example, because it takes a long time), software will often stash that result in a cache. Future requests for the same data can then pick up the result directly from the cache and bypass the expensive computation.
For every instruction in a Dockerfile, Docker executes the command and stores the result in a file on the disk of the computer running the build. This file is the cache.
For example, for the FROM python instruction, Docker first checks whether a result for this command is already stored in the cache. If no previous result is available, it downloads the Python base image from the origin server.
However, if the cache does have a match, it returns the previously downloaded image.
When the inputs of an instruction change, its cached result and the cache entries for all the remaining Dockerfile instructions are invalidated. For example, COPY . /app copies your local directory into the container's /app folder.
If any of your files changed, the correct result will not be in the cache and Docker needs to execute the command again.
In fact, whenever a result is not in the cache, all the following commands need to be executed again as well. This is because each command in a Docker build runs on top of the result of the previous ones, so invalidating the cache of one command also invalidates the caches of every command after it.
So, in this example, if any file of your directory changes, then Docker has to execute COPY . /app again, as well as all the subsequent commands.
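The cascade described above can be modeled with a small simulation. This is a simplified sketch, not Docker's actual implementation: it assumes that each step's cache key is a hash of the previous step's key, the instruction text, and (for COPY) the copied content. The names `step_key` and `build` are hypothetical.

```python
import hashlib


def step_key(parent_key, instruction, copied_content=b""):
    """Cache key for one step: depends on the previous step's key, the
    instruction text, and, for COPY, the content being copied."""
    digest = hashlib.sha256()
    digest.update(parent_key.encode())
    digest.update(instruction.encode())
    digest.update(copied_content)
    return digest.hexdigest()


def build(instructions, context, cache):
    """Simulate a build and return the instructions that had to be executed."""
    executed = []
    key = ""
    for instruction in instructions:
        content = context if instruction.startswith("COPY") else b""
        key = step_key(key, instruction, content)
        if key not in cache:
            cache.add(key)  # "execute" the step and remember its result
            executed.append(instruction)
    return executed


DOCKERFILE = [
    "FROM python",
    "COPY . /app",
    "RUN pip install /app",
    "RUN python /app/manage.py collectstatic",
    'CMD ["python", "/app/manage.py", "runserver"]',
]

cache = set()
first = build(DOCKERFILE, b"print('v1')", cache)   # cold cache: every step runs
second = build(DOCKERFILE, b"print('v1')", cache)  # nothing changed: no step runs
third = build(DOCKERFILE, b"print('v2')", cache)   # source changed: COPY and all later steps run
```

Because every key includes the key of the step before it, a change in the copied content alters the key of COPY . /app and of every step that follows, while FROM python remains a cache hit.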
This is why developers often optimize Dockerfiles by placing the commands whose inputs are most likely to change as close to the end of the Dockerfile as possible. A longer, but faster, version of the Dockerfile above would be:
FROM python
COPY requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt
COPY . /app
RUN python /app/manage.py collectstatic
CMD ["python", "/app/manage.py", "runserver"]
The file requirements.txt contains the dependencies of your application, which likely don't change very often. Consequently, the instruction that copies requirements.txt will be cached, and so will the RUN pip install ... command, which can take a long time to execute.
On the other hand, COPY . /app is less likely to be cached: it copies your application source code, which is probably updated frequently. On the bright side, the remaining commands are fast, so as a developer, your experience using this Dockerfile is better.
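To make the benefit of the reordering concrete, here is a rough sketch that lists which steps re-run in each version when only the application source changes. It assumes the simplified rule from this article (the first step whose input changed, and every step after it, re-runs). The input labels "source" and "requirements" and the function name are hypothetical.

```python
def steps_to_rerun(instructions, changed_inputs):
    """Return the instruction texts that re-run when the given inputs changed.

    `instructions` is a list of (instruction_text, input_label_or_None) pairs.
    """
    for index, (_, input_label) in enumerate(instructions):
        if input_label in changed_inputs:
            # The first invalidated step drags every later step with it.
            return [text for text, _ in instructions[index:]]
    return []


ORIGINAL = [
    ("FROM python", None),
    ("COPY . /app", "source"),
    ("RUN pip install /app", None),
    ("RUN python /app/manage.py collectstatic", None),
    ('CMD ["python", "/app/manage.py", "runserver"]', None),
]

REORDERED = [
    ("FROM python", None),
    ("COPY requirements.txt /app/requirements.txt", "requirements"),
    ("RUN pip install -r /app/requirements.txt", None),
    ("COPY . /app", "source"),
    ("RUN python /app/manage.py collectstatic", None),
    ('CMD ["python", "/app/manage.py", "runserver"]', None),
]

original_rerun = steps_to_rerun(ORIGINAL, {"source"})
reordered_rerun = steps_to_rerun(REORDERED, {"source"})
```

With only the source changed, `original_rerun` includes the slow pip install step, while `reordered_rerun` starts at COPY . /app and leaves the dependency installation cached.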
What happens when you deploy on Koyeb using the Docker builder?
When you deploy code from a GitHub repository using the Docker builder, the Koyeb platform automatically creates a new isolated virtual machine to build your repository. Because the virtual machine starts out empty, the on-disk cache is always empty. We do not use Docker directly to build the images. Instead, we leverage buildctl, the command-line client for BuildKit, the build engine that Docker uses under the hood.
We configure buildctl to use a Docker registry as a cache. Think about when you run docker build . on your laptop. In this case, Docker checks the cache on the local drive to see if there's a file already cached containing the output of the command. In our context, buildctl checks if this file is stored on our cache registry.
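As an illustration of what a registry-backed cache looks like with buildctl, the sketch below assembles a command line using BuildKit's --import-cache and --export-cache options with type=registry. This is not Koyeb's actual configuration, which the article does not show. The registry references and the function name are placeholders, and the command is only printed, not executed.

```python
import shlex


def buildctl_command(image_ref, cache_ref, context_dir="."):
    """Assemble a buildctl invocation that reads and writes a registry cache."""
    return [
        "buildctl", "build",
        "--frontend", "dockerfile.v0",            # build from a Dockerfile
        "--local", f"context={context_dir}",      # build context directory
        "--local", f"dockerfile={context_dir}",   # directory holding the Dockerfile
        "--output", f"type=image,name={image_ref},push=true",
        # Look up previously built layers in the cache registry...
        "--import-cache", f"type=registry,ref={cache_ref}",
        # ...and store the layers from this build there for next time.
        "--export-cache", f"type=registry,ref={cache_ref},mode=max",
    ]


# Placeholder references; substitute your own registry and repository.
command = buildctl_command(
    "registry.example.com/app:latest",
    "registry.example.com/app:buildcache",
)
print(shlex.join(command))
```

Because the cache lives in a registry rather than on the build machine's disk, a freshly created virtual machine can still reuse results from earlier builds.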
On the Koyeb platform, the API endpoint PUT /v1/services/{id} is what rebuilds your service. It is triggered automatically when you run git push on your repository (assuming you have the git integration activated). This endpoint accepts a skip_cache parameter that is set to false by default, which means that the cache is used by default when you update your app.
When you update your application from the control panel, the control panel makes a query to this endpoint and the cache is used, as expected. However, when you click on the "redeploy" button, the control panel makes a call to the API endpoint with the flag set to true so that the cache is not used.
In short, when you push a new commit to the repository or modify the service directly from the control panel, the build uses the cache. When you click "redeploy", the build does not use the cache.
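The behavior summarized above can be written down as a small lookup. The sketch only describes the request; it does not send anything. The trigger labels are hypothetical, and passing skip_cache as a query parameter is an assumption, since the article does not say how the parameter is transmitted.

```python
# Whether each trigger skips the build cache, as described in this article.
# The trigger names are illustrative labels, not Koyeb identifiers.
SKIP_CACHE_BY_TRIGGER = {
    "git_push": False,              # new commit pushed to the repository
    "control_panel_update": False,  # service modified from the control panel
    "redeploy": True,               # "redeploy" button clicked
}


def update_service_request(service_id, trigger):
    """Describe the PUT /v1/services/{id} call made for a given trigger."""
    skip_cache = SKIP_CACHE_BY_TRIGGER[trigger]
    return {
        "method": "PUT",
        "path": f"/v1/services/{service_id}",
        # Assumption: skip_cache is sent as a query parameter.
        "params": {"skip_cache": str(skip_cache).lower()},
    }


push_request = update_service_request("my-service-id", "git_push")
redeploy_request = update_service_request("my-service-id", "redeploy")
```

Here `push_request` carries skip_cache set to false, and `redeploy_request` carries it set to true.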
Deploying via a Dockerfile
In this article, we covered how Docker build cache works and how to write your Dockerfiles to fully leverage the advantages of build cache with Docker and on the Koyeb platform.
Dockerfiles are a great option when deploying your applications because of the degree of flexibility and power they provide. These benefits also explain why it was important for us to support deployments via a Dockerfile and to make the experience as seamless as possible.
You can deploy anything via a Dockerfile on high-performance microVMs across the world with Koyeb. To get started, simply add a Dockerfile to your GitHub repository to deploy any runtime, framework, or language.
Sign up today and start deploying for free with $5.50 of credit granted to your account every month. That covers the cost of running two nano instances or one micro instance 24/7.
Not ready to say goodbye? You're welcome to join the friendliest serverless community or tweet us at @gokoyeb. We'd love to hear from you!