This article is the 19th day article of DMM Group Advent Calendar 2020.
We made three improvements to reduce the lead time of our Docker builds, so here are some tips.
This time I am building on ubuntu-18.04 on GitHub Actions, so the tool versions are the ones provided by that runner image. c.f. https://github.com/actions/virtual-environments
jq is used locally to parse JSON output.
Measure the build time with BuildKit. BuildKit is a build toolkit bundled with Docker 18.09 and later. It can resolve dependencies between stages in parallel and can import/export the build cache.
c.f. https://github.com/moby/buildkit
When building with BuildKit, the time taken by each step is displayed. To use it, set the environment variable DOCKER_BUILDKIT=1:
DOCKER_BUILDKIT=1 docker build -t test .
There is a tool called wagoodman/dive that analyzes Docker images. You can use it to check the size of each layer and file.
dive supports json format output. You can use jq as shown below to sort and display the size of layers and files in descending order.
Save the result of dive to a file
dive <IMAGE:TAG> --json <FILENAME>.json
Display the 10 largest layers in descending order
cat <FILENAME>.json | jq '.layer | sort_by(.sizeBytes) | reverse | [limit(10; .[])]'
Display the 10 largest files in descending order
cat <FILENAME>.json | jq '.image.fileReference | sort_by(.sizeBytes) | reverse | [limit(10; .[])]'
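Building on the queries above, jq can also convert the byte counts to megabytes for readability. The JSON below is a hand-made stand-in for dive's output with hypothetical index/sizeBytes values, since the exact contents depend on your image:

```shell
# A tiny stand-in for dive's JSON output (hypothetical sizes);
# real output has more fields, but sizeBytes is what we sort on.
cat > sample.json <<'EOF'
{"layer":[
  {"index":0,"sizeBytes":5242880},
  {"index":1,"sizeBytes":1048576},
  {"index":2,"sizeBytes":20971520}
]}
EOF

# Print "index <n>: <size> MB" per layer, largest first.
jq -r '.layer | sort_by(.sizeBytes) | reverse
       | .[] | "index \(.index): \(.sizeBytes / 1048576) MB"' sample.json
```

The same `\(...)` string interpolation works on the real dive output once you know which fields you care about.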
By measuring the build time and inspecting the image size like this, I was able to identify where the problems were.
BuildKit executes independent stages in parallel during a multi-stage build. Let's check the actual behavior with a Dockerfile like the one below.
FROM alpine AS stage1
RUN echo "stage1" \
&& sleep 5 \
&& echo "stage1" > stage1.txt
FROM alpine AS stage2
RUN echo "stage2" \
&& sleep 5 \
&& echo "stage2" > stage2.txt
FROM alpine
COPY --from=stage1 /stage1.txt ./
COPY --from=stage2 /stage2.txt ./
$ time docker build -t test . --no-cache
Sending build context to Docker daemon 8.192kB
Step 1/7 : FROM alpine AS stage1
---> 389fef711851
Step 2/7 : RUN echo "stage1" && sleep 5 && echo "stage1" > stage1.txt
---> Running in 060af4f159d1
stage1
Removing intermediate container 060af4f159d1
---> 47cacbebce55
Step 3/7 : FROM alpine AS stage2
---> 389fef711851
Step 4/7 : RUN echo "stage2" && sleep 5 && echo "stage2" > stage2.txt
---> Running in 5527e0adf01c
stage2
Removing intermediate container 5527e0adf01c
---> c4c36b1aaa7b
Step 5/7 : FROM alpine
---> 389fef711851
Step 6/7 : COPY --from=stage1 /stage1.txt ./
---> b29d3db1464c
Step 7/7 : COPY --from=stage2 /stage2.txt ./
---> 2ae9b56c1d34
Successfully built 2ae9b56c1d34
Successfully tagged test:latest
real 0m11.976s
user 0m0.261s
sys 0m0.190s
Since the 5-second sleep is executed sequentially for stage1 and then for stage2, you can confirm that the build takes nearly 10 seconds.
$ DOCKER_BUILDKIT=1 docker build -t test . --no-cache
[+] Building 5.6s (9/9) FINISHED
=> [internal] load build definition from Dockerfile 0.1s
=> => transferring dockerfile: 322B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/alpine:latest 0.0s
=> [stage2 1/2] FROM docker.io/library/alpine 0.0s
=> => resolve docker.io/library/alpine:latest 0.0s
=> [stage2 2/2] RUN echo "stage2" && sleep 5 && echo "stage2" > stage2.txt 5.3s
=> [stage1 2/2] RUN echo "stage1" && sleep 5 && echo "stage1" > stage1.txt 5.3s
=> [runner 2/3] COPY --from=stage1 /stage1.txt ./ 0.1s
=> [runner 3/3] COPY --from=stage2 /stage2.txt ./ 0.1s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:27621000a30c0338903150a187a8b665a56137b5d14c018d0ac083a716d2fb72 0.0s
=> => naming to docker.io/library/test 0.0s
This confirms that stage1 and stage2 are executed in parallel, so the build takes only about 5 seconds.
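The effect is easy to reproduce with plain shell; this is only an analogy for how BuildKit overlaps independent stages, not BuildKit itself:

```shell
# Sequential: two 2-second sleeps take ~4 seconds (like the legacy builder).
start=$(date +%s)
sleep 2; sleep 2
seq_elapsed=$(( $(date +%s) - start ))
echo "sequential: ${seq_elapsed}s"

# Parallel: backgrounded with `wait`, the sleeps overlap and finish in
# ~2 seconds, just like BuildKit running independent stages concurrently.
start=$(date +%s)
sleep 2 & sleep 2 &
wait
par_elapsed=$(( $(date +%s) - start ))
echo "parallel: ${par_elapsed}s"
```

As with the stages above, parallelism only helps when the units of work do not depend on each other; a stage that `COPY --from`s another must still wait for it.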
Docker officially provides a GitHub Action for building and pushing images: https://github.com/docker/build-push-action From version 2 onwards it uses Buildx to build and push. Combining it with actions/cache@v2 makes the build cache work on CI.
Buildx is a CLI plugin that supports the full feature set of BuildKit. However, as of December 2020 it is an experimental feature and is not recommended for production use. c.f. https://docs.docker.com/buildx/working-with-buildx/ c.f. https://github.com/docker/buildx
If you are using a multi-stage build and want to cache the intermediate stages as well, specify mode=max in the --cache-to option. With mode=min, only the layers of the final stage are cached.
BuildKit supports the following three cache types. This time I used the local type.
- inline: embed the cache in the Docker image itself
- registry: push the image and the cache separately
- local: export the cache to a local directory
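For reference, the local cache type can also be exercised outside CI with Buildx directly; a sketch, assuming a Dockerfile in the current directory and an arbitrary cache path (this needs a running Docker daemon, so it is shown as-is rather than as a tested example):

```shell
# First build exports caches for all stages (mode=max) to a local directory;
# subsequent builds read them back in via --cache-from.
docker buildx build \
  --cache-to type=local,dest=/tmp/.buildx-cache,mode=max \
  --cache-from type=local,src=/tmp/.buildx-cache \
  -t test .
```

These are the same type=local,src=/dest= values that appear in the workflow's cache-from/cache-to inputs below.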
Example: the actual code looks like this.
.github/workflows/build.yaml
name: Build and Push Container
on:
  push:
    branches:
      - 'main'
jobs:
  build-push-image:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
      - name: Cache Docker layers
        uses: actions/cache@v2
        with:
          path: /tmp/.buildx-cache
          key: ${{ runner.os }}-buildx-${{ github.sha }}
          restore-keys: |
            ${{ runner.os }}-buildx-
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ap-northeast-1
      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v1
      - name: Build and Push
        if: github.ref == 'refs/heads/main'
        uses: docker/build-push-action@v2
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          ECR_REPOSITORY: ${{ github.repository }}
        with:
          push: true
          tags: ${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}:latest
          cache-from: type=local,src=/tmp/.buildx-cache
          cache-to: type=local,dest=/tmp/.buildx-cache,mode=max
          build-args: |
            ARG1=hoge
Specify the cache path with actions/cache@v2 and use it from docker/build-push-action@v2.
As a result, it was confirmed that the cache is effective and the build time is shortened for the second and subsequent builds.
c.f. https://docs.github.com/en/free-pro-team@latest/actions/guides/caching-dependencies-to-speed-up-workflows
This time I wrote about:
- measuring Docker images
- parallel execution of multi-stage builds
- using the cache on CI
When implementing improvements, start with measurement, weigh the cost of each improvement against its expected benefit, and only then make the actual changes. Also, make sure you follow the Dockerfile best practices (https://docs.docker.com/develop/develop-images/dockerfile_best-practices/) before embarking on the improvements presented here. If CI still takes a long time, why not consider the tips introduced here?