This article is the December 16 entry in the BeeX Advent Calendar 2020.
==
In the previous article I tried running a Lambda function from a container image.
This time I build on that and try running Embulk, a command I have been using a lot lately, inside that container.
PC: Windows 10
Docker: Docker version 19.03.13, build 4484c46d9d
Create the Lambda function by referring to @shiro01's article. The goal this time is only to confirm that Embulk runs, so the function simply executes the help command.
lambda_function.py
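import subprocess

def lambda_handler(event, context):
    # With shell=True, pass the command as one string. If a list is passed,
    # only its first item is run as the command, so 'help' would not reach Embulk.
    cmd = '/usr/bin/embulk help'
    out = subprocess.run(cmd, shell=True, stdout=subprocess.PIPE)
    print(out.stdout.decode())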
One caveat: shell=True is passed as an argument to subprocess.run. Without it, the call fails with an OSError.
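How shell=True treats its arguments is easy to get subtly wrong, so here is a minimal sketch of the behavior on a POSIX system such as the Lambda runtime. `run_in_shell` is a hypothetical helper that is not part of the original function, and `echo` stands in for Embulk. When a list is combined with shell=True, Python runs only the first item as the command and hands the remaining items to the shell itself, so a single command string is the safer form.

```python
import subprocess


def run_in_shell(command: str) -> str:
    """Run a command string through the shell and return its decoded stdout.

    Hypothetical helper for illustration; `echo` below stands in for Embulk.
    """
    completed = subprocess.run(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into stdout so errors reach the log
        check=True,                # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout.decode()


# A single string: the shell sees the whole command line, arguments included.
as_string = run_in_shell("echo hello")

# A list with shell=True (POSIX): only "echo" is run as the command; "hello"
# becomes an argument to the shell itself and never reaches echo.
as_list = subprocess.run(
    ["echo", "hello"], shell=True, stdout=subprocess.PIPE
).stdout.decode()
```

Using `check=True` also makes a failing command raise an exception instead of silently printing an empty result.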
Next, create a Dockerfile. Embulk has been downloaded in advance, so its executable file is already available in the build directory. The COPY instructions in the Dockerfile copy Embulk and lambda_function.py into the container.
FROM amazon/aws-lambda-python:3.7
COPY lambda_function.py ./
COPY embulk /usr/bin
RUN chmod +x /usr/bin/embulk
CMD [ "lambda_function.lambda_handler" ]
Build the image with the following command.
$ cd [directory containing the Dockerfile]
$ docker build -t lambda_embulk .
Confirm that the image has been created.
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
lambda_embulk latest 472005da7cf7 About a minute ago 980MB
You can run Lambda locally with the following command.
$ docker run -p 9000:8080 lambda_embulk:latest
time="2020-12-15T06:11:57.95" level=info msg="exec '/var/runtime/bootstrap' (cwd=/var/task, handler=)"
Open another window and run the following command. curl prints null as the response because the handler does not return a value.
$ curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'
null
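The same invocation can also be made from Python. The sketch below assumes the container started in the previous step is still listening on localhost port 9000 and uses only the standard library. `build_invoke_request` and `invoke_local` are hypothetical helper names.

```python
import json
import urllib.request

# Invocation path taken from the curl command above.
INVOKE_PATH = "/2015-03-31/functions/function/invocations"


def build_invoke_request(payload: dict, host: str = "localhost",
                         port: int = 9000) -> urllib.request.Request:
    """Build a POST request that carries the event payload as JSON."""
    return urllib.request.Request(
        url=f"http://{host}:{port}{INVOKE_PATH}",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
    )


def invoke_local(payload: dict) -> str:
    """Send the event to the local container and return the response body."""
    with urllib.request.urlopen(build_invoke_request(payload)) as response:
        return response.read().decode("utf-8")


# Usage (requires the container to be running):
#   print(invoke_local({}))
```

Keeping the request construction separate from the network call makes it possible to check the URL and payload without a running container.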
The execution result appears in the console of the window running the container. The output of the "embulk help" command is printed, so the local test looks fine.
$ docker run -p 9000:8080 lambda_embulk:latest
time="2020-12-15T06:13:53.604" level=info msg="exec '/var/runtime/bootstrap' (cwd=/var/task, handler=)"
START RequestId: 0ed577b3-6a07-447a-bccb-c03d255c0201 Version: $LATEST
time="2020-12-15T06:13:57.601" level=info msg="extensionsDisabledByLayer(/opt/disable-extensions-jwigqn8j) -> stat /opt/disable-extensions-jwigqn8j: no such file or directory"
time="2020-12-15T06:13:57.601" level=warning msg="Cannot list external agents" error="open /opt/extensions: no such file or directory"
Embulk v0.9.23
Usage: embulk [-vm-options] <command> [--options]
Commands:
mkbundle <directory> # create a new plugin bundle environment.
bundle [directory] # update a plugin bundle environment.
run <config.yml> # run a bulk load transaction.
cleanup <config.yml> # cleanup resume state.
preview <config.yml> # dry-run the bulk load without output and show preview.
guess <partial-config.yml> -o <output.yml> # guess missing parameters to create a complete configuration file.
gem <install | list | help> # install a plugin or show installed plugins.
new <category> <name> # generates new plugin template
migrate <path> # modify plugin code to use the latest Embulk plugin API
example [path] # creates an example config file and csv file to try embulk.
selfupdate [version] # upgrades embulk to the latest released version or to the specified version.
VM options:
-E... Run an external script to configure environment variables in JVM
(Operations not just setting envs are not recommended nor guaranteed.
Expect side effects by running your external script at your own risk.)
-J-O Disable JVM optimizations to speed up startup time (enabled by default if command is 'run')
-J+O Enable JVM optimizations to speed up throughput
-J... Set JVM options (use -J-help to see available options)
-R--dev Set JRuby to be in development mode
Use `<command> --help` to see description of the commands.
END RequestId: 0ed577b3-6a07-447a-bccb-c03d255c0201
REPORT RequestId: 0ed577b3-6a07-447a-bccb-c03d255c0201 Init Duration: 1.03 ms Duration: 591.61 ms Billed Duration: 600 ms Memory Size: 3008 MB Max Memory Used: 3008 MB
Create an ECR repository from the AWS console.
Push the image to the ECR repository you created with the following commands. These steps are almost the same as in the previous article. Because Embulk makes the image fairly large, the push takes longer than last time.
$ docker tag lambda_embulk:latest XXXXXXXXXXXXX.dkr.ecr.ap-northeast-1.amazonaws.com/lambda-embulk
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
XXXXXXXXXXXXX.dkr.ecr.ap-northeast-1.amazonaws.com/lambda-embulk latest 472005da7cf7 11 minutes ago 980MB
$ aws ecr get-login-password --region ap-northeast-1 | docker login --username AWS --password-stdin XXXXXXXXXXXXX.dkr.ecr.ap-northeast-1.amazonaws.com
Login Succeeded
$ docker push XXXXXXXXXXXXX.dkr.ecr.ap-northeast-1.amazonaws.com/lambda-embulk:latest
The push refers to repository [XXXXXXXXXXXXX.dkr.ecr.ap-northeast-1.amazonaws.com/lambda-embulk]
5f70bf18a086: Pushed
65f9fe7cdd01: Pushed
1965e83122e7: Pushed
701bdcbf3b47: Pushed
6e660533f001: Pushed
069cd8bd11dd: Pushed
6e191121f7ea: Pushed
d6fa53d6caa6: Pushed
1fb474cee41c: Pushed
b1754cf6954d: Pushed
464c816a7003: Pushed
latest: digest: sha256:XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX size: 2624
I was able to upload successfully.
Create a Lambda function from the AWS console.
The Lambda function has been created.
When I ran it, it failed with a timeout error.
The timeout in Lambda's basic settings was 3 seconds, so I changed it to 3 minutes for the time being.
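The timeout can also be changed from code instead of the console. The following is a minimal sketch that assumes boto3 is installed and AWS credentials are configured. The function name `lambda-embulk` is a placeholder, since the article does not state the name that was used, and `set_timeout` is a hypothetical helper. The boto3 call `update_function_configuration` takes the timeout in seconds.

```python
def set_timeout(lambda_client, function_name: str, minutes: int) -> dict:
    """Set a Lambda function's timeout, given in minutes.

    `lambda_client` is expected to be a boto3 Lambda client, i.e. the
    result of boto3.client("lambda").
    """
    seconds = minutes * 60
    # Lambda caps the timeout at 900 seconds (15 minutes).
    if not 1 <= seconds <= 900:
        raise ValueError("timeout must be between 1 second and 15 minutes")
    return lambda_client.update_function_configuration(
        FunctionName=function_name,
        Timeout=seconds,
    )


# Usage (placeholder function name; needs boto3 and AWS credentials):
#   import boto3
#   set_timeout(boto3.client("lambda"), "lambda-embulk", 3)
```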
It took a while, but this time it finished normally. Judging from the output, Embulk's help command runs on Lambda as well.
Whether or not this is a good idea, I was able to run Embulk on Lambda. For real operation there are still things to work out, such as Lambda's limits and where to keep Embulk's diff file, but for now it may be possible to store that file in S3 and run small-scale jobs from there, though I am not sure.
That said, Embulk jobs often do not finish within Lambda's execution time limit, so that part may be difficult.
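One way to make that limit visible is to give the subprocess a timeout derived from the time the invocation has left. This is a sketch only, not something the original function does: `run_with_deadline` and the 5-second safety margin are assumptions, while `context.get_remaining_time_in_millis()` is the method the Lambda Python runtime provides on the context object.

```python
import subprocess


def run_with_deadline(command: str, context, margin_seconds: float = 5.0) -> str:
    """Run a shell command with a timeout just short of the Lambda deadline.

    Raises subprocess.TimeoutExpired if the command is still running when
    the computed timeout is reached.
    """
    remaining_seconds = context.get_remaining_time_in_millis() / 1000.0
    # Leave a margin for cleanup, but always allow at least one second.
    timeout = max(remaining_seconds - margin_seconds, 1.0)
    completed = subprocess.run(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        timeout=timeout,
    )
    return completed.stdout.decode()


def lambda_handler(event, context):
    # The Embulk path is the one used earlier in this article.
    print(run_with_deadline('/usr/bin/embulk help', context))
```

With this guard, a job that runs too long raises an exception inside the handler, where it can be logged or retried, rather than ending only with Lambda's own timeout error.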