There seem to be quite a few people who want to run a Python script on AWS on a regular schedule. This can be done by setting up an EC2 instance and running the script with cron, but here I will explain how to do it using AWS Data Pipeline.
Note, however, that Data Pipeline has a limitation: the execution interval can only be set to 15 minutes or longer, so it cannot run a script every minute.
It is also possible to periodically execute a Lambda Function from Data Pipeline. If your script is written in Node.js or Java, that is probably the easier way.
The items to set up are as follows. It is assumed that the Python script itself is already written.
Create an S3 bucket to hold the Python script. Of course, an existing bucket can be used instead. Go to AWS Console → S3 and create a bucket as follows.
Select `Create Bucket` and give it an appropriate name (here it is assumed you created a bucket called `datapipeline-python-test`).
Next, upload the Python script to the S3 bucket as follows.
Select `datapipeline-python-test` → `Upload` and upload the Python script.
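If you prefer the command line, the same bucket creation and upload can also be done with the AWS CLI. This is only a sketch, assuming the CLI is already configured with credentials and a default region, and that the script is in the current directory:
# Create the bucket and upload the script (equivalent to the console steps above)
aws s3 mb s3://datapipeline-python-test
aws s3 cp datapipeline_test.py s3://datapipeline-python-test/datapipeline_test.py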
Here, it is assumed that the following script, `datapipeline_test.py`, which simply prints the current time, has been uploaded.
datapipeline_test.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import datetime
print 'Script run at ' + datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
Go to AWS Console → Data Pipeline and create a Data Pipeline as follows.
Give it an appropriate name (here it is assumed you created a pipeline called `Test Pipeline`).
Select `Run AWS CLI command` in `Build using a template`.
Select `15 minutes` for `Run every` (select `once on pipeline activation` if you want to run it only once).
For the S3 location for logs, specify the `datapipeline-python-test` bucket created above.
Enter the following as the `AWS CLI command`:
sudo yum -y install python-devel gcc && sudo update-alternatives --set python /usr/bin/python2.7 && curl "https://bootstrap.pypa.io/get-pip.py" -o "get-pip.py" && sudo python ./get-pip.py && pip install boto3 --user && aws s3 cp s3://datapipeline-python-test/datapipeline_test.py ./datapipeline_test.py && cat datapipeline_test.py && python ./datapipeline_test.py
Select `Edit Architect` with these settings to create the Data Pipeline. When it is created, two IAM Roles, `DataPipelineDefaultResourceRole` and `DataPipelineDefaultRole`, are created automatically.
Immediately after creation these IAM Roles lack some privileges, so grant access to S3 to both `DataPipelineDefaultResourceRole` and `DataPipelineDefaultRole`.
Go to AWS Console → Identity & Access Management → Roles and grant the permissions as follows.
Select `DataPipelineDefaultResourceRole`, choose `Attach Policy`, and attach a policy that grants access to S3.
Set the same permissions for `DataPipelineDefaultRole`.
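For reference, the same thing can be done from the AWS CLI; a minimal sketch, assuming you are fine with the AWS-managed `AmazonS3FullAccess` policy (a narrower S3 policy also works):
# Attach an S3 access policy to both Data Pipeline roles
aws iam attach-role-policy --role-name DataPipelineDefaultResourceRole --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
aws iam attach-role-policy --role-name DataPipelineDefaultRole --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess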
Go to AWS Console → Data Pipeline and activate the Data Pipeline you just created.
Select `Test Pipeline` → `Activate`.
Periodic execution of the Data Pipeline is now active. It runs every 15 minutes, so wait a while.
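Activation can also be triggered from the AWS CLI once you know the pipeline ID; a sketch with a placeholder ID:
# Look up the pipeline ID, then activate the pipeline (df-0123456789EXAMPLE is a placeholder)
aws datapipeline list-pipelines
aws datapipeline activate-pipeline --pipeline-id df-0123456789EXAMPLE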
Go to AWS Console → Data Pipeline, select `Test Pipeline`, open the `Attempts` tab of `CliActivity`, select `Stdout`, and confirm that the current time has been printed by the Python script.
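The status of each run can also be checked from the AWS CLI (same placeholder pipeline ID as above); the stdout itself is easiest to read in the console:
# List recent runs and their status
aws datapipeline list-runs --pipeline-id df-0123456789EXAMPLE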
The shell script specified as the `AWS CLI command` above does not do much, but let me supplement its contents.
sudo yum -y install python-devel gcc
Installs additional packages on the OS (assuming some Python libraries need gcc etc. to build). This can be removed if the standard Amazon Linux packages are sufficient.
sudo update-alternatives --set python /usr/bin/python2.7
Pins the Python version to 2.7. This avoids errors that some libraries raise with the default Python version.
curl "https://bootstrap.pypa.io/get-pip.py" -o "get-pip.py" && sudo python ./get-pip.py
Installs pip. This can be removed if the standard Python library is sufficient.
pip install boto3 --user
Installs additional Python libraries with pip. The `--user` argument is needed because of permissions. If you want to install multiple libraries, list them like `pip install requests boto3 numpy --user`.
aws s3 cp s3://datapipeline-python-test/datapipeline_test.py ./datapipeline_test.py
Copies the Python script from S3 to the local disk.
cat datapipeline_test.py
Displays the contents of the file downloaded from S3; this can be removed if you don't need it.
python ./datapipeline_test.py
Finally, runs the Python script.
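For reference, here is the same sequence spaced out with comments. This is only to make it easier to read; what you actually paste into the `AWS CLI command` field is the one-liner above, chained with `&&`:
# Install build tools that some Python libraries need to compile
sudo yum -y install python-devel gcc
# Pin the default python to 2.7
sudo update-alternatives --set python /usr/bin/python2.7
# Install pip
curl "https://bootstrap.pypa.io/get-pip.py" -o "get-pip.py"
sudo python ./get-pip.py
# Install additional Python libraries into the user site-packages
pip install boto3 --user
# Fetch the script from S3, show it, and run it
aws s3 cp s3://datapipeline-python-test/datapipeline_test.py ./datapipeline_test.py
cat datapipeline_test.py
python ./datapipeline_test.py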
It is also possible to send an alarm email via AWS SNS when the Python script fails. I will omit the explanation of AWS SNS itself, but here is a brief supplement on the Data Pipeline settings.
Select `Test Pipeline` and select `Edit Pipeline`. Open `On Fail` under `Activities` in the right pane and select `Create new: Action`; a `DefaultAction1` is created and added to `Activities`.
Select `DefaultAction1` under `Others` in the right pane and set `Type` to `SnsAlarm`, then set `Topic Arn` to the ARN of the SNS topic to notify. `Message` is the body of the alarm email, and `Subject` is its subject. `Role` is `DataPipelineDefaultRole` by default, but it is OK to select any role that has the `AmazonSNSFullAccess` permission. SNS can be fired either when the script fails or when it succeeds. Don't forget to give the role permission to publish to SNS.
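Before relying on the alarm, it may help to confirm that the SNS topic actually delivers mail, for example with a test publish from the AWS CLI (the topic ARN below is a placeholder):
# Publish a test message to the topic set in Topic Arn above
aws sns publish --topic-arn arn:aws:sns:ap-northeast-1:123456789012:datapipeline-test --subject "Data Pipeline test" --message "Test message from the AWS CLI"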
Once Python scripts can be run periodically with Data Pipeline, there is no need to provision and manage hosts just for scheduled jobs or to guarantee their execution yourself, and many things become easier.