**Upload a CSV file to S3 → Trigger Lambda → Convert it to a JSON file**
Language: Python 3.8 / AWS services: S3, Lambda
First, prepare the prerequisites: an IAM user, an IAM role, and an S3 bucket.
We will work with the AWS CLI in this article, so start by creating a dedicated IAM user.
"IAM"-> "Users"-> "Add User"
Username: Optional Access type: Check "Programmatic access"
Since we only need basic S3 operations such as creating a bucket and uploading and deleting files, attach the "AmazonS3FullAccess" policy.
When the user has been created, the following two credentials are issued, so make a note of them:
--Access key ID --Secret access key
$ aws configure --profile s3-lambda
AWS Access Key ID [None]: ***************** #Enter your access key ID
AWS Secret Access Key [None]: ************************** #Enter your secret access key
Default region name [None]: ap-northeast-1
Default output format [None]: json
Running the command above starts an interactive prompt in the terminal; enter each value as shown.
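If you want to make sure the profile was saved correctly, a quick check is to call STS with it (GetCallerIdentity needs no IAM permissions, so it works even though this user only has the S3 policy):
$ aws --profile s3-lambda sts get-caller-identity
If the user's account ID and ARN come back, the profile is ready to use.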
Next, create the S3 bucket using the AWS CLI profile we just configured.
$ aws --profile s3-lambda s3 mb s3://test-bucket-for-converting-csv-to-json-with-lambda
make_bucket: test-bucket-for-converting-csv-to-json-with-lambda
Bucket names must be globally unique, so pick your own name.
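To double-check that the bucket was created, you can list the buckets visible to the profile:
$ aws --profile s3-lambda s3 ls
The new bucket name should appear in the output.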
**Create a test CSV file and upload it as a trial**
$ mkdir ./workspace/
$ cat > ./workspace/test.csv << EOF
heredoc> Name,Age,Country
heredoc> Taro,20,Japan
heredoc> EOF
$ aws --profile s3-lambda s3 sync ./workspace s3://test-bucket-for-converting-csv-to-json-with-lambda
upload: ./test.csv to s3://test-bucket-for-converting-csv-to-json-with-lambda/test.csv
If the file shows up in the bucket, the upload succeeded.
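You can also confirm the upload from the CLI instead of the console:
$ aws --profile s3-lambda s3 ls s3://test-bucket-for-converting-csv-to-json-with-lambda/
test.csv should be listed.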
Now that we have confirmed the upload works, delete the test file.
$ aws --profile s3-lambda s3 rm s3://test-bucket-for-converting-csv-to-json-with-lambda/test.csv
Create an IAM role to assign to Lambda.
"IAM"-> "Role"-> "Create Role"
Select "Lambda" as the service that will use the role, and attach the two policies the function needs: S3 access (e.g. AmazonS3FullAccess) and the basic Lambda execution policy (AWSLambdaBasicExecutionRole) for writing CloudWatch Logs.
Enter a name and description of your choice and create the role.
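If you prefer doing this from the CLI as well, the role can be created along the following lines. Note that this requires credentials allowed to manage IAM (the "s3-lambda" user created above only has S3 access), and the role name "s3-lambda-role" is just an example:
$ aws iam create-role --role-name s3-lambda-role \
    --assume-role-policy-document '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Principal": {"Service": "lambda.amazonaws.com"}, "Action": "sts:AssumeRole"}]}'
$ aws iam attach-role-policy --role-name s3-lambda-role \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
$ aws iam attach-role-policy --role-name s3-lambda-role \
    --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole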
Now that the preparations are complete, we can finally start implementing.
"Lambda"-> "Create Function"
--Option: Author from scratch --Function name: any name --Runtime: Python 3.8 --Execution role: use an existing role (the "s3-lambda" role created earlier) --Everything else: leave the defaults
Go to "Configuration"-> "Add Trigger" to decide what event will trigger Lambda.
I will fill in the necessary items.
--Trigger: S3 --Bucket: the bucket created earlier --Event type: All object create events --Prefix: input/ --Suffix: .csv
With this configuration, the Lambda function is invoked whenever a ".csv" file is uploaded under the "input/" prefix.
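For reference, the event that S3 passes to the handler looks roughly like this (trimmed to the fields the function actually reads; the bucket name and key correspond to our test upload):
{
  "Records": [
    {
      "s3": {
        "bucket": { "name": "test-bucket-for-converting-csv-to-json-with-lambda" },
        "object": { "key": "input/test.csv" }
      }
    }
  ]
}
With that in mind, here is the function code.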
import json
import csv
import boto3
import os
from datetime import datetime, timezone, timedelta

s3 = boto3.client('s3')


def lambda_handler(event, context):
    json_data = []

    # Timestamps in Japan Standard Time (JST)
    JST = timezone(timedelta(hours=+9), 'JST')
    timestamp = datetime.now(JST).strftime('%Y%m%d%H%M%S')

    # Temporary working files (deleted later)
    tmp_csv = '/tmp/test_{ts}.csv'.format(ts=timestamp)
    tmp_json = '/tmp/test_{ts}.json'.format(ts=timestamp)

    # Final output object key
    outputted_json = 'output/test_{ts}.json'.format(ts=timestamp)

    for record in event['Records']:
        bucket_name = record['s3']['bucket']['name']
        key_name = record['s3']['object']['key']
        s3_object = s3.get_object(Bucket=bucket_name, Key=key_name)
        data = s3_object['Body'].read()
        contents = data.decode('utf-8')

        try:
            # Write the downloaded CSV to a temporary file
            with open(tmp_csv, 'a') as csv_data:
                csv_data.write(contents)

            # Read the CSV back and collect each row as a dict
            with open(tmp_csv) as csv_data:
                csv_reader = csv.DictReader(csv_data)
                for csv_row in csv_reader:
                    json_data.append(csv_row)

            # Dump the rows as JSON to another temporary file
            with open(tmp_json, 'w') as json_file:
                json_file.write(json.dumps(json_data))

            # Upload the JSON file to the output/ prefix of the same bucket
            with open(tmp_json, 'r') as json_file_contents:
                response = s3.put_object(Bucket=bucket_name, Key=outputted_json, Body=json_file_contents.read())

            # Clean up the temporary files
            os.remove(tmp_csv)
            os.remove(tmp_json)

        except Exception as e:
            print(e)
            print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key_name, bucket_name))
            raise e
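One caveat worth knowing: object keys in S3 event notifications are URL-encoded, so a key containing spaces or non-ASCII characters will not match the real object name as-is. If your file names can contain such characters, decoding the key first is a common safeguard (not part of the function above):
from urllib.parse import unquote_plus

key_name = unquote_plus(record['s3']['object']['key'])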
Now, when a CSV file is uploaded under "test-bucket-for-converting-csv-to-json-with-lambda/input/", the function writes the JSON-converted file to "test-bucket-for-converting-csv-to-json-with-lambda/output/".
Let's upload the test file again with the AWS CLI, this time syncing it to the input/ prefix.
$ aws --profile s3-lambda s3 sync ./workspace s3://test-bucket-for-converting-csv-to-json-with-lambda/input
upload: ./test.csv to s3://test-bucket-for-converting-csv-to-json-with-lambda/input/test.csv
If you check the bucket, a new folder called "output" should have been created, with the JSON file inside.
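If you prefer the CLI, you can also download the generated file and inspect it locally:
$ aws --profile s3-lambda s3 cp s3://test-bucket-for-converting-csv-to-json-with-lambda/output/ ./workspace/ --recursive
The downloaded file should look like the following.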
[
{
"Name": "Taro",
"Age": "20",
"Country": "Japan"
}
]
Check the contents, and if the data has been converted to JSON correctly, you're done.
Thank you for reading. This article covered converting CSV to JSON, but other conversion patterns can be implemented in much the same way.
I hope you find it helpful.