**Upload a CSV file to S3 → Trigger Lambda → Convert it to a JSON file**
Language: Python 3.8 / AWS services: S3, Lambda
First, prepare the prerequisites: an IAM user, an IAM role, and an S3 bucket.
We will work with the AWS CLI in this article, so start by creating a dedicated IAM user.
"IAM"-> "Users"-> "Add User"
Username: Optional Access type: Check "Programmatic access"
Since we only need basic S3 operations such as creating a bucket and uploading and deleting files, attach the "AmazonS3FullAccess" policy.
When the user has been created, the following two credentials are issued, so make a note of them:
--Access key ID --Secret access key
$ aws configure --profile s3-lambda
AWS Access Key ID [None]: ***************** #Enter your access key ID
AWS Secret Access Key [None]: ************************** #Enter your secret access key
Default region name [None]: ap-northeast-1
Default output format [None]: json
Running the command above starts an interactive prompt in the terminal; enter each value as shown.
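If you want to make sure the profile was saved correctly, a quick check is to call STS with it (GetCallerIdentity needs no IAM permissions, so it works even though this user only has the S3 policy):
$ aws --profile s3-lambda sts get-caller-identity
If the user's account ID and ARN come back, the profile is ready to use.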
Next, create the S3 bucket using the AWS CLI profile we just configured.
$ aws --profile s3-lambda s3 mb s3://test-bucket-for-converting-csv-to-json-with-lambda
make_bucket: test-bucket-for-converting-csv-to-json-with-lambda
Bucket names must be globally unique, so pick your own name.
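To double-check that the bucket was created, you can list the buckets visible to the profile:
$ aws --profile s3-lambda s3 ls
The new bucket name should appear in the output.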
**Create a test CSV file and upload it as a trial**
$ mkdir ./workspace/
$ cat > ./workspace/test.csv << EOF
heredoc> Name,Age,Country
heredoc> Taro,20,Japan
heredoc> EOF
$ aws --profile s3-lambda s3 sync ./workspace s3://test-bucket-for-converting-csv-to-json-with-lambda
upload: ./test.csv to s3://test-bucket-for-converting-csv-to-json-with-lambda/test.csv
If the file shows up in the bucket, the upload succeeded.
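You can also confirm the upload from the CLI instead of the console:
$ aws --profile s3-lambda s3 ls s3://test-bucket-for-converting-csv-to-json-with-lambda/
test.csv should be listed.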
Now that we have confirmed the upload works, delete the test file.
$ aws --profile s3-lambda s3 rm s3://test-bucket-for-converting-csv-to-json-with-lambda/test.csv
Create an IAM role to assign to Lambda.
"IAM"-> "Role"-> "Create Role"
Select "Lambda" as the service that will use the role, and attach the two policies the function needs: S3 access (e.g. AmazonS3FullAccess) and the basic Lambda execution policy (AWSLambdaBasicExecutionRole) for writing CloudWatch Logs.
Enter a name and description of your choice and create the role.
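If you prefer doing this from the CLI as well, the role can be created along the following lines. Note that this requires credentials allowed to manage IAM (the "s3-lambda" user created above only has S3 access), and the role name "s3-lambda-role" is just an example:
$ aws iam create-role --role-name s3-lambda-role \
    --assume-role-policy-document '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Principal": {"Service": "lambda.amazonaws.com"}, "Action": "sts:AssumeRole"}]}'
$ aws iam attach-role-policy --role-name s3-lambda-role \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
$ aws iam attach-role-policy --role-name s3-lambda-role \
    --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole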
Now that the preparations are complete, we can finally start implementing.
"Lambda"-> "Create Function"
--Option: Author from scratch --Function name: any name --Runtime: Python 3.8 --Execution role: use an existing role (the "s3-lambda" role created earlier) --Everything else: leave the defaults
Go to "Configuration"-> "Add Trigger" to decide what event will trigger Lambda.
I will fill in the necessary items.
--Trigger: S3 --Bucket: the bucket created earlier --Event type: All object create events --Prefix: input/ --Suffix: .csv
With this configuration, the Lambda function is invoked whenever a ".csv" file is uploaded under the "input/" prefix.
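For reference, the event that S3 passes to the handler looks roughly like this (trimmed to the fields the function actually reads; the bucket name and key correspond to our test upload):
{
  "Records": [
    {
      "s3": {
        "bucket": { "name": "test-bucket-for-converting-csv-to-json-with-lambda" },
        "object": { "key": "input/test.csv" }
      }
    }
  ]
}
With that in mind, here is the function code.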
import json
import csv
import boto3
import os
from datetime import datetime, timezone, timedelta

s3 = boto3.client('s3')


def lambda_handler(event, context):
    json_data = []

    # Timestamps in Japan Standard Time (JST)
    JST = timezone(timedelta(hours=+9), 'JST')
    timestamp = datetime.now(JST).strftime('%Y%m%d%H%M%S')

    # Temporary working files (deleted later)
    tmp_csv = '/tmp/test_{ts}.csv'.format(ts=timestamp)
    tmp_json = '/tmp/test_{ts}.json'.format(ts=timestamp)

    # Final output object key
    outputted_json = 'output/test_{ts}.json'.format(ts=timestamp)

    for record in event['Records']:
        bucket_name = record['s3']['bucket']['name']
        key_name = record['s3']['object']['key']
        s3_object = s3.get_object(Bucket=bucket_name, Key=key_name)
        data = s3_object['Body'].read()
        contents = data.decode('utf-8')

        try:
            # Write the downloaded CSV to a temporary file
            with open(tmp_csv, 'a') as csv_data:
                csv_data.write(contents)

            # Read the CSV back and collect each row as a dict
            with open(tmp_csv) as csv_data:
                csv_reader = csv.DictReader(csv_data)
                for csv_row in csv_reader:
                    json_data.append(csv_row)

            # Dump the rows as JSON to another temporary file
            with open(tmp_json, 'w') as json_file:
                json_file.write(json.dumps(json_data))

            # Upload the JSON file to the output/ prefix of the same bucket
            with open(tmp_json, 'r') as json_file_contents:
                response = s3.put_object(Bucket=bucket_name, Key=outputted_json, Body=json_file_contents.read())

            # Clean up the temporary files
            os.remove(tmp_csv)
            os.remove(tmp_json)

        except Exception as e:
            print(e)
            print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key_name, bucket_name))
            raise e
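One caveat worth knowing: object keys in S3 event notifications are URL-encoded, so a key containing spaces or non-ASCII characters will not match the real object name as-is. If your file names can contain such characters, decoding the key first is a common safeguard (not part of the function above):
from urllib.parse import unquote_plus

key_name = unquote_plus(record['s3']['object']['key'])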
Now, when a CSV file is uploaded under "test-bucket-for-converting-csv-to-json-with-lambda/input/", the function writes the JSON-converted file to "test-bucket-for-converting-csv-to-json-with-lambda/output/".
Let's upload the test file again with the AWS CLI, this time syncing it to the input/ prefix.
$ aws --profile s3-lambda s3 sync ./workspace s3://test-bucket-for-converting-csv-to-json-with-lambda/input
upload: ./test.csv to s3://test-bucket-for-converting-csv-to-json-with-lambda/input/test.csv
If you check the bucket, a new folder called "output" should have been created, with the JSON file inside.
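If you prefer the CLI, you can also download the generated file and inspect it locally:
$ aws --profile s3-lambda s3 cp s3://test-bucket-for-converting-csv-to-json-with-lambda/output/ ./workspace/ --recursive
The downloaded file should look like the following.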
[
{
"Name": "Taro",
"Age": "20",
"Country": "Japan"
}
]
Check the contents, and if the data has been converted to JSON correctly, you're done.
Thank you for reading. This article covered converting CSV to JSON, but other conversion patterns can be implemented in much the same way.
I hope you find it helpful.