Try writing JSON format data to object storage Cloudian/S3

Introduction

You can use the AWS SDK for Python (boto3) to access both Cloudian and S3 with almost no changes to your program. I hope this serves as a reference for anyone who wants to use object storage in a hybrid setup (on-premises: Cloudian, AWS: S3).

Overview

This is a Python program that writes JSON-format data to the bucket "boto3-cloudian" in the object storage Cloudian/S3.

The number of JSON records to generate is specified by a parameter, and the records are written to the object "test-iot-dummy.json". With some customization, the program could be used to generate IoT data. See "items" in the program for the fields of each generated record.
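If you consume these records elsewhere, a quick structural check helps catch changes to "items". The sketch below assumes the eight keys produced by iot_json_data() in the program; check_iot_payload() and EXPECTED_KEYS are hypothetical names, not part of the original program.

```python
import json

# Keys that iot_json_data() in the program writes for every record in "items"
EXPECTED_KEYS = {"id", "time", "proc", "section", "iot_num", "iot_state", "vol_1", "vol_2"}


def check_iot_payload(payload: bytes) -> int:
    """Parse a UTF-8 JSON payload and return the number of records.

    Raises ValueError if any record is missing a key or carries an extra one.
    """
    doc = json.loads(payload.decode("utf-8"))
    items = doc["items"]
    for record in items:
        keys = set(record)
        if keys != EXPECTED_KEYS:
            raise ValueError(f"record {record.get('id')} has keys {sorted(keys)}")
    return len(items)
```

For example, check_iot_payload(iot_json_data(10, '111')) should return 10, since iot_json_data() returns UTF-8 encoded bytes.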

There are three parameters:
- count: Number of records to generate (default: 10)
- proc: Name of the data-generation process (default: 111)
- mode: Output destination of the generated data; tm: output to the terminal, s3: output to Cloudian/S3 (default: tm)

See also the help displayed when the program is run with the "-h" option.
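In the program as written, any --mode value other than s3 (including a typo such as S3) silently falls back to terminal output. A stricter variant, sketched below under the assumption that only tm and s3 are ever wanted, uses argparse's choices= so invalid values are rejected. build_parser() is a hypothetical helper name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Same three options as the program, with --mode restricted to tm or s3."""
    parser = argparse.ArgumentParser(description="Generation of dummy data for IoT devices")
    parser.add_argument("--count", type=int, default=10, help="Number of data created")
    parser.add_argument("--proc", type=str, default="111", help="Data creation process name")
    # choices= makes argparse print an error and exit with status 2 for any other value
    parser.add_argument("--mode", choices=("tm", "s3"), default="tm",
                        help="tm: print to terminal / s3: write to Cloudian/S3")
    return parser
```

With this parser, python IoTSample-write.py --mode S3 stops with a usage error instead of printing 100,000 records to the terminal.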

Execution environment

macOS Big Sur 11.1, Python 3.8.3 (the program also needs the boto3 and Faker packages)

Definition of credential information

Here, the credentials are defined in .zshenv before running the program. Define them for the destination you connect to, enabling one set and commenting out the other.

# AWS S3
export AWS_ACCESS_KEY_ID=xxxxxxxxxxxxx
export AWS_SECRET_ACCESS_KEY=yyyyyyyyyyyyyyyyy
export AWS_DEFAULT_REGION=ap-northeast-1

# Cloudian
#export AWS_ACCESS_KEY_ID=aaaaaaaaaaaaaaaaaa
#export AWS_SECRET_ACCESS_KEY=bbbbbbbbbbbbbbbbbbbb
#export AWS_DEFAULT_REGION=pic
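Since boto3 reads AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION from the environment, a forgotten export usually shows up later as an authentication or region error. A minimal pre-flight check, assuming credentials come only from these variables (boto3 can also use ~/.aws/credentials, which this sketch ignores), could look like this; missing_aws_vars() is a hypothetical name.

```python
import os
from typing import List, Mapping

# The three variables set in .zshenv above
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling missing_aws_vars() at the top of s3_main() and exiting if the list is non-empty would surface the problem before any request is made.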

Execution program

To access Cloudian, set endpoint_url to your Cloudian S3 endpoint (see the commented-out boto3.client() line in s3_main()).
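Instead of editing the code to switch between Cloudian and S3, the endpoint could be taken from the environment, next to the credentials. This is a sketch only: S3_ENDPOINT_URL is a hypothetical variable name, and the program itself hard-codes the URL.

```python
import os
from typing import Dict, Mapping

# Hypothetical variable name; e.g. export S3_ENDPOINT_URL=http://s3-pic.networld.local
ENDPOINT_VAR = "S3_ENDPOINT_URL"


def s3_client_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Keyword arguments for boto3.client('s3', ...).

    No endpoint set: boto3 uses the AWS S3 endpoint for the configured region.
    Endpoint set (e.g. a Cloudian S3 endpoint): requests go there instead.
    """
    endpoint = env.get(ENDPOINT_VAR)
    return {"endpoint_url": endpoint} if endpoint else {}


def make_s3_client(env: Mapping[str, str] = os.environ):
    # Imported here so s3_client_kwargs() can be used without boto3 installed
    import boto3
    return boto3.client("s3", **s3_client_kwargs(env))
```

s3_main() would then call make_s3_client() in place of boto3.client('s3'), and switching destinations only means changing .zshenv.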

IoTSample-write.py



import random
import json
import time
from datetime import date, datetime
import argparse
import string
import boto3
import pprint

from faker.factory import Factory

BUCKET_NAME = 'boto3-cloudian'
OBJECT_KEY = 'test-iot-dummy.json'


#Using Faker to create dummy data (Japanese locale)
Faker = Factory.create
fake = Faker("ja_JP")

#Dummy sections of IoT devices (uppercase letters A-Z)
section = string.ascii_uppercase


#Creating JSON data sent by IoT device
def iot_json_data(count, proc):
    iot_items = json.dumps({
        'items': [{
            'id': i,                            # id
            'time': generate_time(),            #Data generation time
            'proc': proc,                       #Data generation process name
            'section': random.choice(section),  #IoT equipment section
            'iot_num': fake.zipcode(),          #IoT device number
            'iot_state': fake.prefecture(),     #IoT installation location
            'vol_1': random.uniform(100, 200),  #IoT value-1
            'vol_2': random.uniform(50, 90)     #IoT value-2
            } 
            for i in range(count)
        ]
    }, ensure_ascii=False).encode('utf-8')
    return iot_items


#Generation time of dummy data measured by IoT devices
def generate_time():
    dt_time = datetime.now()
    gtime = json_trans_date(dt_time)
    return gtime

# date,datetime conversion function
def json_trans_date(obj):
    #Convert date type to string
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    #Except for the above.
    raise TypeError ("Type %s not serializable" % type(obj))


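#Main (for terminal output)
def tm_main(count, proc):
    print('Terminal output')
    iotjsondata = iot_json_data(count, proc)
    # Parse the UTF-8 bytes back into Python objects so Japanese text is printed
    # as characters rather than \x.. byte escapes
    pprint.pprint(json.loads(iotjsondata))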


#Main (for Cloudian/S3 output)
def s3_main(count, proc):
    print('Cloudian/S3 output')
    iotjsondata = iot_json_data(count, proc)
    # pprint.pprint(iotjsondata)

    # client = boto3.client('s3', endpoint_url='http://s3-pic.networld.local')    #When accessing Cloudian
    client = boto3.client('s3')                                                 #When accessing S3
    client.put_object(
        Bucket=BUCKET_NAME,
        Key=OBJECT_KEY,
        Body=iotjsondata
    )


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Generation of dummy data for IoT devices')
    parser.add_argument('--count', type=int, default=10, help='Number of data created')
    parser.add_argument('--proc', type=str, default='111', help='Data creation process name')
    parser.add_argument('--mode', type=str, default='tm', help='tm (terminal output of generated data) / s3 (generated data of Cloudian/S3 output)')
    args = parser.parse_args()

    start = time.time()
    
    if args.mode == 's3':
        s3_main(args.count, args.proc)
    else:
        tm_main(args.count, args.proc)

    making_time = time.time() - start

    print("")
    print(f"Number of data created:{args.count}")
    print("Data creation time(Normal_API):{0}".format(making_time) + " [sec]")
    print("")
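
json_trans_date() already has the shape json.dumps() expects for its default hook: the encoder calls it only for objects it cannot serialize, and it must return something serializable or raise TypeError. As an alternative to calling it by hand in generate_time() (a sketch, not the author's code), records could keep datetime objects and let json.dumps() convert them at encoding time. encode_records() is a hypothetical helper name.

```python
import json
from datetime import date, datetime


def json_trans_date(obj):
    # Same logic as in the program: date/datetime -> ISO 8601 string
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    raise TypeError("Type %s not serializable" % type(obj))


def encode_records(records):
    """Serialize records to UTF-8 JSON, converting date/datetime values via json_trans_date."""
    return json.dumps({"items": records}, ensure_ascii=False,
                      default=json_trans_date).encode("utf-8")
```

For example, encode_records([{"id": 0, "time": datetime(2021, 1, 1, 9, 30)}]) returns b'{"items": [{"id": 0, "time": "2021-01-01T09:30:00"}]}'.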

Program execution

First, display the help.

$ python IoTSample-write.py -h            
usage: IoTSample-write.py [-h] [--count COUNT] [--proc PROC] [--mode MODE]

Generation of dummy data for IoT devices

optional arguments:
  -h, --help     show this help message and exit
  --count COUNT  Number of data created
  --proc PROC    Data creation process name
  --mode MODE    tm (terminal output of generated data) / s3 (generated data of Cloudian/S3 output)

Next, generate 100,000 records and print them to the terminal.

$ python IoTSample-write.py --count 100000
     :
Output content is omitted
     :
Number of data created:100000
Data creation time(Normal_API):4.5370988845825195 [sec]

Now let's generate 100,000 records and write them to Cloudian/S3.

$ python IoTSample-write.py --count 100000 --mode s3

Number of data created:100000
Data creation time(Normal_API):2.7221038341522217 [sec]
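To confirm the upload, the object can be read back and its records counted. This sketch assumes the bucket and key constants from the program and an already configured boto3 S3 client; count_items() and count_uploaded_items() are hypothetical names.

```python
import json


def count_items(body: bytes) -> int:
    """Number of records in a payload written by the program."""
    return len(json.loads(body.decode("utf-8"))["items"])


def count_uploaded_items(client, bucket="boto3-cloudian", key="test-iot-dummy.json") -> int:
    # get_object returns a dict whose "Body" is a stream; read() gives the raw bytes
    response = client.get_object(Bucket=bucket, Key=key)
    return count_items(response["Body"].read())
```

After the run above, count_uploaded_items(boto3.client('s3')) should return 100000; for Cloudian, pass a client created with endpoint_url.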

Summary

This confirmed that the AWS SDK for Python (boto3) can write generated data to the object storage Cloudian/S3: 100,000 records were created and written in a few seconds (the time will of course depend on the environment).

For more about Cloudian, see here (https://qiita.com/yamahiro/items/7b8a11c773106b641795).
