With the AWS SDK for Python (boto3), you can access both Cloudian and S3 with almost no changes to your program. I hope this serves as a reference for anyone who wants to use object storage in a hybrid setup (on-premises: Cloudian, AWS: S3).
This is a Python program that writes JSON data to the bucket "boto3-cloudian" on Cloudian/S3 object storage.
The number of JSON records to generate is given by a parameter, and the records are written to the object "test-iot-dummy.json". The program is intended to be customized for use as an IoT data generator. For the generated fields, see "items" in the program.
There are three parameters:
- count: Number of records to generate (default: 10)
- proc: Name of the process that generates the data (default: 111)
- mode: Output destination of the generated data. tm: output to the terminal, s3: output to Cloudian/S3 (default: tm)
You can also see the help by running the program with the "-h" parameter.
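As a small illustration of the three parameters, here is a minimal sketch of the argument parser on its own. The `build_parser` function name and the `choices=` restriction on `--mode` are assumptions added for this sketch and are not part of the original program, which accepts any string for `--mode` and falls back to terminal output.

```python
import argparse


def build_parser():
    """Build a parser with the same three options as the program in this article."""
    parser = argparse.ArgumentParser(
        description='Generation of dummy data for IoT devices')
    parser.add_argument('--count', type=int, default=10,
                        help='Number of data created')
    parser.add_argument('--proc', type=str, default='111',
                        help='Data creation process name')
    # Assumption: choices= is not in the original program. It makes argparse
    # reject any mode other than tm or s3 instead of silently treating it as tm.
    parser.add_argument('--mode', type=str, default='tm', choices=['tm', 's3'],
                        help='tm (terminal output) / s3 (Cloudian/S3 output)')
    return parser


if __name__ == '__main__':
    # Parse an explicit argument list so the sketch runs without a command line.
    args = build_parser().parse_args(['--count', '5', '--mode', 's3'])
    print(args.count, args.proc, args.mode)
```

With `choices=`, a typo such as `--mode s4` stops the program with a usage error, which may be preferable when the other branch writes to a real bucket.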
macOS Big Sur 11.1
Python 3.8.3
Here, the credentials are defined in .zshenv before the program is run. Define them to match the connection destination.
# AWS S3
export AWS_ACCESS_KEY_ID=xxxxxxxxxxxxx
export AWS_SECRET_ACCESS_KEY=yyyyyyyyyyyyyyyyy
export AWS_DEFAULT_REGION=ap-northeast-1
# Cloudian
#export AWS_ACCESS_KEY_ID=aaaaaaaaaaaaaaaaaa
#export AWS_SECRET_ACCESS_KEY=bbbbbbbbbbbbbbbbbbbb
#export AWS_DEFAULT_REGION=pic
To access Cloudian, you also need to pass endpoint_url when creating the client (see the commented-out line in the program).
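One way to avoid editing the source when switching destinations is to read the endpoint from the environment as well. The following is only a sketch: the variable name `S3_ENDPOINT_URL` and both helper functions are hypothetical and do not appear in the original program. The only boto3 call used is `boto3.client('s3', endpoint_url=...)`, the same call the program makes.

```python
import os


def s3_client_kwargs(env=None):
    """Return keyword arguments for boto3.client('s3', ...).

    S3_ENDPOINT_URL is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    endpoint_url = env.get('S3_ENDPOINT_URL')
    # Without endpoint_url, boto3 resolves the regular AWS S3 endpoint.
    return {'endpoint_url': endpoint_url} if endpoint_url else {}


def make_s3_client(env=None):
    """Create an S3 client for either Cloudian or AWS S3."""
    import boto3  # imported here so s3_client_kwargs works without boto3
    return boto3.client('s3', **s3_client_kwargs(env))
```

With this in place, adding a line such as `export S3_ENDPOINT_URL=http://s3-pic.networld.local` to the Cloudian section of .zshenv would select Cloudian, and leaving it unset would select S3.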
IoTSample-write.py
import random
import json
import time
from datetime import date, datetime
import argparse
import string
import boto3
import pprint
from faker.factory import Factory

BUCKET_NAME = 'boto3-cloudian'
OBJECT_KEY = 'test-iot-dummy.json'

# Use Faker to create dummy data (Japanese locale)
Faker = Factory.create
fake = Faker("ja_JP")

# Dummy sections of the IoT devices (uppercase letters A-Z)
section = string.ascii_uppercase

# Create the JSON data sent by the IoT devices
def iot_json_data(count, proc):
    iot_items = json.dumps({
        'items': [{
            'id': i,                               # id
            'time': generate_time(),               # Data generation time
            'proc': proc,                          # Data generation process name
            'section': random.choice(section),     # IoT equipment section
            'iot_num': fake.zipcode(),             # IoT device number
            'iot_state': fake.prefecture(),        # IoT installation location
            'vol_1': random.uniform(100, 200),     # IoT value-1
            'vol_2': random.uniform(50, 90)        # IoT value-2
        }
            for i in range(count)
        ]
    }, ensure_ascii=False).encode('utf-8')
    return iot_items

# Generation time of the dummy data measured by the IoT devices
def generate_time():
    dt_time = datetime.now()
    gtime = json_trans_date(dt_time)
    return gtime

# Conversion function for date and datetime
def json_trans_date(obj):
    # Convert date and datetime values to ISO 8601 strings
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    # Anything else is not supported
    raise TypeError("Type %s not serializable" % type(obj))

# Main (for terminal output)
def tm_main(count, proc):
    print('Terminal output')
    iotjsondata = iot_json_data(count, proc)
    pprint.pprint(iotjsondata)

# Main (for Cloudian/S3 output)
def s3_main(count, proc):
    print('Cloudian/S3 output')
    iotjsondata = iot_json_data(count, proc)
    # pprint.pprint(iotjsondata)

    # client = boto3.client('s3', endpoint_url='http://s3-pic.networld.local')  # When accessing Cloudian
    client = boto3.client('s3')  # When accessing S3
    client.put_object(
        Bucket=BUCKET_NAME,
        Key=OBJECT_KEY,
        Body=iotjsondata
    )

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Generation of dummy data for IoT devices')
    parser.add_argument('--count', type=int, default=10, help='Number of data created')
    parser.add_argument('--proc', type=str, default='111', help='Data creation process name')
    parser.add_argument('--mode', type=str, default='tm', help='tm (terminal output of generated data) / s3 (generated data of Cloudian/S3 output)')
    args = parser.parse_args()

    start = time.time()
    if args.mode == 's3':
        s3_main(args.count, args.proc)
    else:
        tm_main(args.count, args.proc)
    making_time = time.time() - start

    print("")
    print(f"Number of data created:{args.count}")
    print("Data creation time(Normal_API):{0}".format(making_time) + " [sec]")
    print("")
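A side note on `json_trans_date`: it converts each timestamp by hand before `json.dumps` runs, but it has the signature that `json.dumps` expects for its `default=` argument, so it can also be passed directly. The sketch below shows that alternative with a fixed, purely illustrative timestamp. It is not how the program above is written.

```python
import json
from datetime import date, datetime


def json_trans_date(obj):
    """Same conversion as in the program: date/datetime to an ISO 8601 string."""
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    raise TypeError("Type %s not serializable" % type(obj))


# A fixed timestamp so the example is reproducible (illustrative value).
record = {'id': 0, 'time': datetime(2021, 1, 15, 9, 30, 0)}

# json.dumps calls default= for every value it cannot serialize by itself.
encoded = json.dumps(record, default=json_trans_date)
print(encoded)  # prints {"id": 0, "time": "2021-01-15T09:30:00"}
```

Used this way, the dictionary can hold `datetime` objects as they are, and the conversion happens in one place during serialization.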
First, let's display the help.
$ python IoTSample-write.py -h
usage: IoTSample-write.py [-h] [--count COUNT] [--proc PROC] [--mode MODE]

Generation of dummy data for IoT devices

optional arguments:
  -h, --help     show this help message and exit
  --count COUNT  Number of data created
  --proc PROC    Data creation process name
  --mode MODE    tm (terminal output of generated data) / s3 (generated data of Cloudian/S3 output)
Next, generate 100,000 records and output them to the terminal.
$ python IoTSample-write.py --count 100000
:
Output content is omitted
:
Number of data created:100000
Data creation time(Normal_API):4.5370988845825195 [sec]
Now let's actually generate 100,000 records and write them to Cloudian/S3.
$ python IoTSample-write.py --count 100000 --mode s3
Number of data created:100000
Data creation time(Normal_API):2.7221038341522217 [sec]
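To check what was actually written, the object can be read back and the records counted. The following is a minimal sketch, assuming the object was written by the program above. The `read_items` helper is hypothetical, while `get_object` is the standard boto3 S3 client call and its `Body` is a stream with a `read()` method.

```python
import json


def read_items(client, bucket, key):
    """Download the object and return the list stored under 'items'."""
    response = client.get_object(Bucket=bucket, Key=key)
    # response['Body'] is a streaming body; read() returns the raw bytes.
    payload = response['Body'].read()
    return json.loads(payload.decode('utf-8'))['items']


# Usage (requires boto3, credentials, and network access):
#   import boto3
#   client = boto3.client('s3')  # add endpoint_url=... when accessing Cloudian
#   items = read_items(client, 'boto3-cloudian', 'test-iot-dummy.json')
#   print(len(items))
```

After the run above, the number of items should equal the value passed to `--count`, since each run overwrites the same object key.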
This time, I confirmed that data can be generated and written to Cloudian/S3 object storage using the AWS SDK for Python (boto3). Writing 100,000 records took only a few seconds (depending on the environment, of course).
For more about Cloudian, see here (https://qiita.com/yamahiro/items/7b8a11c773106b641795).