You can use the AWS SDK for Python (boto3) to access both Cloudian and S3 with almost no changes to your program. I hope this serves as a reference for anyone who wants to use object storage in a hybrid environment (on-premises: Cloudian; AWS: S3).
This is a Python program that extracts specific data from a JSON-formatted data file stored in the bucket "boto3-cloudian" on the object storage Cloudian/S3.
The JSON data file, "test-iot-dummy.json", was generated in advance with a separate dummy-data generator and contains 100,000 data items. The program extracts the items whose "section" field is "R", then displays the number of extracted items and the time the extraction took.
The section to extract can also be passed with the "--section" parameter (a single uppercase letter of the alphabet). See also the help displayed when the program is run with "-h".
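The program accepts any string for "--section", so a typo such as "r" or "RR" simply returns zero matches. A minimal sketch of enforcing the "one uppercase letter" rule at parse time with an argparse `type` callable is shown below; the helper names `uppercase_letter` and `build_parser` are hypothetical and not part of the original program.

```python
import argparse
import string


def uppercase_letter(value: str) -> str:
    """argparse 'type' callable (hypothetical helper): accept exactly one letter A-Z."""
    if len(value) != 1 or value not in string.ascii_uppercase:
        raise argparse.ArgumentTypeError(
            f"section must be one uppercase letter A-Z, got {value!r}")
    return value


def build_parser() -> argparse.ArgumentParser:
    """Same options as IoTSample-read.py, but --section is validated."""
    parser = argparse.ArgumentParser(description='Extraction of dummy data for IoT devices')
    parser.add_argument('--section', type=uppercase_letter, default='R',
                        help='Specify the section to extract (one uppercase letter of the alphabet)')
    return parser
```

With this parser, `python IoTSample-read.py --section r` would exit with a usage error instead of silently reporting 0 extractions.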
Environment: macOS Big Sur 11.1, Python 3.8.3
Here, the credentials are defined as environment variables in .zshenv before running the program. Define them to match the destination you connect to.
# AWS S3
export AWS_ACCESS_KEY_ID=xxxxxxxxxxxxx
export AWS_SECRET_ACCESS_KEY=yyyyyyyyyyyyyyyyy
export AWS_DEFAULT_REGION=ap-northeast-1
# Cloudian
#export AWS_ACCESS_KEY_ID=aaaaaaaaaaaaaaaaaa
#export AWS_SECRET_ACCESS_KEY=bbbbbbbbbbbbbbbbbbbb
#export AWS_DEFAULT_REGION=pic
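Because only one of the two credential sets should be active at a time, it can help to check the environment before running the program. The sketch below lists which of the three variables used above are unset or empty; `missing_aws_env` is a hypothetical helper. Note that boto3 can also take credentials from other sources (for example ~/.aws/credentials), so a missing variable is only a problem if you rely on .zshenv as in this article.

```python
import os
from typing import List, Mapping, Optional

# The three variables set in .zshenv in this article; boto3 reads them from the environment.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required variables that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

For example, `missing_aws_env()` returning an empty list means all three variables are set in the current shell.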
To access Cloudian, switch to the Cloudian credentials above and pass endpoint_url when creating the client (see the comment in the program).
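Since the credentials are already switched through .zshenv, the endpoint can be switched the same way instead of editing the code. A minimal sketch, assuming a hypothetical environment variable `S3_ENDPOINT_URL` (not used in the original program) that holds the Cloudian endpoint:

```python
import os
from typing import Dict, Mapping, Optional


def s3_client_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return keyword arguments for boto3.client('s3', ...).

    Assumption: the hypothetical variable S3_ENDPOINT_URL holds the Cloudian
    endpoint (e.g. http://s3-pic.networld.local). When it is unset or empty,
    no endpoint_url is passed, so boto3 uses the default AWS S3 endpoint.
    """
    env = os.environ if env is None else env
    endpoint = env.get("S3_ENDPOINT_URL", "")
    return {"endpoint_url": endpoint} if endpoint else {}
```

In section_main(), the client would then be created with `client = boto3.client('s3', **s3_client_kwargs())`, and `export S3_ENDPOINT_URL=...` would sit next to the Cloudian credentials in .zshenv.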
IoTSample-read.py
import json
import time
import argparse
import boto3
BUCKET_NAME = 'boto3-cloudian'
OBJECT_KEY = 'test-iot-dummy.json'
# Get the whole object with the S3 API, then filter it in Python (item: section)
def section_main(section):
    # client = boto3.client('s3', endpoint_url='http://s3-pic.networld.local')  # When accessing Cloudian
    client = boto3.client('s3')  # When accessing S3
    response = client.get_object(
        Bucket=BUCKET_NAME,
        Key=OBJECT_KEY
    )
    target = []
    if 'Body' in response:
        # Download the whole object and parse it as JSON
        body = response['Body'].read()
        text = body.decode('utf-8')
        items = json.loads(text)
        items = items['items']
        # Keep only the items in the requested section
        target = list(filter(lambda x: x['section'] == section, items))
        # print(target)
    return len(target)
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Extraction of dummy data for IoT devices')
    parser.add_argument('--section', type=str, default='R', help='Specify the section to extract (one uppercase letter of the alphabet)')
    args = parser.parse_args()
    start = time.time()
    num = section_main(args.section)
    select_time = time.time() - start
    print("")
    print(f"Number of data extractions:{num}")
    print("Extraction processing time(Normal_API):{0}".format(select_time) + " [sec]")
    print("")
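Note that section_main() both downloads the whole object and filters it, so the measured time includes the network transfer. A minimal sketch that pulls the filtering step out into a pure function, so it can be tested on local data and timed separately, is below. The names `count_section` and `timed_count` are hypothetical; the `{"items": [...]}` layout is taken from the program above. `time.perf_counter()` is used because it is a monotonic clock intended for measuring intervals.

```python
import json
import time
from typing import Tuple


def count_section(body: bytes, section: str) -> int:
    """Count items whose 'section' equals `section` in a {"items": [...]} JSON document.

    Same filtering logic as section_main(), minus the S3 call.
    """
    items = json.loads(body.decode("utf-8"))["items"]
    return sum(1 for item in items if item["section"] == section)


def timed_count(body: bytes, section: str) -> Tuple[int, float]:
    """Return (count, elapsed seconds) for the filtering step only."""
    start = time.perf_counter()
    n = count_section(body, section)
    return n, time.perf_counter() - start
```

In section_main(), `response['Body'].read()` could be passed straight to `timed_count()`, which separates the download time from the filtering time.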
First, let's display the help.
$ python IoTSample-read.py -h
usage: IoTSample-read.py [-h] [--section SECTION]
Extraction of dummy data for IoT devices
optional arguments:
  -h, --help         show this help message and exit
  --section SECTION  Specify the section to extract (one uppercase letter of the alphabet)
Now, let's run the extraction with the default section ("R").
$ python IoTSample-read.py
Number of data extractions:3929
Extraction processing time(Normal_API):3.6708199977874756 [sec]
Next, let's run it with the section specified explicitly.
$ python IoTSample-read.py --section Z
Number of data extractions:3792
Extraction processing time(Normal_API):3.5160200595855713 [sec]
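As a quick sanity check on these counts, assume (this is not stated in the article) that the dummy-data generator assigns each item one of the 26 letters A-Z uniformly at random. Each section should then hold about 100,000 / 26 ≈ 3,846 items, with a binomial standard deviation of about 61, and both observed counts (3929 for "R", 3792 for "Z") are within roughly 1.4 standard deviations of that mean. The sketch below does the arithmetic:

```python
import math

TOTAL = 100_000     # number of items in test-iot-dummy.json (from the article)
NUM_SECTIONS = 26   # assumption: sections are the letters A-Z, drawn uniformly


def expected_count(total: int, k: int) -> float:
    """Mean of a binomial count when each item lands in a given section with p = 1/k."""
    return total / k


def z_score(observed: int, total: int, k: int) -> float:
    """Number of binomial standard deviations between the observed count and the mean."""
    p = 1 / k
    mean = total * p
    sd = math.sqrt(total * p * (1 - p))
    return (observed - mean) / sd


for section, observed in [("R", 3929), ("Z", 3792)]:
    print(section, round(expected_count(TOTAL, NUM_SECTIONS)),
          round(z_score(observed, TOTAL, NUM_SECTIONS), 2))
```

If the generator does not draw sections uniformly, this comparison does not apply.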
This time, I confirmed that data can be extracted from the object storage Cloudian/S3 using the AWS SDK for Python (boto3): the target data was extracted from 100,000 items in about 3.5 seconds (of course, this depends on the environment).
For more about Cloudian, see this article (https://qiita.com/yamahiro/items/7b8a11c773106b641795).