Using Cloud Storage from Python3 (Introduction)

Introduction

What is BytesIO

BytesIO is a class for handling binary data in memory, provided by the io module in Python's standard library. Binary data here mainly means data such as images and audio. (It is similar to MemoryStream in C#.)
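
As a minimal sketch of what handling binary data in memory looks like, the following uses only the standard library. The byte values are made up for illustration.

```python
import io

# Wrap some bytes in an in-memory, file-like object.
buffer = io.BytesIO(b"\x00\x01\x02\x03")

# It can be read like a file opened in binary mode.
first_two = buffer.read(2)

# It can also be written to; move to the end first so nothing is overwritten.
buffer.seek(0, io.SEEK_END)
buffer.write(b"\x04\x05")

# getvalue() returns the whole contents regardless of the current position.
contents = buffer.getvalue()
print(first_two, contents)
```

Because the object behaves like a file, libraries that accept a file object can usually read from it without anything being written to disk.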

Create a service account

Go to Google Cloud Platform and create a service account for API access. Open Navigation Menu > APIs & Services > Credentials, then click Manage Service Accounts.

On the next screen, click Create Service Account and enter the service account details for each item.

| Setting item | Setting |
| --- | --- |
| Service account name | Any name |
| Service account description | Optional; a description that makes the account easy to identify in each project |

Click Create when you are done. On the next screen, select a role for Cloud Storage. Since this is a test, I chose Storage Admin (full permissions). Choose a role that matches how your application uses the service account.

You can skip the last item. Then click the service account you created, click Add Key in the Keys section, and select Create New Key.


Select JSON as the key type and click Create. The JSON file is downloaded to local storage; the next section uses this file to operate Cloud Storage from Python.
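
Before using the downloaded file, it can help to check that it looks like a service account key. The sketch below assumes the key contains the fields usually found in such files (type, project_id, private_key, client_email). The helper name missing_key_fields and the sample values are hypothetical, and the sample is not a real key.

```python
import json

# Fields that a service account key file is assumed to contain.
EXPECTED_FIELDS = {"type", "project_id", "private_key", "client_email"}

def missing_key_fields(key_text):
    """Return a sorted list of the expected fields missing from the key JSON text."""
    data = json.loads(key_text)
    return sorted(EXPECTED_FIELDS - set(data))

# Placeholder content for illustration only (deliberately lacks private_key).
sample = json.dumps({
    "type": "service_account",
    "project_id": "my-project",
    "client_email": "example@my-project.iam.gserviceaccount.com",
})
print(missing_key_fields(sample))
```

With a real file, the text would come from reading the downloaded JSON, and an empty list would mean all the assumed fields are present.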

Operate Cloud Storage from Python

Python library installation

Install the libraries required for this project in order. First, use pip to install the Google Cloud Storage library, which is needed to access Cloud Storage.

pip install google-cloud-storage

Next, install Pillow, which is used to read the image through BytesIO and save it to a local file. The command below also installs requests.

pip install requests pillow

If you also want to try Excel files, install openpyxl.

pip install openpyxl
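
To confirm that the installs succeeded, a small check like the following could be used. It is only a sketch, and is_installed is a hypothetical helper name. Note that the import names differ from the pip package names: Pillow is imported as PIL, and google-cloud-storage as google.cloud.storage.

```python
import importlib.util

def is_installed(module_name):
    """Return True if the module can be found by the import system."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (for example "google") is missing.
        return False

for name in ["google.cloud.storage", "PIL", "openpyxl"]:
    print(name, is_installed(name))
```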

Preparing buckets on Cloud Storage

Open Google Cloud Storage in the console (Navigation Menu > Cloud Storage), create a bucket, and upload a suitable image file. This time, I downloaded the Python logo, named it image.png, and saved it in the bucket.

Access Cloud Storage from Python and get images

We will access Cloud Storage using the service account key created at the beginning. First, the key has to be loaded. There are two ways to do this.

from google.cloud import storage

# Option 1: point the GOOGLE_APPLICATION_CREDENTIALS environment variable at the key file
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '{Enter the path where the json file is located}'
client = storage.Client()

# Option 2: pass the JSON key file directly
client = storage.Client.from_service_account_json('{Enter the path where the json file is located}')
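
The two approaches above could be combined into one helper, roughly as follows. This is a sketch rather than a recommended pattern: resolve_key_path and make_client are hypothetical names, and make_client assumes google-cloud-storage is installed.

```python
import os

def resolve_key_path(explicit_path=None, environ=os.environ):
    """Return the key file path to use, or None if neither source provides one."""
    if explicit_path:
        return explicit_path
    return environ.get("GOOGLE_APPLICATION_CREDENTIALS")

def make_client(explicit_path=None):
    """Create a storage client from an explicit key path or the environment."""
    # Imported here so the path logic above can run without the library installed.
    from google.cloud import storage

    key_path = resolve_key_path(explicit_path)
    if key_path:
        return storage.Client.from_service_account_json(key_path)
    # Fall back to whatever default credentials the library can find.
    return storage.Client()

print(resolve_key_path("key.json", {}))
```

An explicit path takes priority here, so a script can override the environment variable when needed.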

The following Python code gets the image file saved in Google Cloud Storage (GCS) and saves it locally. It first gets the bucket that holds the file and then a blob instance for the file name. The blob contains the binary data of the image, so the data is read by Pillow via BytesIO and written out as an image file. You can also download the file directly from GCS with blob.download_to_filename, but here the data is passed to a library such as PIL on the assumption that the downloaded file will be edited.

from google.cloud import storage
from PIL import Image
import io

# Create a client instance for Google Cloud Storage
# client = storage.Client()
client = storage.Client.from_service_account_json('●●●●●●●●●●.json')
# Get an instance of the bucket
bucket = client.bucket('{Bucket name created with any name}')

# Get a blob instance for the file
blob = bucket.blob('image.png')
# download_as_bytes() replaces the deprecated download_as_string()
img = Image.open(io.BytesIO(blob.download_as_bytes()))
img.save('sample.png')
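
Since the point of going through BytesIO is to edit the data, the round trip of download, edit, and upload might be sketched as below. The helper edit_blob_in_memory, the FakeBlob stand-in, and the toy reverse_bytes transform are all hypothetical and exist only so the sketch runs without network access. A real blob offers download_as_bytes (the current name for download_as_string) and upload_from_string.

```python
import io

def edit_blob_in_memory(blob, transform, content_type):
    """Download a blob, pass it through transform(stream) -> bytes, and upload the result."""
    source = io.BytesIO(blob.download_as_bytes())
    edited = transform(source)
    blob.upload_from_string(edited, content_type=content_type)
    return edited

class FakeBlob:
    """Stand-in for a storage blob, holding its data in memory."""
    def __init__(self, data):
        self.data = data

    def download_as_bytes(self):
        return self.data

    def upload_from_string(self, data, content_type=None):
        self.data = data

def reverse_bytes(stream):
    # Toy transform. With Pillow this step would open the stream as an image,
    # edit it, and save the result into another BytesIO.
    return stream.read()[::-1]

fake = FakeBlob(b"abc")
edit_blob_in_memory(fake, reverse_bytes, "application/octet-stream")
print(fake.data)
```

Keeping the transform separate from the download and upload steps means the same helper could serve both the image and the Excel cases.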

Access Cloud Storage from Python and get an Excel file

Get the Excel file on GCS, read the blob data with openpyxl via BytesIO, and save it. You can also download the file directly from GCS with blob.download_to_filename, but here the data is passed to a library such as openpyxl on the assumption that the downloaded file will be edited.

from google.cloud import storage
import openpyxl
import io

# Create a client instance for Google Cloud Storage
client = storage.Client.from_service_account_json('●●●●●●●●●●.json')
# Get an instance of the bucket
bucket = client.bucket('{Bucket name created with any name}')

# Get a blob instance for the file
blob = bucket.blob('test.xlsx')
buffer = io.BytesIO()
blob.download_to_file(buffer)
# Rewind the buffer so it is read from the start
buffer.seek(0)
wb = openpyxl.load_workbook(buffer)
wb.save('./retest.xlsx')
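
One detail worth knowing when writing into a BytesIO, as download_to_file does: the position is left at the end of the data. The sketch below illustrates this with the standard library only. fake_download_to_file is a hypothetical stand-in that writes a small ZIP archive, chosen because an .xlsx file is itself a ZIP archive.

```python
import io
import zipfile

def fake_download_to_file(file_obj):
    """Stand-in for blob.download_to_file: writes a small ZIP archive into file_obj."""
    with zipfile.ZipFile(file_obj, "w") as archive:
        archive.writestr("hello.txt", "hello")

buffer = io.BytesIO()
fake_download_to_file(buffer)

# The write left the position at the end, so reading now yields no data.
read_without_rewind = buffer.read()

# After rewinding, the data is readable from the start ("PK" is the ZIP signature).
buffer.seek(0)
signature = buffer.read(2)

# A cheap sanity check that the bytes form a ZIP archive before loading them.
buffer.seek(0)
is_zip = zipfile.is_zipfile(buffer)
print(read_without_rewind, signature, is_zip)
```

Readers that seek on their own, such as zipfile, cope with an unrewound buffer, but code that simply calls read() does not, so calling seek(0) after a download is a safe habit.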

