The situation: you want to export BigQuery query results to GCS and view them in Excel in-house. BigQuery -> GCS writes the results as UTF-8 without a BOM, so Japanese characters are garbled when the file is opened in Excel. So I implemented a Cloud Function that automatically converts a file to UTF-8 with BOM whenever it is put in the bucket.
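The whole difference comes down to three bytes at the start of the file. A quick standard-library check shows that Python's `utf_8_sig` codec prepends the UTF-8 BOM (`EF BB BF`), which is what Excel looks for to recognize a file as UTF-8. Without it, Excel typically assumes the system code page (Shift_JIS/CP932 on Japanese Windows). The sample text is just an illustration.

```python
import codecs

text = "日本語,テスト\n"

plain = text.encode("utf_8")         # like the BigQuery export: no BOM
with_bom = text.encode("utf_8_sig")  # UTF-8 with BOM, readable by Excel

# The only difference is the 3-byte BOM at the front.
print(with_bom[:3] == codecs.BOM_UTF8)      # True
print(with_bom == codecs.BOM_UTF8 + plain)  # True
print(len(with_bom) - len(plain))           # 3
```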
If this is your first time using Cloud Functions with Python, start here:
Python Quickstart: Your First Function
You can create a function that is triggered when an object is created (finalized) in Cloud Storage:
Cloud Storage Tutorial # Object Finalize
The function converts a file in the bucket to UTF-8 with BOM and uploads the result to the same directory, with bom_ added as a prefix to the file name.
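The naming rule can be checked on its own, without a bucket. This is a minimal sketch: `bom_object_name` is a hypothetical helper, and it uses `str.rpartition` instead of the split/join in main.py, but it should give the same object names. It returns `None` for names that already carry the prefix, mirroring the early return that keeps the function from re-processing its own output.

```python
from typing import Optional

PREFIX = "bom_"


def bom_object_name(object_name: str, prefix: str = PREFIX) -> Optional[str]:
    """Return the object name for the BOM copy, or None to skip (hypothetical helper)."""
    # rpartition splits on the last '/'; dir_path and sep are '' for root objects.
    dir_path, sep, file_name = object_name.rpartition("/")
    if file_name.startswith(prefix):
        # Already converted: converting again would trigger an endless loop.
        return None
    return dir_path + sep + prefix + file_name


print(bom_object_name("report.csv"))               # bom_report.csv
print(bom_object_name("exports/2020/report.csv"))  # exports/2020/bom_report.csv
print(bom_object_name("exports/bom_report.csv"))   # None
```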
main.py
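```python
import os

from google.cloud import storage


def convert_to_bom(data, context):
    """Triggered by google.storage.object.finalize; saves a UTF-8-with-BOM copy."""
    bucket_name = data['bucket']
    file_path = data['name']
    prefix = 'bom_'

    file_path_arr = file_path.split('/')
    file_name = file_path_arr[-1]

    # Skip files this function produced itself; uploading the converted
    # file fires another finalize event, which would otherwise loop forever.
    if file_name.startswith(prefix):
        return 'skipping of bom file.'

    dir_arr = file_path_arr[:-1]
    dir_path = '/'.join(dir_arr) + '/'
    # /tmp is the only writable directory in Cloud Functions.
    local_file_path = '/tmp/' + file_name

    if len(file_path_arr) == 1:
        new_file_path = prefix + file_path  # object is at the bucket root
    else:
        new_file_path = dir_path + prefix + file_name

    client = storage.Client()
    bucket = client.get_bucket(bucket_name)
    dl_blob = bucket.get_blob(file_path)
    up_blob = bucket.blob(new_file_path)

    # Decoding with 'utf_8_sig' drops a BOM that may already be present,
    # so encoding with 'utf_8_sig' always writes exactly one BOM.
    text = dl_blob.download_as_string().decode('utf_8_sig')
    with open(local_file_path, 'w', newline='', encoding='utf_8_sig', errors='ignore') as f:
        f.write(text)
    up_blob.upload_from_filename(local_file_path)

    # Files in /tmp use the instance's memory, so delete them when done.
    os.remove(local_file_path)
    return 'success'
```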
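The core of the conversion is a bytes-to-bytes step that can be tested locally without GCS. In this sketch (`to_utf8_with_bom` is a hypothetical name), decoding with `utf_8_sig` strips a leading BOM if one is already there, so converting a file that already has a BOM does not produce a doubled BOM. As an alternative not used in the function above, bytes like these could be uploaded directly with `Blob.upload_from_string` instead of going through a file in /tmp.

```python
import codecs


def to_utf8_with_bom(raw: bytes) -> bytes:
    """Re-encode UTF-8 bytes (with or without a BOM) as UTF-8 with exactly one BOM."""
    # 'utf_8_sig' decoding removes a leading BOM if present, otherwise acts like 'utf_8'.
    text = raw.decode("utf_8_sig")
    # 'utf_8_sig' encoding always writes a single BOM at the start.
    return text.encode("utf_8_sig")


src = "名前,値\nテスト,1\n".encode("utf_8")
once = to_utf8_with_bom(src)
twice = to_utf8_with_bom(once)

print(once.startswith(codecs.BOM_UTF8))  # True
print(once == twice)                     # True: the BOM is not doubled
```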
requirements.txt
```text
-i https://pypi.org/simple
cachetools==4.1.0
certifi==2020.4.5.1
chardet==3.0.4
google-api-core==1.19.0
google-auth==1.16.1
google-cloud-core==1.3.0
google-cloud-storage==1.28.1
google-resumable-media==0.5.1
googleapis-common-protos==1.52.0
idna==2.9
protobuf==3.12.2
pyasn1-modules==0.2.8
pyasn1==0.4.8
pytz==2020.1
requests==2.23.0
rsa==4.0
six==1.15.0
urllib3==1.25.9
```
Deploy it with:

```shell
gcloud functions deploy convert_to_bom --runtime python37 --trigger-resource ${YOUR_BUCKET} --trigger-event google.storage.object.finalize
```
Be careful: you cannot write to any directory other than /tmp. If you try, the function crashes silently.
"The only writable part of the file system is the /tmp directory, which you can use to store temporary files in a function instance."
(Cloud Functions execution environment # File system)
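main.py writes the converted file to /tmp before uploading it. The Cloud Functions docs note that files written there consume the instance's memory and can persist between invocations, so cleaning up matters. A minimal sketch of one way to do that, assuming Python's `tempfile` module resolves to /tmp in the runtime (it uses TMPDIR/TEMP/TMP if set, otherwise /tmp on Linux); `write_then_upload` is a hypothetical helper and `upload` stands in for something like `Blob.upload_from_filename`:

```python
import os
import tempfile
from typing import Callable


def write_then_upload(text: str, file_name: str, upload: Callable[[str], None]) -> None:
    """Write text as UTF-8 with BOM to a temp dir, call upload(path), then clean up."""
    # TemporaryDirectory is created under tempfile.gettempdir() (/tmp on Linux by default).
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, file_name)
        with open(local_path, "w", newline="", encoding="utf_8_sig") as f:
            f.write(text)
        upload(local_path)
    # Leaving the with-block deletes the directory and the file in it.


seen = {}


def fake_upload(path: str) -> None:
    # Stand-in for a real upload: just record the path and the bytes on disk.
    with open(path, "rb") as f:
        seen["data"] = f.read()
    seen["path"] = path


write_then_upload("日本語\n", "bom_report.csv", fake_upload)
print(seen["data"][:3] == b"\xef\xbb\xbf")  # True
print(os.path.exists(seen["path"]))          # False: cleaned up
```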