I wanted to provide a tool with which non-engineer planning and marketing staff can upload data to BigQuery by themselves. It was troublesome for me to upload their Excel data to BigQuery myself, so I made the process operable from a GUI: drop a file into a Cloud Storage bucket and it is loaded into BigQuery.
The dataset and table names are encoded in the object path:

.xlsx: gs://[bucket name (optional)]/[dataset name]/[table name].xlsx
.csv: gs://[bucket name (optional)]/[dataset name]/[table name].csv
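Under this convention the object path alone tells the function where to load the data. A minimal sketch of that mapping, using the same regex as main.py below (the file names are hypothetical):

```python
import re

# Same pattern as in main.py: dataset / table . extension
PATH_RE = re.compile(
    r'(?P<getDatasetId>.*)/(?P<getFileId>.*)\.(?P<getFileType>.*)')

def parse_object_path(path):
    """Split 'dataset/table.ext' into (dataset, table, ext)."""
    m = PATH_RE.match(path)
    if m is None:
        raise ValueError('unexpected object path: %s' % path)
    return (m.group('getDatasetId'),
            m.group('getFileId'),
            m.group('getFileType'))

# A hypothetical upload 'sales/monthly_report.xlsx' maps to
# dataset 'sales', table 'monthly_report', handled as xlsx.
print(parse_object_path('sales/monthly_report.xlsx'))
```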
Create a bucket for data upload
Make the following settings (details omitted)
Runtime: Python 3.7
Trigger type: Cloud Storage
Bucket: the GCS bucket created above
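The same settings can also be applied from the command line. A hedged sketch with `gcloud` (the bucket name is a placeholder, and flag values may need adjusting for your project and region):

```shell
# Deploy the function with a GCS object-finalize trigger (placeholder bucket name)
gcloud functions deploy gsc_to_bigquery_createtable \
  --runtime python37 \
  --trigger-resource YOUR_UPLOAD_BUCKET \
  --trigger-event google.storage.object.finalize \
  --source .
```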
GCF
requirements.txt
pandas
pandas-gbq
google-cloud-storage
google-cloud-bigquery
xlrd
main.py
from google.cloud import storage
from google.cloud import bigquery
import pandas as pd
import re


def gsc_to_bigquery_createtable(data, context):
    # log
    print(data)
    print(context)
    print('Folder Name : {}'.format(data['name']))

    GETPATH = data['name']
    m = re.match(
        r'(?P<getDatasetId>.*)/(?P<getFileId>.*)\.(?P<getFileType>.*)',
        GETPATH)

    # Specify the bucket name and project name
    BUCKET = 'Bucket name'
    PROJECT_ID = 'Project name'
    # Get the dataset name
    DATASET_ID = m.group('getDatasetId')
    # Get the file name
    FILE_ID = m.group('getFileId')
    # Get the extension
    FILE_TYPE = m.group('getFileType')
    TMP_PATH = '/tmp/' + FILE_ID + '.' + FILE_TYPE

    # Download the file from GCS to the function's local /tmp
    gcs = storage.Client(project=PROJECT_ID)
    bucket = gcs.get_bucket(BUCKET)
    blob = bucket.get_blob(GETPATH)
    blob.download_to_filename(TMP_PATH)

    # Branch on the extension
    if FILE_TYPE == 'xlsx':
        df = pd.read_excel(TMP_PATH)
    elif FILE_TYPE == 'csv':
        df = pd.read_csv(TMP_PATH)

    # Create the table in BigQuery from the DataFrame
    full_table_id = DATASET_ID + '.' + FILE_ID
    df.to_gbq(full_table_id, project_id=PROJECT_ID, if_exists='replace')

    # log
    print('Folder Name : {}'.format(data['name']))
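Before deploying, the read branch can be exercised locally by faking the /tmp download step with a throwaway CSV. A small sketch (the helper name and file contents are made up, and the BigQuery load is skipped):

```python
import os
import tempfile

import pandas as pd

def load_dataframe(tmp_path, file_type):
    # Mirrors the extension branching in main.py
    if file_type == 'xlsx':
        return pd.read_excel(tmp_path)
    elif file_type == 'csv':
        return pd.read_csv(tmp_path)
    raise ValueError('unsupported file type: %s' % file_type)

# Fake the downloaded file with a temporary CSV
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as f:
    f.write('id,name\n1,alpha\n2,beta\n')
    path = f.name

df = load_dataframe(path, 'csv')
print(df.shape)
os.remove(path)
```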