[GCF + Python] How to upload Excel to GCS and create a new table in BigQuery

Purpose

I want to provide a tool that non-engineer planning staff and marketing staff can upload data to BigQuery by themselves & It is troublesome for me to upload Excel data to BigQUery by myself, so I made it so that it can be operated with GUI It was.

Assumed behavior

  1. ** Upload Excel (.xlsx or .csv) to Cloud Storage. ** **
  1. ** GCS upload triggers GCF to work. ** **
  2. ** A new table is created in BigQuery. ** ** (* If the same table name already exists, it will be overwritten)

Preparation

Create a bucket for data upload

Make the following settings (details omitted) Python 3.7 Trigger type: Cloud Storage Bucket: GCS bucket created above

GCF

requestments.txt


pandas
pandas-gbq
google-cloud-storage
google-cloud-bigquery
xlrd

main.py


from google.cloud import storage
from google.cloud import bigquery
import pandas as pd
import re


def gsc_to_bigquery_createtable(data, context):
    # log
    print(data)
    print(context)
    print('Folder Name : {}'.format(data['name']))
    GETPATH = data['name']
    m = re.match(
        r'(?P<getDatasetId>.*)/(?P<getFileId>.*)\.(?P<getFileType>.*)',
        GETPATH)
    #Specify the bucket name and project name
    BUCKET = 'Bucket name'
    PROJECT_ID = 'Project name'
    #Get the dataset name
    DATASET_ID = m.group('getDatasetId')
    #Get the file name
    FILE_ID = m.group('getFileId')
    #Get identifier
    FILE_TYPE = m.group('getFileType')

    TMP_PATH = '/tmp/' + FILE_ID + '.' + FILE_TYPE
    #Data import from GSC to python
    gcs = storage.Client(PROJECT_ID)
    bucket = gcs.get_bucket(BUCKET)
    blob = bucket.get_blob(GETPATH)
    blob.download_to_filename(TMP_PATH)

    #Conditional branching of identifiers
    if FILE_TYPE == 'xlsx':
        df = pd.read_excel(TMP_PATH)
    elif FILE_TYPE == 'csv':
        df = pd.read_csv(TMP_PATH)

    #Create table from python to BigQuery
    full_table_id = DATASET_ID + '.' + FILE_ID
    df.to_gbq(full_table_id, project_id=PROJECT_ID, if_exists='replace')
  # log
    print('Folder Name : {}'.format(data['name']))

Recommended Posts

[GCF + Python] How to upload Excel to GCS and create a new table in BigQuery
How to create a JSON file in Python
[Pandas] How to check duplicates and delete duplicates in a table (equivalent to deleting duplicates in Excel)
[Python] How to output a pandas table to an excel file
How to create an instance of a particular class from dict using __new__ () in python
[Python] How to delete rows and columns in a table (list of drop method options)
[Python] How to add rows and columns to a table (pandas DataFrame)
How to create a heatmap with an arbitrary domain in Python
How to work with BigQuery in Python
How to get a stacktrace in python
How to display multiplication table in python
How to use is and == in Python
How to generate a new loggroup in CloudWatch using python within Lambda
How to put a half-width space before letters and numbers in Python.
How to stop a program in python until a specific date and time
Edit Excel from Python to create a PivotTable
How to create a Python virtual environment (venv)
How to clear tuples in a list (Python)
How to generate permutations in Python and C ++
How to embed a variable in a python string
I want to create a window in Python
Automatically create word and excel reports in python
How to notify a Discord channel in Python
Create a new page in confluence with Python
[Python] How to draw a histogram in Matplotlib
How to create a Rest Api in Django
How to plot autocorrelation and partial autocorrelation in python
[Python] How to save the installed package and install it in a new environment at once Mac environment
Create a Python image in Django without a dummy image file and test the image upload
[Python] How to name table data and output it in csv (to_csv method)
[Python] How to create a local web server environment with SimpleHTTPServer and CGIHTTPServer
[BigQuery] How to use BigQuery API for Python -Table creation-
[Python] How to expand variables in a character string
Create a plugin to run Python Doctest in Vim (2)
Create a plugin to run Python Doctest in Vim (1)
How to display DataFrame as a table in Markdown
[Python] How to sort dict in list and instance in list
How to create an image uploader in Bottle (Python)
[Python] How to create Correlation Matrix and Heat Map
How to save a table scraped by python to csv
[Python] How to create a 2D histogram with Matplotlib
How to execute a command using subprocess in Python
How to create a kubernetes pod from python code
Publish / upload a library created in Python to PyPI
How to create a Python 3.6.0 environment by putting pyenv on Amazon Linux and Ubuntu
Create a function in Python
Create a dictionary in Python
How to develop in Python
How to build a new python virtual environment on Ubuntu
How to slice a block multiple array from a multiple array in Python
How to swap elements in an array in Python, and how to reverse an array.
A story about how to specify a relative path in python.
How to import a file anywhere you like in Python
[Introduction to Udemy Python 3 + Application] 36. How to use In and Not
How to define multiple variables in a python for statement
[Python Kivy] How to create a simple pop up window
A standard way to develop and distribute packages in Python
I tried "How to get a method decorated in Python"
How to create and use static / dynamic libraries in C
Manipulate the clipboard in Python and paste the table into Excel
How to develop in a virtual environment of Python [Memo]