Read / write between BigQuery ↔ Pandas

Introduction

BigQuery client library google-cloud-bigquery I will write how to use the wrapper pandas-gbq.

environment

GCP account registration

[Explanation with image] Register an account with a free trial of Google Cloud Platform (GCP)

Install Google Cloud SDK

Install Google Cloud SDK ~ Initialize

Project creation

Create a project with Google Cloud SDK

Service account / service account key creation

Create a service account / service account key so that you can access GCS from Python.

Service account creation

gcloud iam service-accounts create service account name\
                                   --display-name service account display name\
スクリーンショット 2020-02-10 0.10.39.png

Check the permissions currently associated with the project

gcloud projects get-iam-policy mypj-id

# bindings:
# - members:
#   - user:[email protected]
#   role: roles/owner
# etag: BwWeTrntoao=
# version: 1

Grant permissions to service accounts

Grant BigQuery administrator privileges

gcloud projects add-iam-policy-binding project ID\
--member serviceAccount:Service account name@Project ID.iam.gserviceaccount.com \
--role roles/bigquery.admin
スクリーンショット 2020-02-10 0.16.18.png

role list

https://cloud.google.com/iam/docs/understanding-roles?hl=ja#predefined_roles

Check permissions again

gcloud projects get-iam-policy mypj-id

# bindings:
# - members:
#   - user:[email protected]
#   role: roles/owner
# - members:
#   - serviceAccount:[email protected]
#   role: roles/bigquery.admin
# etag: BwWeTz6vIBY=
# version: 1

Service account key creation

$ gcloud iam service-accounts keys create ./anata_no_key.json \
--iam-account service account name@Project ID.iam.gserviceaccount.com

Directory structure

.
├── anata_no_key.json
└── working/
        └── main.py

When using google-cloud-bigquery

Install google-cloud-bigquery with pip

$ pip install google-cloud-bigquery

Read the query result with DataFrame

main.py


import os

from google.cloud import bigquery
from IPython.display import display


#Get the directory of your own file
cwd = os.path.dirname(os.path.abspath(__file__))
#Set in environment variable
key_path = '{}/../credentials.json'.format(cwd)
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = key_path

project_id = 'project_id'
dataset_id = 'dataset_id'
table_id = 'table_id'

client = bigquery.Client(project=project_id)

sql = f"""
select col1, col2, col3
from {dataset_id}.{table_id}
limit 100
"""

df = client.query(sql).to_dataframe()
display(df)

When using pandas-gbq

Install pandas-gbq with pip

$ pip install pandas-gbq

Read the query result with DataFrame

import pandas as pd


#Load from BigQuery in DataFrame format
project_id = 'project_id'
dataset_id = 'dataset_id'
table_id = 'table_id'
query = f"""
SELECT *
FROM {dataset_id}.{table_id}
LIMIT 10
"""

# dialect='standard'Use standard SQL in
df = pd.read_gbq(query, project_id, dialect='standard')
display(df)

Write a DataFrame to a BigQuery table

import pandas as pd


#Import DataFrame into BigQuery
df = pd.read_csv('path/to/dir/file.csv')
dataset_id = 'dataset_id'
table_id = 'table_id'
df.to_gbq(f'{dataset_id}.{table_id}')

Recommended Posts

Read / write between BigQuery ↔ Pandas
Read pandas data
Read CSV file: pandas
Read and write csv file
Read csv with python pandas
Correspondence between pandas and SQL
Read and write a file
Write and read a file
How to write soberly in pandas
Ignore # line and read in pandas
Differences between Python, read (), readline (), readlines ()