You often want to use BigQuery data from a local Jupyter Notebook for lighter-weight analytics. In this post, I will introduce a way to write a BigQuery query in a local Jupyter Notebook and store the results directly in a DataFrame.
Environment:
・ macOS Mojave
・ Python 3.7.3
Create an appropriate directory (here /Users/{username}/BigQueryLocal), create a virtual environment called ENV3 with virtualenv, and activate ENV3.
$ mkdir BigQueryLocal
$ cd BigQueryLocal
$ virtualenv -p python3.7 ENV3
$ source ENV3/bin/activate
First, open the URL below.
https://cloud.google.com/docs/authentication/getting-started?hl=ja
On the page that opens, click [Go to the Create Service Account Key page].
On the Create Service Account Key page, set the service account to App Engine default service account, set the key type to JSON, and press Create to download the JSON key file.
Place the downloaded JSON account key {xxxxxxxxxx}.json directly under /Users/{username}/BigQueryLocal and run the following.
$ export GOOGLE_APPLICATION_CREDENTIALS="/Users/{username}/BigQueryLocal/{xxxxxxxxxx}.json"
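As an aside, if you would rather not export this variable in every session, the client library can also read the key file directly when you create the client in the notebook later. A minimal sketch, using the same placeholder path as above:
from google.cloud import bigquery
# A sketch: build the client straight from the key file instead of
# relying on GOOGLE_APPLICATION_CREDENTIALS. Substitute your own path.
client = bigquery.Client.from_service_account_json(
    "/Users/{username}/BigQueryLocal/{xxxxxxxxxx}.json"
)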
This step is not strictly necessary, but for better organization, create a directory called TestProject and move into it.
$ mkdir TestProject
$ cd TestProject
Install the following minimum required packages with pip.
$ pip install google-cloud-bigquery
$ pip install jupyter
$ pip install pandas
Start jupyter notebook with the following command.
$ jupyter notebook
After Jupyter starts, create a new notebook, then import the following module and create the client.
[1]
from google.cloud import bigquery
# Authenticates via GOOGLE_APPLICATION_CREDENTIALS set above.
client = bigquery.Client()
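To check that authentication succeeded before writing a real query, you can run a trivial query in a new cell. A minimal sketch; any simple SELECT will do:
# If the credentials are invalid, this raises an error instead of returning.
rows = client.query("SELECT 1 AS ok").result()
print(list(rows))  # a single row containing the value 1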
Here, we use the bigquery-public-data.samples.natality dataset, a public sample available by default.
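If you want to see what columns the table exposes before writing SQL, the client can fetch the table's metadata in a new cell. A minimal sketch; get_table accepts the fully qualified table ID:
# Fetch metadata for the public natality sample table.
table = client.get_table("bigquery-public-data.samples.natality")
print(table.num_rows)                              # total number of rows
print([field.name for field in table.schema][:5])  # first few column names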
By executing the following, the BigQuery data can be loaded into a DataFrame.
[2]
sql = """
SELECT
plurality,
COUNT(1) AS count,
year
FROM
`bigquery-public-data.samples.natality`
WHERE
NOT IS_NAN(plurality) AND plurality > 1
GROUP BY
plurality, year
ORDER BY
count DESC
"""
# Run the query and load the result set into a pandas DataFrame.
df = client.query(sql).to_dataframe()
df.head()
When you run this, you can confirm that the result is returned as a DataFrame.
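As an aside, the client library also ships with a Jupyter cell magic that runs a query and stores the result in a DataFrame in one step (see the second reference link below). A minimal sketch, where the variable name df2 is arbitrary:
[3]
%load_ext google.cloud.bigquery
[4]
%%bigquery df2
SELECT
  plurality,
  COUNT(1) AS count,
  year
FROM
  `bigquery-public-data.samples.natality`
WHERE
  NOT IS_NAN(plurality) AND plurality > 1
GROUP BY
  plurality, year
ORDER BY
  count DESC
After the cell runs, df2 holds the same DataFrame as df above.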
From the next time onward, you only need to run the following:
$ cd BigQueryLocal
$ source ENV3/bin/activate
$ export GOOGLE_APPLICATION_CREDENTIALS="/Users/{username}/BigQueryLocal/{xxxxxxxxxx}.json"
$ cd TestProject
$ jupyter notebook
See below for environment construction using virtualenv. https://cloud.google.com/python/setup?hl=ja
See below for how to use BigQuery with Jupyter Notebook. https://cloud.google.com/bigquery/docs/visualize-jupyter?hl=ja