You often want to use BigQuery data from a local Jupyter Notebook for lighter-weight analytics. In this post, I will introduce a way to write a BigQuery query in a local Jupyter Notebook and store the results directly in a DataFrame.
Environment:
・ macOS Mojave
・ Python 3.7.3
Create an appropriate directory (here /Users/{username}/BigQueryLocal), create a virtual environment called ENV3 with virtualenv, and activate ENV3.
$ mkdir BigQueryLocal
$ cd BigQueryLocal
$ virtualenv -p python3.7 ENV3
$ source ENV3/bin/activate
First, open the URL below.
https://cloud.google.com/docs/authentication/getting-started?hl=ja
On the page that opens, click [Go to the Create Service Account Key page].
On the Create Service Account Key page, set the service account to App Engine default service account, set the key type to JSON, and press Create to download the JSON key file.
Place the downloaded JSON account key {xxxxxxxxxx}.json directly under /Users/{username}/BigQueryLocal and run the following.
$ export GOOGLE_APPLICATION_CREDENTIALS="/Users/{username}/BigQueryLocal/{xxxxxxxxxx}.json"
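As an aside, if you would rather not export this variable in every session, the client library can also read the key file directly when you create the client in the notebook later. A minimal sketch, using the same placeholder path as above:
from google.cloud import bigquery
# A sketch: build the client straight from the key file instead of
# relying on GOOGLE_APPLICATION_CREDENTIALS. Substitute your own path.
client = bigquery.Client.from_service_account_json(
    "/Users/{username}/BigQueryLocal/{xxxxxxxxxx}.json"
)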
This step is not strictly necessary, but for better organization, create a directory called TestProject and move into it.
$ mkdir TestProject
$ cd TestProject
Install the following minimum required packages with pip.
$ pip install google-cloud-bigquery
$ pip install jupyter
$ pip install pandas
Start jupyter notebook with the following command.
$ jupyter notebook
After Jupyter starts, create a new notebook, then import the following module and create the client.
[1]
from google.cloud import bigquery
# Authenticates via GOOGLE_APPLICATION_CREDENTIALS set above.
client = bigquery.Client()
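To check that authentication succeeded before writing a real query, you can run a trivial query in a new cell. A minimal sketch; any simple SELECT will do:
# If the credentials are invalid, this raises an error instead of returning.
rows = client.query("SELECT 1 AS ok").result()
print(list(rows))  # a single row containing the value 1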
Here, we use the bigquery-public-data.samples.natality dataset, a public sample available by default.
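If you want to see what columns the table exposes before writing SQL, the client can fetch the table's metadata in a new cell. A minimal sketch; get_table accepts the fully qualified table ID:
# Fetch metadata for the public natality sample table.
table = client.get_table("bigquery-public-data.samples.natality")
print(table.num_rows)                              # total number of rows
print([field.name for field in table.schema][:5])  # first few column names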
By executing the following, the BigQuery data can be loaded into a DataFrame.
[2]
sql = """
SELECT
plurality,
COUNT(1) AS count,
year
FROM
`bigquery-public-data.samples.natality`
WHERE
NOT IS_NAN(plurality) AND plurality > 1
GROUP BY
plurality, year
ORDER BY
count DESC
"""
# Run the query and load the result set into a pandas DataFrame.
df = client.query(sql).to_dataframe()
df.head()
When you run this, you can confirm that the result is returned as a DataFrame.
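As an aside, the client library also ships with a Jupyter cell magic that runs a query and stores the result in a DataFrame in one step (see the second reference link below). A minimal sketch, where the variable name df2 is arbitrary:
[3]
%load_ext google.cloud.bigquery
[4]
%%bigquery df2
SELECT
  plurality,
  COUNT(1) AS count,
  year
FROM
  `bigquery-public-data.samples.natality`
WHERE
  NOT IS_NAN(plurality) AND plurality > 1
GROUP BY
  plurality, year
ORDER BY
  count DESC
After the cell runs, df2 holds the same DataFrame as df above.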
From the next time onward, you only need to run the following:
$ cd BigQueryLocal
$ source ENV3/bin/activate
$ export GOOGLE_APPLICATION_CREDENTIALS="/Users/{username}/BigQueryLocal/{xxxxxxxxxx}.json"
$ cd TestProject
$ jupyter notebook
See below for environment construction using virtualenv. https://cloud.google.com/python/setup?hl=ja
See below for how to use BigQuery with Jupyter Notebook. https://cloud.google.com/bigquery/docs/visualize-jupyter?hl=ja