I think there are various ways to handle it from Python, but I personally use BigQuery-Python, so I'll make a brief note.
$ pip install bigquery-python
https://github.com/tylertreat/BigQuery-Python/blob/master/README.md
Get the following information
--Service account --Key file (p12 or pem)
from bigquery import get_client
PROJECT_ID = 'project-999'
SERVICE_ACCOUNT = '[email protected]'
PRIVATE_KEY_PATH = '/path.to/keyfile.p12'
with open(PRIVATE_KEY_PATH, 'rb') as f:
private_key = f.read()
client = get_client(PROJECT_ID,
private_key=private_key,
service_account= SERVICE_ACCOUNT,
readonly=False)
DATASET = 'spam'
if not client.check_dataset(DATASET):
client.create_dataset(DATASET)
schema.json
[
{
"name": "id",
"type": "INTEGER"
},
{
"name": "name",
"type": "STRING"
},
{
"name": "time",
"type": "TIMESTAMP"
}
]
DATASET = 'spam'
TABLE_NAME = 'egg'
SCHEMA_PATH = '/path.to/schema.json'
with open(SCHEMA_PATH, 'r') as f:
table_schema = json.load(f)
if not client.check_dataset(DATASET):
raise
if not client.check_table(DATASET, TABLE_NAME):
client.create_table(DATASET, TABLE_NAME, table_schema)
(It is not necessary to set the schema to json, but it seems that it can be reused, so this time it is set to json.)
//spam/egg.csv
"id", "name", "time"
1, "S", "2015-05-18 00:00:00"
2, "Y", "2015-11-02 00:00:00"
from bigquery import JOB_SOURCE_FORMAT_CSV
KEY_NAME = 'gs://spam/egg.csv'
TABLE_NAME = 'egg'
client.import_data_from_uris(
TABLE_NAME,
KEY_NAME,
field_delimiter='\t', #Tab delimited
source_format=JOB_SOURCE_FORMAT_CSV, # csv
skip_leading_rows=1) #Skip the first line of csv
Specify timeout
to get it synchronously. If it takes longer than that, a BigQueryTimeoutException
will occur.
from bigquery.errors import BigQueryTimeoutException
query = 'SELECT id, name, time FROM [spam.egg] LIMIT 10'
try:
job_id, results = client.query(query, timeout=60)
except BigQueryTimeoutException as e:
print e
# job_id: u'job_xxxx_xxxxx'
# results: [
# {u'id': 1, u'name': 'S', u'time': "2015-11-02 00:00:00", },
# {u'id': 2, u'name': 'Y', u'time': "2015-11-02 00:00:00", },
# ]
If it seems to take time, you can consider getting it asynchronously. In that case, do not specify timeout
.
query = 'SELECT id, name, time FROM [spam.egg] LIMIT 10'
job_id, results = client.query(query)
# job_id: u'job_xxxx_xxxxx'
# results: []
Since job_id is issued, you can get the contents by putting job_id in the argument of get_query_rows
.
completed, _total_rows = client.check_job(job_id)
if completed:
results = client.get_query_rows(job_id)
# results: [
# {u'id': 1, u'name': 'S', u'time': "2015-11-02 00:00:00", },
# {u'id': 2, u'name': 'Y', u'time': "2015-11-02 00:00:00", },
# ]
In addition, you can see that you can do various things by looking inside the BigQuery Client.
Query Builder
There is also a mechanism that makes a nice query by calling render_query
. If I used it, it might have been a bad way to find it, but I couldn't handle BigQuery-specific functions such as adding LIMIT
, table decorator, and JSON_EXTRACT
, so I had to write it myself. It seems to be a bad idea.
https://github.com/tylertreat/BigQuery-Python#query-builder
Since the base is Google's API, Google's API documentation may be useful depending on the situation. https://cloud.google.com/bigquery/docs/reference/v2/
If you handle the key file by the method described here, an error may occur in the handling related to encryption in the Linux environment. The Python libraries that depend on BigQuery-Python are installed at the same time, but in a Linux environment it may not be installed due to lack of dependent tools. One of them is a library called cryptography
.
BigQuery-I'm trying to catch ʻImportErrorinside Python and use another library, but if that doesn't work, I get an error when running the script, which is annoying. The installation of
cryptography` itself can be installed below.
$ pip install cryptography
However, you have to install the dependent tools with yum etc. for that. As detailed in the documentation, I did the following in my environment:
$ sudo yum install gcc libffi-devel python-devel openssl-devel
Cryptography document http://cryptography.readthedocs.org/en/latest/installation/
Recommended Posts