You can read more about how to work with BigQuery in Python in the official documentation: see Managing Datasets if you want to work with datasets, and Managing Tables if you want to manipulate tables.
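As a quick taste of the dataset side, here is a minimal sketch of creating a dataset. The dataset ID "your-project.your_dataset" and the location "US" are placeholder assumptions, not values from this article.

from google.cloud import bigquery

client = bigquery.Client(project=project_id)  # project_id = "your-project"

# "your-project.your_dataset" is a placeholder dataset ID
dataset = bigquery.Dataset("your-project.your_dataset")
dataset.location = "US"  # pick the location that fits your data
dataset = client.create_dataset(dataset)  # Make an API request.
print("Created dataset {}.{}".format(client.project, dataset.dataset_id))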
Creating a table
from google.cloud import bigquery

# Use the default project, or pass the project ID explicitly
client = bigquery.Client()
# The project ID must be specified when running in Colab
client = bigquery.Client(project=project_id)  # project_id = "your-project"

schema = [
    bigquery.SchemaField("full_name", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("age", "INTEGER", mode="REQUIRED"),
]

# table_id = "your-project.your_dataset.your_table_name"
table = bigquery.Table(table_id, schema=schema)
table = client.create_table(table)  # Make an API request.
print(
    "Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id)
)
Details of the SchemaField constructor can be found in the library reference. For details on how to specify the schema, see the official BigQuery documentation. Change field_type and mode as needed.
Signature
SchemaField(name, field_type, mode='NULLABLE', description=None, fields=(), policy_tags=None)
Examples of schema specification
# String (nullable)
SchemaField(name, 'STRING', mode='NULLABLE')
# Integer (nullable)
SchemaField(name, 'INT64', mode='NULLABLE')
# Floating point (nullable)
SchemaField(name, 'FLOAT64', mode='NULLABLE')
# Date (required)
SchemaField(name, 'DATE', mode='REQUIRED')
# Date and time (required)
SchemaField(name, 'DATETIME', mode='REQUIRED')
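field_type can also be RECORD for nested columns, in which case the fields argument takes the child schema. A minimal sketch; the column names address, city, and zip are made-up examples:

from google.cloud import bigquery

# A nested (RECORD) column: fields holds the child SchemaFields
address = bigquery.SchemaField(
    "address", "RECORD", mode="NULLABLE",
    fields=[
        bigquery.SchemaField("city", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("zip", "STRING", mode="NULLABLE"),
    ],
)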
Executing a query
from google.cloud import bigquery

# Use the default project, or pass the project ID explicitly
client = bigquery.Client()
# The project ID must be specified when running in Colab
client = bigquery.Client(project=project_id)  # project_id = "your-project"

# Write the query you want to execute
query = '''
select * from `tableID`
where ...
'''
query_job = client.query(query)  # Make an API request (starts the job).
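Note that client.query() only starts the job; to read the rows you call result() on the returned job. A short sketch continuing from the block above; the column names full_name and age follow the table created earlier, and to_dataframe() assumes pandas is installed:

# result() waits for the job to finish and returns an iterator of rows
for row in query_job.result():
    print(row["full_name"], row["age"])  # columns are accessible by name

# With pandas installed, the results can also be fetched as a DataFrame
df = query_job.to_dataframe()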
Please note that you can insert up to 10,000 rows at a time. If you want to insert more than 10,000 rows, use the method described below.
from google.cloud import bigquery

# Use the default project, or pass the project ID explicitly
client = bigquery.Client()
# The project ID must be specified when running in Colab
client = bigquery.Client(project=project_id)  # project_id = "your-project"

# table_id = "your-project.your_dataset.your_table_name"
table = client.get_table(table_id)  # Make an API request.

# A two-dimensional array: a list of rows
# The example uses tuples, but lists work just as well (two columns here)
# Note that an error occurs if the number of elements in a tuple (or list)
# does not match the number of schema fields
rows_to_insert = [("string", num), ("string", num)]  # "string" and num are placeholder values
errors = client.insert_rows(table, rows_to_insert)  # Make an API request.

# When using a pandas DataFrame
errors = client.insert_rows(table, df.values.tolist())

if errors == []:
    print("New rows have been added.")
Assuming the DataFrame is stored in df, it can be converted to the two-dimensional list shown below.
df.values.tolist()
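As an aside, recent versions of google-cloud-bigquery also provide a helper that takes the DataFrame directly, so the conversion above is optional. A sketch, assuming a library version that includes it:

# insert_rows_from_dataframe chunks the DataFrame and streams it in;
# it returns one list of errors per chunk
errors = client.insert_rows_from_dataframe(table, df)
if all(not chunk for chunk in errors):
    print("New rows have been added.")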
Also, if you want to insert more than 10,000 rows, split the data into chunks as follows.
rows = len(df)
count = 0
while rows > 0:
    # Slice off the next chunk of 10,000 rows and insert it
    client.insert_rows(table, df[(count * 10000):((count + 1) * 10000)].values.tolist())
    rows = rows - 10000
    count = count + 1
There may be a cleaner way to write this, but the loop above inserts everything even when the data exceeds 10,000 rows.
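For example, one arguably tidier sketch iterates with range() in steps of 10,000 and checks the errors for each chunk; same behavior, just written differently:

# Step through the DataFrame 10,000 rows at a time
for start in range(0, len(df), 10000):
    errors = client.insert_rows(table, df[start:start + 10000].values.tolist())
    if errors:
        print(errors)  # report any failed rows in this chunk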