[JAVA] SELECT data using client library in BigQuery

In Google BigQuery, there are two ways to get data from a table:

In this article, I will introduce "How to get using the client library".

environment

Environment

See Install BigQuery client libraries (https://cloud.google.com/bigquery/docs/reference/libraries?hl=ja). As of January 2018, there is only a beta version of the java client library, so we will use that in this article as well. The source of the client library is published below. https://github.com/GoogleCloudPlatform/google-cloud-java/tree/master/google-cloud-bigquery

Table structure

In this article, we will SELECT on a table with the following configuration.

COLUMN TYPE MODE
name STRING NULLABLE
gender STRING NULLABLE
count INTEGER NULLABLE

Use listTableData method

Use BigQuery # listTableData.

BigQuerySelectSample.java


package bigquery.sample;

import com.google.api.gax.paging.Page;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FieldValueList;

public class BigQuerySelectSample {

	private static final String DATASET_NAME = "dev1";
	private static final String TABLE_NAME = "names_2014";
	private static final int NAME_COLUMN = 0;
	private static final int GENDER_COLUMN = 1;

	public static void main(String[] args) {

		BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

		// use listTableData method
		Page<FieldValueList> tableData = bigquery.listTableData(DATASET_NAME, TABLE_NAME);
		for (FieldValueList row : tableData.iterateAll()) {
			System.out.println(row.get(NAME_COLUMN).getValue() + " " + row.get(GENDER_COLUMN).getValue());
		}
	}
}

The listTableData method does not return schema information (column names, etc.), only data. Therefore, when you need the data of a specific column, you need to access it by the index of row like row.get (0), row.get (1).

Use query method

In addition to BigQuery # listTableData, you can also use BigQuery # query.

BigQuerySelectSample.java


package bigquery.sample;

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.QueryResponse;

public class BigQuerySelectSample {

	private static final String DATASET_NAME = "dev1";
	private static final String TABLE_NAME = "names_2014";

	public static void main(String[] args) throws InterruptedException {

		BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

		// use query method
		String query = "SELECT name, gender FROM `" + DATASET_NAME + "." + TABLE_NAME + "` WHERE name = 'Ryder'";
		QueryJobConfiguration queryConfig = QueryJobConfiguration.of(query);
		QueryResponse response = bigquery.query(queryConfig);
		for (FieldValueList row : response.getResult().iterateAll()) {
			System.out.println(row.get("name").getValue() + " " + row.get("gender").getValue());
		}
	}
}

You can write SQL in BigQuery in the same format as RDB. The detailed query reference is BigQuery SELECT clause. Since the query method also returns schema information (column name, etc.), it is possible to access by column name like row.get ("columnName").

performance

It's a simple measurement, but it took about 4 to 5 seconds for both BigQuery # listTableData and BigQuery # query for 33206 tables. It seems that there is not much difference between the two.

that's all.

reference

Recommended Posts

SELECT data using client library in BigQuery
Select features using text data
Implement OAuth without using client library (Java)
Data supply tricks using deques in machine learning
Sort BigQuery tables according to data in Dataflow
Define boto3 Client API response in data class
Read files on GCS using Cloud Storage Client Library
Create a data collection bot in Python using Selenium
Notes on transactions in the Java client library in the datastore
Collectively register data in Firestore using csv file in Python
Get LEAD data using Marketo's REST API in Python
Using the National Diet Library Search API in Python
Data analysis using xarray
Sampling in imbalanced data
Data analysis using Python 0
Data cleaning using Python
Published a library that hides character data in Python images
Inflating text data by retranslation using google translate in Python
Sine wave prediction using RNN in deep learning library Keras
Output Excel data in separate writing using Python3 + xlrd + mecab
Try building a neural network in Python without using a library
Graph time series data in Python using pandas and matplotlib
Problems when using Elasticsearch as a data source in Redash
Play with YouTube Data API v3 using Google API Python Client