In Google BigQuery, there are two ways to get data from a table:
In this article, I will introduce "How to get using the client library".
See Install BigQuery client libraries (https://cloud.google.com/bigquery/docs/reference/libraries?hl=ja). As of January 2018, there is only a beta version of the java client library, so we will use that in this article as well. The source of the client library is published below. https://github.com/GoogleCloudPlatform/google-cloud-java/tree/master/google-cloud-bigquery
In this article, we will SELECT on a table with the following configuration.
COLUMN | TYPE | MODE |
---|---|---|
name | STRING | NULLABLE |
gender | STRING | NULLABLE |
count | INTEGER | NULLABLE |
Use BigQuery # listTableData.
BigQuerySelectSample.java
package bigquery.sample;
import com.google.api.gax.paging.Page;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FieldValueList;
public class BigQuerySelectSample {
private static final String DATASET_NAME = "dev1";
private static final String TABLE_NAME = "names_2014";
private static final int NAME_COLUMN = 0;
private static final int GENDER_COLUMN = 1;
public static void main(String[] args) {
BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
// use listTableData method
Page<FieldValueList> tableData = bigquery.listTableData(DATASET_NAME, TABLE_NAME);
for (FieldValueList row : tableData.iterateAll()) {
System.out.println(row.get(NAME_COLUMN).getValue() + " " + row.get(GENDER_COLUMN).getValue());
}
}
}
The listTableData method does not return schema information (column names, etc.), only data. Therefore, when you need the data of a specific column, you need to access it by the index of row like row.get (0), row.get (1).
In addition to BigQuery # listTableData, you can also use BigQuery # query.
BigQuerySelectSample.java
package bigquery.sample;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.QueryResponse;
public class BigQuerySelectSample {
private static final String DATASET_NAME = "dev1";
private static final String TABLE_NAME = "names_2014";
public static void main(String[] args) throws InterruptedException {
BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
// use query method
String query = "SELECT name, gender FROM `" + DATASET_NAME + "." + TABLE_NAME + "` WHERE name = 'Ryder'";
QueryJobConfiguration queryConfig = QueryJobConfiguration.of(query);
QueryResponse response = bigquery.query(queryConfig);
for (FieldValueList row : response.getResult().iterateAll()) {
System.out.println(row.get("name").getValue() + " " + row.get("gender").getValue());
}
}
}
You can write SQL in BigQuery in the same format as RDB. The detailed query reference is BigQuery SELECT clause. Since the query method also returns schema information (column name, etc.), it is possible to access by column name like row.get ("columnName").
It's a simple measurement, but it took about 4 to 5 seconds for both BigQuery # listTableData and BigQuery # query for 33206 tables. It seems that there is not much difference between the two.
that's all.
Recommended Posts