I tried using BigQuery ML

Introduction

In Basic machine learning procedure: (1) Classification model, data was imported from BigQuery into a Python environment and analyzed with scikit-learn.

Recently, however, BigQuery ML has made it possible to do machine learning entirely inside BigQuery. This time, I will give BigQuery ML a try.

Analytical environment

- Google BigQuery
- Google Colaboratory

Referenced page

- Google Cloud launches "BigQuery ML" for machine learning with SQL statements

Target data

As in the previous article, result is the campaign response and product1 through product5 are the purchase amounts for each product.

id   result  product1  product2  product3  product4  product5
001  1       2500      1200      1890      530       null
002  0       750       3300      null      1250      2000
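
The later steps assume this table has already been split into a training table (mytable_training) and a test table (mytable_test). The article does not show how that split was made; as a rough sketch of one possible way (the FARM_FINGERPRINT-based 80/20 split below is purely my assumption):

from google.cloud import bigquery

client = bigquery.Client(project="myproject")

# Hypothetical 80/20 split of mytable by hashing the id column
query_training = """
CREATE OR REPLACE TABLE `myproject.mydataset.mytable_training` AS
SELECT * FROM `myproject.mydataset.mytable`
WHERE MOD(ABS(FARM_FINGERPRINT(id)), 10) < 8
"""

query_test = """
CREATE OR REPLACE TABLE `myproject.mydataset.mytable_test` AS
SELECT * FROM `myproject.mydataset.mytable`
WHERE MOD(ABS(FARM_FINGERPRINT(id)), 10) >= 8
"""

client.query(query_training).result()
client.query(query_test).result()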

1. Build a model

Until now, BigQuery objects were limited to TABLE and VIEW, but a trained model can now be saved as a MODEL object. (There are other object types as well, such as FUNCTION.)

from google.cloud import bigquery

# Create a BigQuery client (the project name is a placeholder)
client = bigquery.Client(project="myproject")

query = """
CREATE OR REPLACE MODEL `myproject.mydataset.mymodel`
OPTIONS
  (model_type='logistic_reg', labels = ['result']) AS  # 'result' is the objective (target) variable

# Train on the following explanatory variables
SELECT result, product1, product2, product3, product4, product5
FROM `myproject.mydataset.mytable_training`
"""

job = client.query(query)
result = job.result()
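
Once the CREATE MODEL job finishes, the progress of the training iterations can be checked with ML.TRAINING_INFO. A minimal sketch, reusing the client created above:

# Check loss and duration for each training iteration of the model
training_info_query = """
SELECT *
FROM ML.TRAINING_INFO(MODEL `myproject.mydataset.mymodel`)
"""

for row in client.query(training_info_query).result():
    print(row)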

The following three model types can be selected for model_type. (It also seems possible to use TensorFlow models, but I will omit that here.)

- logistic_reg: Logistic regression (the objective variable is a categorical variable)
- linear_reg: Linear regression (the objective variable is a numerical variable)
- kmeans: Cluster analysis (k-means)

This time, we use logistic_reg because the objective variable is whether or not a customer responded to the campaign.
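
For reference, switching model types only changes the OPTIONS clause and the selected columns. A minimal sketch of a k-means model on the same purchase columns, where the model name mymodel_kmeans and num_clusters=3 are assumptions of mine:

kmeans_query = """
CREATE OR REPLACE MODEL `myproject.mydataset.mymodel_kmeans`
OPTIONS
  (model_type='kmeans', num_clusters=3) AS  # clustering has no label column

SELECT product1, product2, product3, product4, product5
FROM `myproject.mydataset.mytable_training`
"""

client.query(kmeans_query).result()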

2. Evaluate the model

Call the created model with ML.EVALUATE and validate it against the test data.

query=f"""
SELECT
  roc_auc, precision, recall
FROM
  ML.EVALUATE(MODEL `myproject.mydataset.mymodel`,  ( #Call the created model

#Validate with different test data
SELECT result, product1, product2, product3, product4, product5
FROM `myproject.mydataset.mytable_test`
))
"""

job = client.query(query)
result = job.result()

The accuracy on the test data is evaluated with metrics such as ROC AUC, precision, and recall.
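
Since the evaluation comes back as ordinary query rows, it is convenient to pull it into a pandas DataFrame for inspection. A minimal sketch (assuming pandas is available, as it is in Colaboratory):

# View the evaluation metrics as a pandas DataFrame
eval_df = job.to_dataframe()
print(eval_df[["roc_auc", "precision", "recall"]])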

3. Apply the model

Call the created model with ML.PREDICT and apply it to new data.

query=f"""
SELECT
*
FROM
  ML.PREDICT(MODEL `myproject.mydataset.mymodel`,  ( #Call the created model

#Apply the model to the new data
SELECT product1, product2, product3, product4, product5
FROM `myproject.mydataset.mytable`)
);
"""

#Project data set table name to output
project = "myproject"
client = bigquery.Client(project=project)
dataset = "mydataset"
ds = client.dataset(dataset)
table = "mytable_predict"

job_config = bigquery.QueryJobConfig()
job_config.destination = ds.table(table)
job = client.query(query, job_config=job_config)

result = job.result()
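
The predictions are written to mytable_predict. For a logistic regression model, ML.PREDICT adds columns named after the label column, here predicted_result and predicted_result_probs, alongside the input columns; a quick sanity check might look like this:

check_query = """
SELECT predicted_result, predicted_result_probs, product1
FROM `myproject.mydataset.mytable_predict`
LIMIT 5
"""

for row in client.query(check_query).result():
    print(row)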

To evaluate the model, you just call it with ML.EVALUATE; to apply it, you just call it with ML.PREDICT. It's pretty easy to use.

In conclusion

The methods that can be used are still limited, but it is easier than building the model with scikit-learn as in Basic machine learning procedure: (1) Classification model.

On the other hand, since a model can be built this easily, the next question is what to do when you want to improve it. I suspect it mostly comes down to which variables you feed in.
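
One place to start is ML.WEIGHTS, which returns the learned coefficients of a logistic_reg (or linear_reg) model, so you can at least see which variables carry the most weight. A minimal sketch:

weights_query = """
SELECT processed_input, weight
FROM ML.WEIGHTS(MODEL `myproject.mydataset.mymodel`)
ORDER BY ABS(weight) DESC
"""

for row in client.query(weights_query).result():
    print(row.processed_input, row.weight)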
