BigQuery Scripting was released on Beta on October 3rd of last year (2019). If you use it well, you can replace Python processing, right? I was enthusiastic, but I (+ team) who is not so good at programming do not know how to use it.
You are actually using "[BigQuery Scripting has been released Beta, so let's walk through lightly](https://medium.com/google-cloud-jp/bigquery-scripting%E3%81%8Cbeta%E3%83" % AA% E3% 83% AA% E3% 83% BC% E3% 82% B9% E3% 81% 95% E3% 82% 8C% E3% 81% 9F% E3% 81% AE% E3% 81% A7 % E8% BB% BD% E3% 81% 8F% E3% 82% A6% E3% 82% A9% E3% 83% BC% E3% 82% AF% E3% 82% B9% E3% 83% AB% E3 % 83% BC% E3% 81% 97% E3% 81% A6% E3% 81% BF% E3% 82% 8B-1408bab2c026) ", but it is difficult to use.
After a few months of such a pain, I finally got a Python-like loop process, so I will share it.
As you know, BigQuery can execute SQL processing, but there are times when you want to conditional branch or loop the SQL to be executed by an external variable. Normally, it is called in another language such as Python to branch and repeat SQL processing, but Scripting is to incorporate it into BigQuery to that extent. (Perhaps)
The most understandable processing that can be used is "Standard SQL Scripting", but the following processing is possible. There is. (Representative example)
--DECLARE: Variable declaration --SET: Assigning a value to a variable --IF ~ ELSE ~ END IF: Conditional branch --LOOP / WHILE: Loop processing
For example, if you want to turn SQL for each store and put out the total sales in different tables, you can call BigQuery from Python as follows.
for store in stores:
query=f"""
SELECT
store, SUM(Purchase price)AS total amount
FROM `myproject.mydataset.transaction_*`
WHERE store = {store_cd}
GROUP BY store
"""
BigQuery Scripting also has Loop processing, so it can be done easily like this. I was thinking. However, "[BigQuery Scripting has been released Beta, so let's walk through lightly](https://medium.com/google-cloud-jp/bigquery-scripting%E3%81%8Cbeta%E3%83%AA%" E3% 83% AA% E3% 83% BC% E3% 82% B9% E3% 81% 95% E3% 82% 8C% E3% 81% 9F% E3% 81% AE% E3% 81% A7% E8% BB% BD% E3% 81% 8F% E3% 82% A6% E3% 82% A9% E3% 83% BC% E3% 82% AF% E3% 82% B9% E3% 83% AB% E3% 83% BC% E3% 81% 97% E3% 81% A6% E3% 81% BF% E3% 82% 8B-1408bab2c026) " You can't do array iterations like you would in a Python loop.
I wanted to do it somehow.
The next process was finally made through various trials and errors to see if something could be done.
#Variable declaration
DECLARE stores ARRAY<STRING>; #Make an array called stores and run the loop
DECLARE x INT64 DEFAULT 1; #Arguments used in stores (default value = 1)
#Assignment to a variable
#I wanted to automatically create an ARRAY value (store list), so ARRAY_Created with AGG
SET stores = (
SELECT ARRAY_AGG(store_cd) as list
FROM (SELECT store_cd FROM `myproject.mydataset.mytable` GROUP BY store_cd ORDER BY store_cd)
);
#Loop processing
#Repeat up to the length of stores
WHILE x <= array_length(stores) DO
SELECT store_cd, SUM(Purchase price)AS total amount
FROM `myproject.mydataset.mytable`
WHERE store_cd=stores [ORDINAL(x)] #Get the Xth value of stores
GROUP BY store_cd;
SET x = x + 1;
END WHILE;
Yup. Somehow, I was able to do Loop processing that seems to be used in old JAVA.
For the time being, I was able to loop processing only during BigQuery. In the first place, I wrote without using ARRAY so much, so it was a pain to use that process. (If you can, it seems easy)
In the future, I would like to take on the challenge of conditional branching (IF statement).
Recommended Posts