A table (view) dependency diagram like this ...
↓ It is a story that I wrote Python code to output from a SQL group like this. (PlantUML was used to create the figure)
CREATE TABLE
`project.dataset.table5` AS
SELECT
*
FROM
`project.dataset.table1`;
CREATE TABLE IF NOT EXISTS
`project.dataset.table6` AS
SELECT
*
FROM
`project.dataset.table2`
UNION ALL
SELECT
*
FROM
`project.dataset.table3`;
CREATE VIEW
`project.dataset.table7` AS
SELECT
*
FROM
`project.dataset.table3`
INNER JOIN
`project.dataset.table4`
USING
(user_id);
According to wikipedia
PlantUML is a text-based language for creating open source UML diagrams
That's right. Various model diagrams can be created. Here are the sites that I used to learn from.
--Learn basic syntax and render diagrams - https://plantuml.com/ja/ --A personally developed app that was very easy to use - http://sujoyu.github.io/plantuml-previewer/
Do the following in Python3:
sql1 = '''CREATE TABLE
`project.dataset.table5` AS
SELECT
*
FROM
`project.dataset.table1`;
'''
sql2 = '''CREATE TABLE IF NOT EXISTS
`project.dataset.table6` AS
SELECT
*
FROM
`project.dataset.table2`
UNION ALL
SELECT
*
FROM
`project.dataset.table3`;
'''
sql3 = '''CREATE VIEW
`project.dataset.table7` AS
SELECT
*
FROM
`project.dataset.table3`
INNER JOIN
`project.dataset.table4`
USING
(user_id);
'''
import re
prog_destination = re.compile(r'(?:CREATE TABLE|CREATE TABLE IF NOT EXISTS|CREATE VIEW|CREATE VIEW IF NOT EXISTS|INSERT INTO|INSERT)[\s \n]+`(.+?)`')
prog_origin = re.compile(r'(?:FROM|JOIN)[\s \n]+`(.+?)`')
platuml_tempate = '''@startuml
skinparam padding 10 /'Padding adjustment'/
left to right direction /'If you want a layout that extends the diagram from left to right'/
hide members /'Erase class attributes'/
hide circle /'Remove the class mark'/
{}
@enduml
'''
def make_table_dependencies_for_platuml(sql:str):
#Table to create(View)Get
if len(prog_destination.findall(sql)) != 1:
raise Exception('CREATE TABLE|The INSERT INTO clause does not exist.')
else:
destination_table = prog_destination.findall(sql)[0]
#Get referrer table excluding self-reference
origin_tables = [table for table in prog_origin.findall(sql) if table != destination_table]
if len(origin_tables) == 0:
raise Exception('Dependency does not exist')
return [F'"{table}" <|-- "{destination_table}"' for table in origin_tables]
#When you want to output PlantUML for each sql
# print(platuml_tempate.format('\n'.join(make_table_dependencies_for_platuml(sql1))))
# print(platuml_tempate.format('\n'.join(make_table_dependencies_for_platuml(sql2))))
# print(platuml_tempate.format('\n'.join(make_table_dependencies_for_platuml(sql3))))
all_sql_dependencies = [make_table_dependencies_for_platuml(sql) for sql in [sql1, sql2, sql3]]
print(platuml_tempate.format('\n'.join(sum(all_sql_dependencies, []))))
Then, the following result will be output. Let's render the figure at the link destination introduced earlier. (The relationship diagram that appears at the beginning is drawn.)
@startuml
skinparam padding 10 /'Padding adjustment'/
left to right direction /'If you want a layout that extends the diagram from left to right'/
hide members /'Erase class attributes'/
hide circle /'Remove the class mark'/
"project.dataset.table1" <|-- "project.dataset.table5"
"project.dataset.table2" <|-- "project.dataset.table6"
"project.dataset.table3" <|-- "project.dataset.table6"
"project.dataset.table3" <|-- "project.dataset.table7"
"project.dataset.table4" <|-- "project.dataset.table7"
@enduml
I'm doing data analysis work, but I often process a table with BigQuery to create another table, and then process that table to create a table. Especially with ad hoc analysis, it becomes difficult to understand the relationships between tables. I think that it will be useful for understanding table relations and whether you are making useless intermediate tables.
↓ Inflated table dependencies
--The idea of grasping the dependency between tables from SQL and regular expressions were obtained from the following article. As expected it is ZOZO: pray: -Introduction of data mart construction platform managed by GitHub --ZOZO Technologies TECH BLOG
--You can also use GraphViz & PyDot.
We are challenging the Advent calendar this year as well, so please have a look. https://qiita.com/advent-calendar/2019/sensy