Previously, I posted "How to batch insert from a CSV file into a Tableau hyper file with a PostgreSQL-like COPY command". This time, I will take an existing hyper file and use it to update the data source of a twbx file (Tableau Packaged Workbook).
Normally, to update the data source of a twbx file, you have to open Tableau Desktop and work through the GUI. https://help.tableau.com/current/pro/desktop/ja-jp/save_savework_packagedworkbooks.htm
However, as the number of packaged workbooks and data sources to update grows, that becomes very tedious, so it is better to update them from the command line, as done here.
Before we get into the main subject, let's talk about what a twbx file is.
In short, **a twbx file is a zip archive containing a twb file (Tableau workbook) and its data sources (hyper files)**.
Let's actually check the contents of a twbx file.
You can see that `sample.twbx` contains `sample.twb` and `Data/sample/sample.hyper`.
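One way to run this check without Tableau Desktop is Python's standard `zipfile` module. The following is a minimal sketch; the helper name `list_twbx_entries` is hypothetical and not part of the script in this article.

```python
import zipfile


def list_twbx_entries(twbx_path):
    """Return the entry names stored in a packaged workbook (a zip archive)."""
    with zipfile.ZipFile(twbx_path) as archive:
        # namelist() returns the archive-relative paths, e.g. 'Data/sample/sample.hyper'
        return archive.namelist()
```

Calling `list_twbx_entries('./twbx/sample.twbx')` on the sample workbook should list `sample.twb` and `Data/sample/sample.hyper`.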
So, conversely, **zipping a twb file (Tableau workbook) and its data sources (hyper files) with the proper directory structure produces a twbx file**!
Below, we will implement it based on this idea.
We will implement it in the following directory.
.
├── conf
│ └── update_conf.csv
├── input
│ ├── sales.csv
│ └── titanic.csv
├── output
│ ├── sales.hyper
│ └── titanic.hyper
├── src
│ └── update_twbx.py
├── twbx
│ └── sample.twbx
└── work
./twbx/
Place the twbx file to be updated (`sample.twbx`) in `./twbx/`.
`sample.twbx` was created with references to two data sources (hyper files).
The data sources are two datasets familiar from Kaggle: Titanic (train.csv) and Predict Future Sales (sales_train.csv).
For clarity, the files have been renamed to titanic and sales, as shown in the directory tree above.
./output/
Place the data sources (hyper files) you want to update with in `./output/`.
Please note the following regarding the files to be placed.
- Each file must have the same name as the hyper file it replaces in the existing twbx.
- Every hyper file referenced by the existing twbx must be placed here.
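These two conditions can be checked before running the update. The sketch below is an illustration only; the helper name `missing_replacements` is hypothetical, and it assumes files are matched by file name alone.

```python
import os
import posixpath
import zipfile


def missing_replacements(twbx_path, replace_dir):
    """Return hyper file names used by the twbx that are absent from replace_dir."""
    with zipfile.ZipFile(twbx_path) as archive:
        # Entry names inside a zip always use '/' as the separator
        needed = {
            posixpath.basename(name)
            for name in archive.namelist()
            if name.endswith('.hyper')
        }
    available = {
        name for name in os.listdir(replace_dir) if name.endswith('.hyper')
    }
    return sorted(needed - available)
```

An empty list means every hyper file in the twbx has a same-named replacement in the directory.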
./conf/update_conf.csv
Describe the following in the configuration file (`./conf/update_conf.csv`).
- The directory containing the replacement data sources (hyper files)
- The path of the packaged workbook (twbx) to be updated
- The path of the updated packaged workbook (twbx) to create
csv:./conf/update_conf.csv
datasource,update_target_twbx,create_target_twbx
./output,./twbx/sample.twbx,./twbx/output.twbx
With the preparation done, write the Python script. It unzips the twbx to be updated into `./work`, overwrites the hyper files there with the replacements, and zips the result into a new twbx.
python:src/update_twbx.py
import csv
import glob
import os
import shutil
import zipfile

work_dir = './work'

# Read the update settings, skipping the header row
with open('./conf/update_conf.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)
    line_list = [row for row in reader if row]

for replace_dir, update_from, update_to in line_list:
    # Start from an empty work dir so files from a previous twbx do not leak in
    shutil.rmtree(work_dir, ignore_errors=True)
    os.makedirs(work_dir)

    # Unzip the twbx to be updated into the work dir
    with zipfile.ZipFile(update_from) as existing_zip:
        existing_zip.extractall(work_dir)

    # Get the twb file name in the twbx
    twb_file_name = next(
        name for name in os.listdir(work_dir) if name.endswith('.twb')
    )
    twb_file_path = os.path.join(work_dir, twb_file_name)

    # Map hyper file name -> path for the files inside the twbx
    data_file_list = glob.glob(os.path.join(work_dir, 'Data', '*', '*.hyper'))
    data_file_dict = {os.path.basename(p): p for p in data_file_list}

    # Map hyper file name -> path for the replacement files
    replace_file_dict = {
        os.path.basename(p): p
        for p in glob.glob(os.path.join(replace_dir, '*.hyper'))
    }

    # Overwrite each hyper file in the work dir with its replacement
    for name, data_file_path in data_file_dict.items():
        shutil.copy2(replace_file_dict[name], data_file_path)

    # Zip the twb and hyper files, keeping the paths they had inside the twbx
    with zipfile.ZipFile(update_to, 'w', compression=zipfile.ZIP_DEFLATED) as new_zip:
        new_zip.write(twb_file_path, arcname=twb_file_name)
        for data_file_path in data_file_list:
            new_zip.write(
                data_file_path,
                arcname=os.path.relpath(data_file_path, work_dir),
            )
You can update the packaged workbook by doing the following:
python src/update_twbx.py
After execution, I was able to confirm that the updated twbx file was created in the location specified in `conf/update_conf.csv`!
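Besides opening the result in Tableau Desktop, a quick sanity check is to compare the entry paths of the original and the updated archives. This is a sketch only; the helper name `same_layout` is hypothetical, and it compares paths, not file contents.

```python
import zipfile


def same_layout(original_twbx, updated_twbx):
    """Return True if both archives contain the same file paths."""

    def file_entries(path):
        with zipfile.ZipFile(path) as archive:
            # Ignore explicit directory entries, which end with '/'
            return {n for n in archive.namelist() if not n.endswith('/')}

    return file_entries(original_twbx) == file_entries(updated_twbx)
```

With the configuration above, `same_layout('./twbx/sample.twbx', './twbx/output.twbx')` should return `True` when the hyper files kept their `Data/...` paths.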
This time, I kept the directory structure obtained by unzipping the twbx file as it is, but you can place the data source in any directory by editing the datasource tag of the twb file after decompression. Editing the twb file is covered in "How to rewrite the DB connection information of Tableau Desktop (twb file) with Python".
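As a rough illustration of that kind of edit, the sketch below rewrites a path in twb-like XML with the standard `xml.etree.ElementTree` module. It assumes the hyper file path is stored in a `dbname` attribute on a `connection` element. That is an assumption about the twb format, not something confirmed in this article, so check your own twb file first. The helper name `repoint_hyper` is hypothetical.

```python
import xml.etree.ElementTree as ET


def repoint_hyper(twb_xml, old_path, new_path):
    """Return twb XML text with matching connection paths replaced.

    Assumes (unverified here) that the path lives in connection/@dbname.
    """
    root = ET.fromstring(twb_xml)
    for connection in root.iter('connection'):
        if connection.get('dbname') == old_path:
            connection.set('dbname', new_path)
    return ET.tostring(root, encoding='unicode')
```

The new path then has to match the `arcname` used when the hyper file is written into the zip.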