Previously, I posted How to batch insert from CSV file to Tableau hyper file with PostgreSQL-like COPY command, but this time it is an existing hyper file. Learn how to update the data source for your twbx file (Tableau Packaged Workbook).

Normally, if you want to update the data source of the twbx file, you need to open Tableau Desktop and operate the GUI. https://help.tableau.com/current/pro/desktop/ja-jp/save_savework_packagedworkbooks.htm

However, if the number of packaged workbooks and data sources to be updated increases, it will be very troublesome, so it is better to update with CUI like this time.

environment

Python 3.7.6
Tableau Desktop 2019.2.0

What is a packaged workbook (twbx file) in the first place?

Before we get into the main subject, let's talk about what a twbx file is. In conclusion, ** twbx files are zip-compressed twb files (Tableau workbooks) and data sources (hyper files) **. Let's actually check the contents of the twbx file. twbxの中を確認.png You can see that sample.twbx contains sample.twb and Data / sample / sample.hyper. So, conversely, ** zip the twb file (Tableau workbook) and data source (hyper file) with the proper directory structure to make a twbx file **! Below, we will implement it based on this idea.

Preparation

Directory structure

We will implement it in the following directory.

.
├── conf
│   └── update_conf.csv
├── input
│   ├── sales.csv
│   └── titanic.csv
├── output
│   ├── sales.hyper
│   └── titanic.hyper
├── src
│   └── update_twbx.py
├── twbx
│   └── sample.twbx
└── work

Place the twbx file to be updated in `.twbx /`

Place the twbx file (sample.twbx) to be updated in.twbx /. sample.twbx is created by referencing two data sources (hyper files). The data sources are Titanic (train.csv) and Predict Future Sales, which are familiar to Kaggle. c / competitive-data-science-predict-future-sales) (sales_train.csv). The file names have been renamed below for clarity.

train.csv → titanic.csv
sales_train.csv → sales.csv

Place the data source (hyper) you want to update in `.output/`

Place the data source (hyper file) you want to update in .output /. Please note the following regarding the files to be placed.

--It must be the same as the hyper file referenced by the existing twbx --All the hyper files referenced by the existing twbx should be placed.

Describe the update settings in `.conf/update_conf.csv`

Describe the following in the configuration file (.conf/update_conf.csv).

--The directory where the data source (hyper) to be updated exists --Packaged file path to be updated (twbx) --Updated packaged file path (twbx)

`csv:.conf/update_conf.csv`


datasource,update_target_twbx,create_target_twbx
./output,./twbx/sample.twbx,./twbx/output.twbx

Script implementation

After preparation, implement the Python script. Implement the following processing flow.

Zip the twbx file to be updated to ./work
Replace the data source with the update target
Zip twb files and data sources

`python:.src/update_twbx.py`


import csv
import zipfile
import os
import glob
import re
import shutil


work_dir = './work'
twbx_dir = './twbx'

#Get the twbx file path to be updated
with open('./conf/update_conf.csv') as f:
    reader = csv.reader(f)
    next(reader)
    line_list = [row for row in reader]

for replace_dir, update_from, update_to in line_list:
    #Unzip the twbx to be updated to work dir
    with zipfile.ZipFile(update_from) as existing_zip:
        existing_zip.extractall(work_dir)
    #Get the twb filename in twbx
    for file_name in os.listdir(work_dir):
        if 'twb' in file_name:
            twb_file_name = file_name
            twb_file_path = os.path.join(work_dir, twb_file_name)
    #hyper file storage dir in twbx
    data_dir_regexp = os.path.join(work_dir, 'Data/*/*.hyper')
    #Get the hyper file list in twbx
    data_file_list = glob.glob(data_dir_regexp)
    
    data_file_dict = {}
    for data_file_path in data_file_list:
        data_file_dict[re.sub(r'^.*/', '', data_file_path)] = data_file_path
    replace_file_dict = {}
    for replace_file_path in glob.glob(f'{replace_dir}*.hyper'):
        replace_file_dict[re.sub(r'^.*/', '', replace_file_path)] = replace_file_path
    #Updated data source included in twbx
    for key in data_file_dict.keys():
        shutil.copy2(data_file_dict[key], replace_file_dict[key])
    
    with zipfile.ZipFile(update_to, 'w', compression=zipfile.ZIP_DEFLATED) as new_zip:
        new_zip.write(twb_file_path, arcname=twb_file_name)
        for data_file_path in data_file_list:
            data_file_name = re.sub(r'^.*/', '', data_file_path)
            new_zip.write(data_file_path, arcname=data_file_name)

Run

You can update the packaged workbook by doing the following:

python src/update_twbx.py

After execution, I was able to confirm that the updated twbx file was created in the location specified by ʻupdate_twbx_list.csv`!

Finally

This time, I followed the directory structure created by zipping the twbx file as it is, but you can place the data source in any directory by editing the datasource tag of the twb file after decompression. Editing the twb file is introduced in How to rewrite the DB connection information of Tableau Desktop (twb file) with Python.

How to update a Tableau packaged workbook data source using Python

environment

What is a packaged workbook (twbx file) in the first place?

Preparation

Directory structure

Place the twbx file to be updated in .twbx /

Place the data source (hyper) you want to update in .output/

Describe the update settings in .conf/update_conf.csv

csv:.conf/update_conf.csv