How to update a Tableau packaged workbook data source using Python

Previously, I posted How to batch insert from CSV file to Tableau hyper file with PostgreSQL-like COPY command, but this time it is an existing hyper file. Learn how to update the data source for your twbx file (Tableau Packaged Workbook).

Normally, if you want to update the data source of the twbx file, you need to open Tableau Desktop and operate the GUI. https://help.tableau.com/current/pro/desktop/ja-jp/save_savework_packagedworkbooks.htm

However, if the number of packaged workbooks and data sources to be updated increases, it will be very troublesome, so it is better to update with CUI like this time.

environment

What is a packaged workbook (twbx file) in the first place?

Before we get into the main subject, let's talk about what a twbx file is. In conclusion, ** twbx files are zip-compressed twb files (Tableau workbooks) and data sources (hyper files) **. Let's actually check the contents of the twbx file. twbxの中を確認.png You can see that sample.twbx contains sample.twb and Data / sample / sample.hyper. So, conversely, ** zip the twb file (Tableau workbook) and data source (hyper file) with the proper directory structure to make a twbx file **! Below, we will implement it based on this idea.

Preparation

Directory structure

We will implement it in the following directory.

.
├── conf
│   └── update_conf.csv
├── input
│   ├── sales.csv
│   └── titanic.csv
├── output
│   ├── sales.hyper
│   └── titanic.hyper
├── src
│   └── update_twbx.py
├── twbx
│   └── sample.twbx
└── work

Place the twbx file to be updated in .twbx /

Place the twbx file (sample.twbx) to be updated in.twbx /. sample.twbx is created by referencing two data sources (hyper files). The data sources are Titanic (train.csv) and Predict Future Sales, which are familiar to Kaggle. c / competitive-data-science-predict-future-sales) (sales_train.csv). The file names have been renamed below for clarity.

Place the data source (hyper) you want to update in .output/

Place the data source (hyper file) you want to update in .output /. Please note the following regarding the files to be placed.

--It must be the same as the hyper file referenced by the existing twbx --All the hyper files referenced by the existing twbx should be placed.

Describe the update settings in .conf/update_conf.csv

Describe the following in the configuration file (.conf/update_conf.csv).

--The directory where the data source (hyper) to be updated exists --Packaged file path to be updated (twbx) --Updated packaged file path (twbx)

csv:.conf/update_conf.csv


datasource,update_target_twbx,create_target_twbx
./output,./twbx/sample.twbx,./twbx/output.twbx

Script implementation

After preparation, implement the Python script. Implement the following processing flow.

  1. Zip the twbx file to be updated to ./work
  2. Replace the data source with the update target
  3. Zip twb files and data sources

python:.src/update_twbx.py


import csv
import zipfile
import os
import glob
import re
import shutil


work_dir = './work'
twbx_dir = './twbx'

#Get the twbx file path to be updated
with open('./conf/update_conf.csv') as f:
    reader = csv.reader(f)
    next(reader)
    line_list = [row for row in reader]

for replace_dir, update_from, update_to in line_list:
    #Unzip the twbx to be updated to work dir
    with zipfile.ZipFile(update_from) as existing_zip:
        existing_zip.extractall(work_dir)
    #Get the twb filename in twbx
    for file_name in os.listdir(work_dir):
        if 'twb' in file_name:
            twb_file_name = file_name
            twb_file_path = os.path.join(work_dir, twb_file_name)
    #hyper file storage dir in twbx
    data_dir_regexp = os.path.join(work_dir, 'Data/*/*.hyper')
    #Get the hyper file list in twbx
    data_file_list = glob.glob(data_dir_regexp)
    
    data_file_dict = {}
    for data_file_path in data_file_list:
        data_file_dict[re.sub(r'^.*/', '', data_file_path)] = data_file_path
    replace_file_dict = {}
    for replace_file_path in glob.glob(f'{replace_dir}*.hyper'):
        replace_file_dict[re.sub(r'^.*/', '', replace_file_path)] = replace_file_path
    #Updated data source included in twbx
    for key in data_file_dict.keys():
        shutil.copy2(data_file_dict[key], replace_file_dict[key])
    
    with zipfile.ZipFile(update_to, 'w', compression=zipfile.ZIP_DEFLATED) as new_zip:
        new_zip.write(twb_file_path, arcname=twb_file_name)
        for data_file_path in data_file_list:
            data_file_name = re.sub(r'^.*/', '', data_file_path)
            new_zip.write(data_file_path, arcname=data_file_name)    

Run

You can update the packaged workbook by doing the following:

python src/update_twbx.py

After execution, I was able to confirm that the updated twbx file was created in the location specified by ʻupdate_twbx_list.csv`!

Finally

This time, I followed the directory structure created by zipping the twbx file as it is, but you can place the data source in any directory by editing the datasource tag of the twb file after decompression. Editing the twb file is introduced in How to rewrite the DB connection information of Tableau Desktop (twb file) with Python.

Recommended Posts

How to update a Tableau packaged workbook data source using Python
How to set up a Python environment using pyenv
How to update Python Tkinter to 8.6
How to make a Python package using VS Code
How to execute a command using subprocess in Python
How to transpose a 2D array using only python [Note]
How to update FC2 blog etc. using XMLRPC with python
How to install python using anaconda
How to write a Python class
How to build a Python environment using Virtualenv on Ubuntu 18.04 LTS
[Python] How to make a class iterable
Add a Python data source with Redash
How to draw a graph using Matplotlib
[Python] How to convert a 2D list to a 1D list
How to update Google Sheets from Python
[Python] How to invert a character string
How to install a package using a repository
How to get a stacktrace in python
How to use "deque" for Python data
How to run a Maya Python script
How to generate a new loggroup in CloudWatch using python within Lambda
How to divide and process a data frame using the groupby function
How to send a visualization image of data created in Python to Typetalk
How to get a value from a parameter store in lambda (using python)
How to plot galaxy visible light data using OpenNGC database in python
How to code a drone using image recognition
How to open a web browser from python
How to clear tuples in a list (Python)
How to create a JSON file in Python
[Python] How to read data from CIFAR-10 and CIFAR-100
How to generate a Python object from JSON
[Introduction to Python] How to handle JSON format data
How to add a Python module search path
How to get article data using Qiita API
How to notify a Discord channel in Python
How to search HTML data using Beautiful Soup
[Python] How to draw a histogram in Matplotlib
How to upload to a shared drive using pydrive
How to uninstall a module installed using setup.py
[2015/11/19] How to register a service locally using the python SDK on naoqi os
[Introduction to Python] How to get the index of data with a for statement
Read the Python-Markdown source: How to create a parser
Create a data collection bot in Python using Selenium
How to convert / restore a string with [] in python
How to use pip, a package management system that is indispensable for using Python
How to write a GUI using the maya command
How to install Python
[For beginners] How to study Python3 data analysis exam
How to scrape image data from flickr with python
Data analysis using Python 0
How to update easy_install
(Python) Try to develop a web application using Django
How to scrape horse racing data using pandas read_html
How to auto-submit Microsoft Forms using python (Mac version)
I want to use a python data source in Re: Dash to get query results
[Python] How to expand variables in a character string
How to write a list / dictionary type of Python3
Data cleaning using Python
How to hold a hands-on seminar using Jupyter using docker
How to update Spyder
How to build a Django (python) environment on docker