For some reason, when I was looking at the Tableau SDK API Reference, I found the following description. I was surprised.

Insert Data Directly from CSV Files

You can use the PostgreSQL-like COPY command to insert records into a hyper file, which is Tableau's unique data format, at high speed **.

A few years ago, I implemented a process to create a tde file from a CSV file in order to distribute Tableau Reader regularly, but due to the SDK specifications, it is an insert for each record, so processing a large amount of data I have the impression that it took a long time. However, if you can insert it directly with the COPY command, it seems that you can quickly create Tableau's original file (hyper in this case) even with a large amount of data.

This time, I will insert the training data (CSV file) of Kaggle's Titanic dataset into the hyper file using the COPY command.

Click here for the code https://github.com/yolo-kiyoshi/csv2hyper

Premise

Mac OS Catalina 10.15.3
Python 3.7
tableauhyperapi 0.0.9746

The complete _API can be installed with pip install tableau hyperapi. _

Directory structure

It is running in the following directory. The .hyper file is created underdata /.

`Directory structure`


.
├── data
│   └── train.csv # Titanic train
└── notebook
    └── insert_test.ipynb

Implementation code

First, define the I / O file.

#Insert source CSV file path
src_path = '../data/train.csv'
#Hyper file path you want to create
dist_path = '../data/train.hyper'

Then use the COPY command to insert from CSV to hyper. It's a rough explanation, but it's like "(1) create a session to process hyper files", "(2) create a virtual PostgreSQL-like table", "(3) insert all at once from a CSV file", and "(4) spit out the entire table into a hyper file".

from tableauhyperapi import HyperProcess, Telemetry, Connection, CreateMode, NOT_NULLABLE, NULLABLE, SqlType, TableDefinition, escape_string_literal

#Create a session to work with Hyper files
with HyperProcess(telemetry=Telemetry.SEND_USAGE_DATA_TO_TABLEAU) as hyper:
     with Connection(
         endpoint=hyper.endpoint,
         database=dist_path,
         create_mode=CreateMode.CREATE_AND_REPLACE
     ) as connection:
            #Define table
            table_def = TableDefinition(
                table_name='train',
                columns=[
                TableDefinition.Column("PassengerId", SqlType.big_int(), NOT_NULLABLE),
                TableDefinition.Column("Survived", SqlType.big_int(), NOT_NULLABLE),
                TableDefinition.Column("Pclass", SqlType.big_int(), NOT_NULLABLE),
                TableDefinition.Column("Name", SqlType.text(), NOT_NULLABLE),
                TableDefinition.Column("Sex", SqlType.text(), NOT_NULLABLE),
                TableDefinition.Column("Age", SqlType.double(), NULLABLE),
                TableDefinition.Column("SibSp", SqlType.big_int(), NOT_NULLABLE),
                TableDefinition.Column("Parch", SqlType.big_int(), NOT_NULLABLE),
                TableDefinition.Column("Ticket", SqlType.text(), NOT_NULLABLE),
                TableDefinition.Column("Fare", SqlType.double(), NULLABLE),
                TableDefinition.Column("Cabin", SqlType.text(), NULLABLE),
                TableDefinition.Column("Embarked", SqlType.text(), NULLABLE)
                ]
            )
            #Virtually create a table based on the table definition(default schema is public)
            connection.catalog.create_table(table_def)
            #Execute COPY command like PostgreSQL
            record_count = connection.execute_command(
                command=f'''
                COPY {table_def.table_name} from {escape_string_literal(src_path)} with (format csv, delimiter ',', header)
                '''
            )
            print(f"The number of rows in table {table_def.table_name} is {record_count}.")

`Execution result`


The number of rows in table "train" is 891.

After processing, if you look at data / train.hyper in Tableau ... スクリーンショット 2020-02-09 14.43.30.png I was able to confirm that the hyper file was created successfully!

Finally

I was impressed that the Tableau SDK has evolved so far. If you mainly use Tableau Desktop or Tableau Server, you will rarely have the opportunity to use hyper files, but if for some reason you have to distribute reports using Tableau Reader etc., it is quite useful. Will do.

reference

https://help.tableau.com/current/api/hyper_api/en-us/docs/hyper_api_insert_csv.html

How to batch insert from CSV file to Tableau hyper file with PostgreSQL-like COPY command

Premise

Directory structure

Directory structure

Implementation code

Execution result

Finally

reference

`Directory structure`

`Execution result`