Steps to train and run inference with a Transformer English-Japanese translation model on Cloud TPU

Following the official tutorial exactly as written did not work for me. This article explains how to launch a Cloud TPU and a GCE VM instance and build an English-Japanese translation model with the Transformer, one of the NMT models.


This article assumes that:
- You have already created a project on Google Cloud Platform.
- Billing is enabled for that project.

Cloud Console

Enter the following in the Cloud Console to launch a new Cloud TPU and a GCE VM instance.


# Set the project ID
gcloud config set project <project_id>
# Start a Cloud TPU named "transformer"
# (this also launches a GCE VM instance)
ctpu up --name=transformer --tf-version=1.14

The official tutorial starts with a plain ctpu up, but then the Cloud TPU's TensorFlow version does not match the default TensorFlow installed on the GCE VM instance, so following the tutorial as written leads to an error. To avoid this, pin the Cloud TPU's TensorFlow version to the one on the VM, as with --tf-version=1.14 above.
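To confirm the versions agree before training, a small Python check like the following can be run on the VM. This is a sketch under one assumption: that agreeing on the major.minor version (for example 1.14) is what matters. TPU_TF_VERSION is a hypothetical constant holding the value you passed to --tf-version.

```python
def major_minor(version):
    """Return (major, minor) from a version string such as '1.14.0'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def versions_match(vm_version, tpu_version):
    """True if the VM's TensorFlow and the Cloud TPU version agree on major.minor."""
    return major_minor(vm_version) == major_minor(tpu_version)


if __name__ == "__main__":
    TPU_TF_VERSION = "1.14"  # hypothetical: the value passed to ctpu up --tf-version
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed on this VM")
    else:
        ok = versions_match(tf.__version__, TPU_TF_VERSION)
        # str.format instead of f-strings, in case the VM has an older Python
        print("VM TensorFlow {}, Cloud TPU {}: {}".format(
            tf.__version__, TPU_TF_VERSION, "match" if ok else "MISMATCH"))
```

If it prints MISMATCH, recreate the Cloud TPU with a --tf-version that matches the VM.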

GCE (Google Compute Engine)

This section explains how to train and run inference with the transformer model on your own dataset (English-Japanese translation) stored in GCS (Google Cloud Storage). The following steps are performed over an SSH connection to the GCE VM instance created by ctpu up.

Directory structure in the VM instance

├── src
│   ├──
│   └──
└── tmp
    └── t2t_tmp
        └── sample.pickle

Download the training dataset from GCS to the GCE VM instance

gsutil cp gs://<bucket_name>/sample.pickle ./tmp/t2t_tmp/sample.pickle

Here, sample.pickle is a two-column data frame with the columns english (English sentences) and japanese (Japanese sentences).
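For reference, a minimal sketch of producing a file in this format, assuming a pandas DataFrame saved with the standard pickle module. The two sentence pairs are placeholders, and the file goes to a temporary directory rather than ./tmp/t2t_tmp. The sketch also reads the file back the way generate_samples does in the problem definition below, which shows why the column order decides the translation direction.

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd

# Placeholder sentence pairs; replace them with your own corpus.
pairs = [
    ("Good morning.", "おはようございます。"),
    ("Thank you.", "ありがとうございます。"),
]

# Column order matters: row[0] becomes the input, row[1] the target.
df = pd.DataFrame(pairs, columns=["english", "japanese"])

path = os.path.join(tempfile.mkdtemp(), "sample.pickle")
with open(path, "wb") as fout:
    pickle.dump(df, fout)

# Read it back the same way generate_samples() does.
with open(path, "rb") as fin:
    sentences = pickle.load(fin)
samples = [{"inputs": row[0], "targets": row[1]} for row in np.array(sentences)]
print(len(samples), samples[0]["inputs"])
```

The resulting file can then be uploaded to your bucket with gsutil cp so that the download step above finds it.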

Definition of PROBLEM

To use your own dataset, you need to implement and register a PROBLEM. Create the following two Python scripts in ./src: a package init that imports the problem module, and the myproblem module that defines the problem.


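# In ./src: package init; importing the module runs its registration.
from . import myproblem


# In ./src: the myproblem module that defines and registers the problem.
import pickle

import numpy as np

from tensor2tensor.data_generators import problem
from tensor2tensor.data_generators import text_problems
from tensor2tensor.utils import registry


@registry.register_problem  # registered as "translate_jpen" (see PROBLEM below)
class Translate_JPEN(text_problems.Text2TextProblem):
    """English -> Japanese translation on a custom pickled dataset."""

    @property
    def approx_vocab_size(self):
        return 2**13  # about 8k subwords

    @property
    def is_generate_per_split(self):
        # generate_samples() yields all data once; T2T distributes it
        # over the shards listed in dataset_splits.
        return False

    @property
    def dataset_splits(self):
        return [{
            "split": problem.DatasetSplit.TRAIN,
            "shards": 9,
        }, {
            "split": problem.DatasetSplit.EVAL,
            "shards": 1,
        }]

    def generate_samples(self, data_dir, tmp_dir, dataset_split):
        # Path is relative to the directory t2t-datagen is run from.
        with open('./tmp/t2t_tmp/sample.pickle', 'rb') as fin:
            sentences = pickle.load(fin)
        # Column 0 (english) is the source, column 1 (japanese) the target.
        for row in np.array(sentences):
            yield {'inputs': row[0], 'targets': row[1]}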
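Because is_generate_per_split returns False, generate_samples is consumed once and T2T spreads the examples over all shard files, so the shard counts in dataset_splits effectively set the train/eval ratio. As far as I understand T2T's generator_utils.generate_files, it cycles through the output files one example at a time (train files first, and the files are shuffled afterwards). The sketch below only simulates that assumption; it shows that 9 train shards and 1 eval shard give a 90/10 split, for example on a dataset of about 60,000 pairs.

```python
def split_counts(num_examples, num_train_shards=9, num_eval_shards=1):
    """Count examples per split when they are dealt one by one, round-robin,
    over all shard files (train shards first, then eval shards)."""
    num_shards = num_train_shards + num_eval_shards
    counts = {"train": 0, "eval": 0}
    for i in range(num_examples):
        shard = i % num_shards  # index of the shard file this example goes to
        if shard < num_train_shards:
            counts["train"] += 1
        else:
            counts["eval"] += 1
    return counts


print(split_counts(60000))  # {'train': 54000, 'eval': 6000}
```

To hold out more data for evaluation, raising the EVAL shard count (or lowering the TRAIN one) changes the ratio accordingly.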

Set environment variables in the VM instance

# Set environment variables
export STORAGE_BUCKET=gs://<bucket_name>
export DATA_DIR=$STORAGE_BUCKET/transformer
export TMP_DIR=/tmp/t2t_tmp
export PATH=$HOME/.local/bin:$PATH
export PROBLEM=translate_jpen
export TRAIN_DIR=$STORAGE_BUCKET/training/transformer_ende
export MODEL=transformer
export HPARAMS=transformer_tpu
# Directory containing your own problem definition
export USR_DIR=./src
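The PROBLEM value has to match the name under which the class is registered. When a class is registered with @registry.register_problem and no explicit name, T2T derives the name from the class name by converting CamelCase to snake_case. The sketch below re-implements that conversion with the two regular expressions that, as far as I know, tensor2tensor.utils.misc_utils.camelcase_to_snakecase uses; it is a standalone copy for checking names, not an import of the library.

```python
import re

# Regular expressions assumed to mirror T2T's camelcase_to_snakecase.
_first_cap_re = re.compile("(.)([A-Z][a-z0-9]+)")
_all_cap_re = re.compile("([a-z0-9])([A-Z])")


def camelcase_to_snakecase(name):
    """Insert underscores at CamelCase boundaries, then lowercase."""
    s1 = _first_cap_re.sub(r"\1_\2", name)
    return _all_cap_re.sub(r"\1_\2", s1).lower()


print(camelcase_to_snakecase("Translate_JPEN"))  # translate_jpen
print(camelcase_to_snakecase("TranslateEnJa"))   # translate_en_ja
```

So Translate_JPEN ends up as translate_jpen, which matches PROBLEM above, while a class named TranslateEnJa would need PROBLEM=translate_en_ja.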

Preprocessing and training

First generate the training data from your own problem definition in ./src/, then train. Pass the name you used in ctpu up directly to cloud_tpu_name; specifying it as $TPU_NAME caused an error.

Training time depends on the amount of data; with about 60,000 sentence pairs it took about 3 hours.

t2t-datagen \
  --problem=$PROBLEM \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --t2t_usr_dir=$USR_DIR

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --train_steps=40000 \
  --eval_steps=3 \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR \
  --t2t_usr_dir=$USR_DIR \
  --use_tpu=True \
  --cloud_tpu_name=transformer


After training, run inference. Setting the decode_interactive parameter to True lets you translate sentences in an interactive shell. If you want to run inference locally with the model trained on Cloud TPU, please refer to a separate article on that topic.

t2t-decoder \
   --data_dir=$DATA_DIR \
   --problem=$PROBLEM \
   --model=$MODEL \
   --hparams_set=$HPARAMS \
   --output_dir=$TRAIN_DIR \
   --t2t_usr_dir=$USR_DIR \
   --decode_hparams="beam_size=4,alpha=0.6" \
   --decode_interactive=True
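In decode_hparams, beam_size is the number of candidates kept during beam search, and alpha controls the length penalty. As far as I know, T2T's beam search uses the GNMT-style penalty ((5 + length) / 6) ** alpha and divides each candidate's summed log-probability by it, so a larger alpha favors longer translations. The sketch below only illustrates that formula on two hypothetical candidates; it does not call T2T.

```python
def length_penalty(length, alpha):
    """GNMT-style length penalty: ((5 + length) / 6) ** alpha."""
    return ((5.0 + length) / 6.0) ** alpha


def normalized_score(log_prob, length, alpha):
    """Summed log-probability of a candidate divided by its length penalty."""
    return log_prob / length_penalty(length, alpha)


# Hypothetical candidates: (summed log-probability, length in tokens).
short_candidate = (-4.0, 5)
long_candidate = (-5.0, 12)

for alpha in (0.0, 0.6):
    short_score = normalized_score(*short_candidate, alpha)
    long_score = normalized_score(*long_candidate, alpha)
    print("alpha={}: short={:.2f} long={:.2f}".format(alpha, short_score, long_score))
```

With alpha=0 the penalty is 1, so the shorter candidate (higher raw log-probability) wins, -4.00 versus -5.00. With alpha=0.6 the longer candidate scores higher, about -2.68 versus -2.94.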


References
- English-Japanese translation with Transformer
- Japanese-English translation with tensor2tensor
- Try seq2seq with your own data using Tensor2Tensor
