Modify part of the TensorFlow tutorial "Transformer model for language understanding" so that it can be used for text classification tasks.
Notebook

I have uploaded my notebook to GitHub: transformer_classify
The main differences from the tutorial are listed below.
- The classification task covered in this article assumes Japanese document classification for business use.
- For that reason, we use the livedoor news corpus, which is often used in machine learning experiments.
- For Japanese word segmentation we use Juman, which has a good reputation; a minimal tokenization sketch follows this list.
- A Dockerfile that automates the download and installation of Juman is available here: https://github.com/raidenn-gh/dockerfile_tf2_py3_jpt_juman
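As an illustration only, here is a minimal sketch of word-segmenting Japanese text with Juman through the pyknp package (this is not part of the original notebook; the helper name and sample sentence are assumptions):

```python
# Minimal sketch (assumption): split Japanese text into space-separated surface
# forms with Juman via pyknp, so it can be fed to a subword tokenizer afterwards.
from pyknp import Juman

juman = Juman()  # assumes the juman binary installed by the Dockerfile is on PATH


def wakati(text):
    """Hypothetical helper: return the sentence as whitespace-separated words."""
    result = juman.analysis(text)
    return " ".join(mrph.midasi for mrph in result.mrph_list())


print(wakati("自然言語処理の勉強をしています。"))  # e.g. "自然 言語 処理 の 勉強 を して います 。"
```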
- The Decoder is the mechanism that receives the Encoder output and converts it into a vector for the target language.
- This time the Decoder is not used, because the task is classification rather than conversion into another language.
- In place of the removed Decoder, a Dense layer is stacked on the output of the Encoder and added as the output layer.
- To turn the encoded input text into values that express, as probabilities, which class it belongs to, the Softmax function is used as the activation function of the output layer.
transformer_classify.ipynb
```python
NUMLABELS = 9


class Transformer(tf.keras.Model):
    def __init__(self, num_layers, d_model, num_heads, dff, input_vocab_size,
                 target_vocab_size, pe_input, pe_target, rate=0.1):
        super(Transformer, self).__init__()

        self.encoder = Encoder(num_layers, d_model, num_heads, dff,
                               input_vocab_size, pe_input, rate)
        self.dense1 = tf.keras.layers.Dense(d_model, activation='tanh')
        self.dropout1 = tf.keras.layers.Dropout(rate)
        self.final_layer = tf.keras.layers.Dense(NUMLABELS, activation='softmax')

    def call(self, inp, tar, training, enc_padding_mask):
        enc_output = self.encoder(inp, training, enc_padding_mask)  # (batch_size, inp_seq_len, d_model)

        # Classify using only the encoding of the first token
        enc_output = self.dense1(enc_output[:, 0])
        enc_output = self.dropout1(enc_output, training=training)
        final_output = self.final_layer(enc_output)  # (batch_size, NUMLABELS)

        return final_output
```
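For reference, here is a rough sketch of how this model might be instantiated and called on a dummy batch; the hyperparameter values and batch shape are illustrative assumptions, and create_padding_mask is the helper defined in the original tutorial:

```python
import tensorflow as tf


def create_padding_mask(seq):
    # Helper as in the original tutorial: mark padding positions (token id 0).
    seq = tf.cast(tf.math.equal(seq, 0), tf.float32)
    return seq[:, tf.newaxis, tf.newaxis, :]  # (batch_size, 1, 1, seq_len)


# Illustrative hyperparameters only, not the values used in the notebook.
sample_transformer = Transformer(
    num_layers=4, d_model=128, num_heads=8, dff=512,
    input_vocab_size=8500, target_vocab_size=8500,
    pe_input=10000, pe_target=6000)

dummy_input = tf.random.uniform((64, 38), minval=1, maxval=200, dtype=tf.int64)
enc_padding_mask = create_padding_mask(dummy_input)

# tar is kept in the call signature but unused, so None can be passed.
predictions = sample_transformer(dummy_input, None, training=False,
                                 enc_padding_mask=enc_padding_mask)
print(predictions.shape)  # (64, NUMLABELS)
```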
- Since the output layer uses the Softmax activation function, the loss function is multiclass cross entropy.
- Because the labels are not one-hot encoded, SparseCategoricalCrossentropy() is used.
transformer_classify.ipynb
```python
loss_object = tf.keras.losses.SparseCategoricalCrossentropy()


def loss_function(labels, pred):
    loss_ = loss_object(labels, pred)
    return loss_
```
- A val_step that uses the validation data is added after train_step; a rough sketch of such a pair is shown below.
- Because it is validation, training is set to False so that the dropout layers are skipped.
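The notebook's actual train_step and val_step are not reproduced here; the following is a rough sketch of what such a pair could look like under the assumptions above (the optimizer, metric objects, and create_padding_mask usage are illustrative, and `transformer` is assumed to be an instance of the Transformer class defined earlier):

```python
# Rough sketch (assumption): a train_step / val_step pair for this classifier.
optimizer = tf.keras.optimizers.Adam()
train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')
val_loss = tf.keras.metrics.Mean(name='val_loss')
val_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='val_accuracy')


@tf.function
def train_step(inp, labels):
    enc_padding_mask = create_padding_mask(inp)
    with tf.GradientTape() as tape:
        predictions = transformer(inp, None, training=True,
                                  enc_padding_mask=enc_padding_mask)
        loss = loss_function(labels, predictions)
    gradients = tape.gradient(loss, transformer.trainable_variables)
    optimizer.apply_gradients(zip(gradients, transformer.trainable_variables))
    train_loss(loss)
    train_accuracy(labels, predictions)


@tf.function
def val_step(inp, labels):
    enc_padding_mask = create_padding_mask(inp)
    # training=False skips the dropout layers during validation.
    predictions = transformer(inp, None, training=False,
                              enc_padding_mask=enc_padding_mask)
    loss = loss_function(labels, predictions)
    val_loss(loss)
    val_accuracy(labels, predictions)
```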
I couldn't get very good accuracy.
References

- tf2_classify
- BERT with SentencePiece for Japanese text
- Make and understand Transformer / Attention
- Transformer model for language understanding