Modify part of the TensorFlow tutorial "Transformer model for language understanding" so that it can be used for text classification tasks.
Notebook

I have uploaded my notebook to GitHub: transformer_classify
The main differences from the tutorial are listed below.
- The classification task covered in this article assumes Japanese document classification for business use.
- For that reason, we use the livedoor news corpus, which is often used in machine learning experiments.
- For Japanese word segmentation we use Juman, which has a good reputation; a minimal tokenization sketch follows this list.
- A Dockerfile that automates the download and installation of Juman is available here: https://github.com/raidenn-gh/dockerfile_tf2_py3_jpt_juman
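As an illustration only, here is a minimal sketch of word-segmenting Japanese text with Juman through the pyknp package (this is not part of the original notebook; the helper name and sample sentence are assumptions):

```python
# Minimal sketch (assumption): split Japanese text into space-separated surface
# forms with Juman via pyknp, so it can be fed to a subword tokenizer afterwards.
from pyknp import Juman

juman = Juman()  # assumes the juman binary installed by the Dockerfile is on PATH


def wakati(text):
    """Hypothetical helper: return the sentence as whitespace-separated words."""
    result = juman.analysis(text)
    return " ".join(mrph.midasi for mrph in result.mrph_list())


print(wakati("自然言語処理の勉強をしています。"))  # e.g. "自然 言語 処理 の 勉強 を して います 。"
```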
- The Decoder is the mechanism that receives the Encoder output and converts it into a vector for the target language.
- This time the Decoder is not used, because the task is classification rather than conversion into another language.
- In place of the removed Decoder, a Dense layer is stacked on the output of the Encoder and added as the output layer.
- To turn the encoded input text into values that express, as probabilities, which class it belongs to, the Softmax function is used as the activation function of the output layer.
transformer_classify.ipynb
```python
NUMLABELS = 9


class Transformer(tf.keras.Model):
    def __init__(self, num_layers, d_model, num_heads, dff, input_vocab_size,
                 target_vocab_size, pe_input, pe_target, rate=0.1):
        super(Transformer, self).__init__()

        self.encoder = Encoder(num_layers, d_model, num_heads, dff,
                               input_vocab_size, pe_input, rate)
        self.dense1 = tf.keras.layers.Dense(d_model, activation='tanh')
        self.dropout1 = tf.keras.layers.Dropout(rate)
        self.final_layer = tf.keras.layers.Dense(NUMLABELS, activation='softmax')

    def call(self, inp, tar, training, enc_padding_mask):
        enc_output = self.encoder(inp, training, enc_padding_mask)  # (batch_size, inp_seq_len, d_model)

        # Classify using only the encoding of the first token
        enc_output = self.dense1(enc_output[:, 0])
        enc_output = self.dropout1(enc_output, training=training)
        final_output = self.final_layer(enc_output)  # (batch_size, NUMLABELS)

        return final_output
```
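For reference, here is a rough sketch of how this model might be instantiated and called on a dummy batch; the hyperparameter values and batch shape are illustrative assumptions, and create_padding_mask is the helper defined in the original tutorial:

```python
import tensorflow as tf


def create_padding_mask(seq):
    # Helper as in the original tutorial: mark padding positions (token id 0).
    seq = tf.cast(tf.math.equal(seq, 0), tf.float32)
    return seq[:, tf.newaxis, tf.newaxis, :]  # (batch_size, 1, 1, seq_len)


# Illustrative hyperparameters only, not the values used in the notebook.
sample_transformer = Transformer(
    num_layers=4, d_model=128, num_heads=8, dff=512,
    input_vocab_size=8500, target_vocab_size=8500,
    pe_input=10000, pe_target=6000)

dummy_input = tf.random.uniform((64, 38), minval=1, maxval=200, dtype=tf.int64)
enc_padding_mask = create_padding_mask(dummy_input)

# tar is kept in the call signature but unused, so None can be passed.
predictions = sample_transformer(dummy_input, None, training=False,
                                 enc_padding_mask=enc_padding_mask)
print(predictions.shape)  # (64, NUMLABELS)
```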
- Since the output layer uses the Softmax activation function, the loss function is multiclass cross entropy.
- Because the labels are not one-hot encoded, SparseCategoricalCrossentropy() is used.
transformer_classify.ipynb
```python
loss_object = tf.keras.losses.SparseCategoricalCrossentropy()


def loss_function(labels, pred):
    loss_ = loss_object(labels, pred)
    return loss_
```
- A val_step that uses the validation data is added after train_step; a rough sketch of such a pair is shown below.
- Because it is validation, training is set to False so that the dropout layers are skipped.
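The notebook's actual train_step and val_step are not reproduced here; the following is a rough sketch of what such a pair could look like under the assumptions above (the optimizer, metric objects, and create_padding_mask usage are illustrative, and `transformer` is assumed to be an instance of the Transformer class defined earlier):

```python
# Rough sketch (assumption): a train_step / val_step pair for this classifier.
optimizer = tf.keras.optimizers.Adam()
train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')
val_loss = tf.keras.metrics.Mean(name='val_loss')
val_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='val_accuracy')


@tf.function
def train_step(inp, labels):
    enc_padding_mask = create_padding_mask(inp)
    with tf.GradientTape() as tape:
        predictions = transformer(inp, None, training=True,
                                  enc_padding_mask=enc_padding_mask)
        loss = loss_function(labels, predictions)
    gradients = tape.gradient(loss, transformer.trainable_variables)
    optimizer.apply_gradients(zip(gradients, transformer.trainable_variables))
    train_loss(loss)
    train_accuracy(labels, predictions)


@tf.function
def val_step(inp, labels):
    enc_padding_mask = create_padding_mask(inp)
    # training=False skips the dropout layers during validation.
    predictions = transformer(inp, None, training=False,
                              enc_padding_mask=enc_padding_mask)
    loss = loss_function(labels, predictions)
    val_loss(loss)
    val_accuracy(labels, predictions)
```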
I couldn't get very good accuracy.
References

- tf2_classify
- BERT with SentencePiece for Japanese text
- Make and understand Transformer / Attention
- Transformer model for language understanding