I made AI think about the lyrics of Kenshi Yonezu (implementation)

Introduction

Every song Kenshi Yonezu writes becomes a hit. The lyrics he spins seem to have the power to fascinate people. This time, I decided to let deep learning learn that charm.


This article is the "**Implementation**" part. See the previous article for the "pre-processing" code. The general flow of the implementation is as follows.

  1. Hyperparameter / model / loss function / optimization method setting
  2. Learning code
  3. Test code

Model used

- Framework: PyTorch
- Model: seq2seq with Attention
- Morphological analysis module: janome
- Environment: Google Colaboratory


For the mechanism of seq2seq and Attention, see the previous article.

A schematic diagram of this model is shown below (see the reference paper).

[Figure: schematic diagram of the seq2seq model with Attention]

Here, the start-of-sequence token (SOS) is "_".


Implementation

After uploading the required self-made modules to Google Colab, copy and run main.py, which is described below.

**Required self-made modules**

[Figure: list of the required self-made modules]

Please refer to GitHub for the code of these self-made modules.


Problem setting (reposted from last time)

As shown below, the model predicts the "next passage" from "one passage" of the songs Kenshi Yonezu has released so far.

| Input text | Output text |
|-------|-------|
| I'm really happy to see you | _All of them are sad as a matter of course |
| All of them are sad as a matter of course | _I have painfully happy memories now |
| I have painfully happy memories now | _Raise and walk the farewell that will come someday |
| Raise and walk the farewell that will come someday | _It's enough to take someone's place and live |

This data was created by scraping Lyrics Net.
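The pairing shown in the table can be sketched as follows. This is only a minimal illustration: the function name `make_pairs` and the sample lines are hypothetical, and the actual pre-processing code is the one in the previous article.

```python
def make_pairs(lines, sos="_"):
    """Pair each lyric line with the following line, prefixed by the SOS marker."""
    pairs = []
    for current_line, next_line in zip(lines, lines[1:]):
        pairs.append((current_line, sos + next_line))
    return pairs


# Sample lines taken from the table above (hypothetical input order)
lines = [
    "I'm really happy to see you",
    "All of them are sad as a matter of course",
    "I have painfully happy memories now",
]
pairs = make_pairs(lines)
for input_text, output_text in pairs:
    print(input_text, "->", output_text)
```

Each line except the last becomes an input, and the line after it becomes the output with "_" attached to the front.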


Hyperparameter / model / loss function / optimization method setting

Here is a supplement to the previous article. Last time, the goal was to create the "input text" and "output text", which were in Japanese. In practice, they are converted to IDs (numericalized) by yonedu_dataset.prepare() so that the deep learning model can read them.
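To make the ID conversion concrete, here is a rough sketch of what such a step might look like. It is not the code of `yonedu_dataset.prepare()`: the helper names `build_vocab` and `encode`, the right-side padding, and the sample tokens are all assumptions made for illustration.

```python
def build_vocab(token_lists, pad_token=" ", sos_token="_"):
    """Assign an integer ID to every distinct token, reserving IDs for padding and SOS."""
    word2id = {pad_token: 0, sos_token: 1}
    for tokens in token_lists:
        for token in tokens:
            if token not in word2id:
                word2id[token] = len(word2id)
    return word2id


def encode(tokens, word2id, max_len, pad_token=" "):
    """Convert tokens to IDs and right-pad with the padding ID up to max_len."""
    ids = [word2id[token] for token in tokens]
    return ids + [word2id[pad_token]] * (max_len - len(ids))


# Hypothetical tokenized sentences (the article uses janome for tokenization)
token_lists = [["_", "I", "am", "happy"], ["_", "I", "am", "sad", "now"]]
word2id = build_vocab(token_lists)
encoded = [encode(tokens, word2id, max_len=5) for tokens in token_lists]
print(encoded)
```

Padding every sentence to the same length is what allows a mini-batch to be stacked into a single tensor, and the padding ID is the value later passed to the model as `padding_idx`.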


The following **hyperparameters** are specified, from the top:

- Number of nodes in the encoder's embedding layer
- Number of nodes in the hidden layer of the encoder's LSTM layer
- Batch size
- Number of epochs
- Vocabulary size of the lyrics Kenshi Yonezu has written so far (janome is used for morphological analysis)
- Word ID representing "blank" (padding)


The **model** is seq2seq, so it has two parts: an encoder and a decoder.

- encoder: embedding layer + hidden layer with LSTM
- decoder with attention: embedding layer + hidden layer with LSTM + attention mechanism + softmax layer
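As a rough picture of these building blocks, the sketch below shows an embedding + LSTM encoder and a dot-product attention step in PyTorch. It is not the author's `Encoder` or `AttentionDecoder` (those are on GitHub): the names `SketchEncoder` and `dot_attention`, the dot-product scoring, and all sizes are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SketchEncoder(nn.Module):
    """Embedding layer followed by an LSTM (hypothetical stand-in for Encoder)."""

    def __init__(self, vocab_size, embedding_dim, hidden_dim, padding_idx):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=padding_idx)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)

    def forward(self, ids):
        # hs: hidden state at every time step, state: final (h, c) pair
        hs, state = self.lstm(self.embedding(ids))
        return hs, state


def dot_attention(decoder_hs, encoder_hs):
    """Weight encoder states by their similarity to each decoder state."""
    # scores: (batch, decoder_len, encoder_len)
    scores = torch.bmm(decoder_hs, encoder_hs.transpose(1, 2))
    weights = F.softmax(scores, dim=-1)
    # context: (batch, decoder_len, hidden_dim)
    context = torch.bmm(weights, encoder_hs)
    return context, weights


sketch_encoder = SketchEncoder(vocab_size=50, embedding_dim=8, hidden_dim=16, padding_idx=0)
ids = torch.randint(1, 50, (4, 7))        # 4 sentences, 7 token IDs each
sketch_hs, sketch_state = sketch_encoder(ids)
decoder_hs = torch.randn(4, 5, 16)        # stand-in for decoder LSTM outputs
context, weights = dot_attention(decoder_hs, sketch_hs)
print(sketch_hs.shape, context.shape, weights.shape)
```

In a decoder of this kind, the context vector is typically concatenated with the decoder's own hidden state before the final linear layer that scores each vocabulary word.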


The **loss function** is cross-entropy, and the **optimization method** is Adam for both the encoder and the decoder.

Also, if saved model parameters exist, they are loaded.


from datasets import LyricDataset
import torch
import torch.optim as optim
from modules import *
from device import device
from utils import *
from dataloaders import SeqDataLoader
import math
import os
import torch.nn as nn

# ==========================================
# Data preparation
# ==========================================
# Path to Kenshi Yonezu_lyrics.txt
file_path = "lyric/Kenshi Yonezu_lyrics.txt"
edited_file_path = "lyric/Kenshi Yonezu_lyrics_edit.txt"

yonedu_dataset = LyricDataset(file_path, edited_file_path)
yonedu_dataset.prepare()
# Check the first (input, output) pair
print(yonedu_dataset[0])

# Split into train and test at 8:2
train_rate = 0.8
data_num = len(yonedu_dataset)
train_set = yonedu_dataset[:math.floor(data_num * train_rate)]
test_set = yonedu_dataset[math.floor(data_num * train_rate):]

# Everything up to here is from last time

# ================================================
# Hyperparameter setting / model / loss function / optimization method
# ================================================
# Hyperparameters
embedding_dim = 200
hidden_dim = 128
BATCH_NUM = 100
EPOCH_NUM = 100
vocab_size = len(yonedu_dataset.word2id)  # Vocabulary size
padding_idx = yonedu_dataset.word2id[" "]  # ID of the blank (padding)

# model
encoder = Encoder(vocab_size, embedding_dim, hidden_dim, padding_idx).to(device)
attn_decoder = AttentionDecoder(vocab_size, embedding_dim, hidden_dim, BATCH_NUM, padding_idx).to(device)

# Loss function
criterion = nn.CrossEntropyLoss()

# Optimization method
encoder_optimizer = optim.Adam(encoder.parameters(), lr=0.001)
attn_decoder_optimizer = optim.Adam(attn_decoder.parameters(), lr=0.001)

# Load parameters if you have a trained model
encoder_weights_path = "yonedsu_lyric_encoder.pth"
decoder_weights_path = "yonedsu_lyric_decoder.pth"
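if os.path.exists(encoder_weights_path):
    # map_location lets weights saved on GPU be loaded on a CPU runtime too
    encoder.load_state_dict(torch.load(encoder_weights_path, map_location=device))
if os.path.exists(decoder_weights_path):
    attn_decoder.load_state_dict(torch.load(decoder_weights_path, map_location=device))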

Learning code

Next is the training code. This is how seq2seq with Attention typically looks, but I will add one point. Using **my own data loader**, I fetch mini-batches of batch size 100 in each epoch, backpropagate the total loss over each mini-batch to obtain the gradients, and update the parameters.

For the **self-made data loader**, I referred to the source code of Koki Saitoh's "Deep Learning from Scratch 3".
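For readers who have not seen that style of data loader, the idea can be sketched roughly as below. This is not the author's `SeqDataLoader`; the class name `SimpleSeqLoader` and the toy dataset are hypothetical, and the real loader also converts batches to tensors, which is omitted here.

```python
import math
import random


class SimpleSeqLoader:
    """Iterate over (input, output) pairs in mini-batches (hypothetical sketch)."""

    def __init__(self, dataset, batch_size, shuffle=False):
        self.dataset = dataset
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.max_iter = math.ceil(len(dataset) / batch_size)

    def __iter__(self):
        # Reset the position and the visiting order at the start of every epoch
        self.iteration = 0
        self.index = list(range(len(self.dataset)))
        if self.shuffle:
            random.shuffle(self.index)
        return self

    def __next__(self):
        if self.iteration >= self.max_iter:
            raise StopIteration
        start = self.iteration * self.batch_size
        batch_index = self.index[start:start + self.batch_size]
        self.iteration += 1
        xs = [self.dataset[i][0] for i in batch_index]
        ys = [self.dataset[i][1] for i in batch_index]
        return xs, ys


# Toy dataset of 5 (input IDs, output IDs) pairs
toy_dataset = [([i, i + 1], [i + 2]) for i in range(5)]
loader = SimpleSeqLoader(toy_dataset, batch_size=2)
batches = list(loader)
print(len(batches), batches[0])
```

Note that this sketch returns a smaller final batch when the dataset size is not a multiple of the batch size; a loader used with a fixed-size decoder input would need to drop or pad that remainder.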


# ================================================
# Learning
# ================================================
all_losses = []
print("training ...")
for epoch in range(1, EPOCH_NUM+1):
    epoch_loss = 0
    # Divide the data into mini-batches
    dataloader = SeqDataLoader(train_set, batch_size=BATCH_NUM, shuffle=False)

    for train_x, train_y in dataloader:

        # Gradient initialization
        encoder_optimizer.zero_grad()
        attn_decoder_optimizer.zero_grad()

        # Encoder forward propagation
        hs, h = encoder(train_x)

        # Attention Decoder input (everything except the last token)
        source = train_y[:, :-1]

        # Attention Decoder correct answer data (everything except the leading "_")
        target = train_y[:, 1:]

        loss = 0
        decoder_output, _, attention_weight = attn_decoder(source, hs, h)
        # Sum the cross-entropy loss over every output position
        for j in range(decoder_output.size()[1]):
            loss += criterion(decoder_output[:, j, :], target[:, j])

        epoch_loss += loss.item()

        # Error backpropagation
        loss.backward()

        # Parameter update
        encoder_optimizer.step()
        attn_decoder_optimizer.step()

    # Show loss
    print("Epoch %d: %.2f" % (epoch, epoch_loss))
    all_losses.append(epoch_loss)
    if epoch_loss < 0.1: break
print("Done")

import matplotlib.pyplot as plt
plt.plot(all_losses)
plt.savefig("attn_loss.png")

# Save model
torch.save(encoder.state_dict(), encoder_weights_path)
torch.save(attn_decoder.state_dict(), decoder_weights_path)
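The one-token shift between the decoder input and the correct answers, and the per-position loss sum, can be checked in isolation with random tensors. The sketch below is only an illustration: the sizes, the SOS ID of 1, and the random stand-in for the decoder output are assumptions, not values from the article.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, seq_len, vocab_size = 3, 5, 10

# Stand-in for train_y: each row starts with the SOS ID (assumed to be 1 here)
train_y = torch.randint(2, vocab_size, (batch_size, seq_len))
train_y[:, 0] = 1

source = train_y[:, :-1]   # decoder input: everything except the last token
target = train_y[:, 1:]    # correct answers: everything except the SOS token

# Stand-in for the decoder output: one score per vocabulary word per position
decoder_output = torch.randn(batch_size, seq_len - 1, vocab_size)

criterion = nn.CrossEntropyLoss()
step_losses = [
    criterion(decoder_output[:, j, :], target[:, j])
    for j in range(decoder_output.size(1))
]
loss = torch.stack(step_losses).sum()
print(source.shape, target.shape, loss.item())
```

At every position the decoder sees the token at index j and is scored against the token at index j + 1, which is why `source` and `target` overlap by all but one column.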

Test code

Here is the test code. What it does is create the table shown in **[Result]**. There are two points to note.

- Gradients are not computed, because the test stage is for prediction only
- First, "_" is fed into the decoder to indicate the start of text generation (the same condition as in training)


# =======================================
# Test
# =======================================
import pandas as pd

# Word -> ID conversion dictionary
word2id = yonedu_dataset.word2id
# ID -> word conversion dictionary
id2word = get_id2word(word2id)

# Number of elements in one correct answer data
output_len = len(yonedu_dataset[0][1])

# Evaluation data
test_dataloader = SeqDataLoader(test_set, batch_size=BATCH_NUM, shuffle=False)

# Data frame to display the result
df = pd.DataFrame(None, columns=["input", "answer", "predict", "judge"])
# Iterate over the data loader to populate the data frame that displays the results
for test_x, test_y in test_dataloader:
    with torch.no_grad():
        hs, encoder_state = encoder(test_x)

        # "_" indicating the start of text generation is fed to the Decoder first,
        # so create a tensor of "_" for the whole batch
        start_char_batch = [[word2id["_"]] for _ in range(BATCH_NUM)]
        decoder_input_tensor = torch.tensor(start_char_batch, device=device)

        decoder_hidden = encoder_state
        # Placeholder first column; it is sliced off after generation
        batch_tmp = torch.zeros(BATCH_NUM, 1, dtype=torch.long, device=device)
        for _ in range(output_len - 1):
            decoder_output, decoder_hidden, _ = attn_decoder(decoder_input_tensor, hs, decoder_hidden)
            # The predicted token is used directly as the next decoder input
            decoder_input_tensor = get_max_index(decoder_output.squeeze(), BATCH_NUM)
            batch_tmp = torch.cat([batch_tmp, decoder_input_tensor], dim=1)
        predicts = batch_tmp[:, 1:]  # Predicted batch without the placeholder column
        if test_dataloader.reverse:
            test_x = [list(line)[::-1] for line in test_x]  # Undo the input reversal
        df = predict2df(test_x, test_y, predicts, df)
df.to_csv("predict_yonedsu_lyric.csv", index=False)
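The helper `get_max_index` lives in the self-made `utils` module and is not shown in this article. A plausible minimal equivalent, assuming it simply picks the highest-scoring word ID for each sentence, might look like the sketch below; the function name `greedy_next_ids` and the toy scores are hypothetical.

```python
import torch


def greedy_next_ids(decoder_output):
    """Pick the highest-scoring word ID per sentence; returns shape (batch, 1)."""
    # decoder_output: (batch, 1, vocab_size) scores for a single time step
    return decoder_output.squeeze(1).argmax(dim=-1, keepdim=True)


# Toy scores: batch of 2 sentences, vocabulary of 3 words
scores = torch.tensor([[[0.1, 2.0, 0.3]],
                       [[1.5, 0.2, 0.1]]])
next_ids = greedy_next_ids(scores)

# Appending each step's IDs column by column builds the predicted sequence
generated = torch.zeros(2, 1, dtype=torch.long)
generated = torch.cat([generated, next_ids], dim=1)
print(next_ids.tolist(), generated[:, 1:].tolist())
```

Keeping the result in shape (batch, 1) is what allows it to be fed straight back into the decoder as the next input and concatenated onto the running prediction.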

Result

All predictions were incorrect. However, the goal this time was "**capturing the characteristics of Kenshi Yonezu's lyrics**". Here is an excerpt from the table.

- **input**: input text
- **output**: correct output text
- **predict**: text predicted by DL
- **judge**: whether output and predict match

| input | output | predict | judge |
|---------|----------------|----------------|------------|
| I didn't care if it was a mistake or a correct answer | In the light mist that fell in a blink of an eye | I'm sad because I want to be loved, so maybe you're the only one | X |
| I felt that everything had changed since that day | A deep spring corner that is blown away by the wind | The warm place is still beautiful | X |
| Let's find out one by one | Like a kid getting up | Withered blue, even that color | X |
| No matter what you are doing today | I will look for you | I was looking for a city that wouldn't change | X |


What I found

- The predicted sentences are not incoherent (the grammar is accurate, as in "still")
- The context is not too far off from the input
- However, it is honestly debatable whether the model captures the characteristics of Kenshi Yonezu's word choice.

Since no overfitting was observed this time, the main cause of the insufficient learning is probably the small amount of data. Then again, we are the ones who decided it was "insufficient learning"; maybe the AI has thoughts of its own...
