Deep Learning from Scratch ❷ --- An amateur's stumbling notes: Chapter 7

Introduction

This is a memo of the things I stumbled over in Chapter 7 of "Deep Learning from Scratch ❷ --- Natural Language Processing", which I started studying on a whim.

The execution environment is macOS Catalina + Anaconda 2019.10, and the Python version is 3.7.4. For details, refer to Chapter 1 of this memo.

(To other chapters of this memo: Chapter 1 / Chapter 2 / Chapter 3 / Chapter 4 / Chapter 5 / Chapter 6 / Chapter 7)

Chapter 7 Sentence generation using RNNs

This chapter covers sentence generation using the language model built in the previous chapter, and a new model called seq2seq. I didn't have time to implement everything myself, so I'm still at the level of running the book's implementation; please keep that in mind.

7.1 Sentence generation using the language model

With a language model built from the PTB corpus, my English isn't good enough to judge whether the generated text is good or bad, so I tried it with the Aozora Bunko language model built in the previous chapter instead.

First, to try the RNNLM that reached a perplexity of 105.19 in the previous chapter, I modified ch07/generate_text.py slightly. The parts marked with ★ are the changes.

ch07/generate_text.py


# coding: utf-8
import sys
sys.path.append('..')
from rnnlm_gen import RnnlmGen
from dataset import aozorabunko  #★ Changed to use the corpus of Aozora Bunko


corpus, word_to_id, id_to_word = aozorabunko.load_data('train')  #★ Change corpus
vocab_size = len(word_to_id)
corpus_size = len(corpus)

model = RnnlmGen(vocab_size=vocab_size)  # ★ Specify the vocabulary size (Rnnlm's default is the PTB value)
model.load_params('../ch06/Rnnlm.pkl')

# Set the start word and the skip words
start_word = 'you'  # ★ start from "you"
start_id = word_to_id[start_word]
skip_words = []  # ★ No skip words, since the corpus was not preprocessed
skip_ids = [word_to_id[w] for w in skip_words]
# Sentence generation
word_ids = model.generate(start_id, skip_ids)
# ★ Since the text is Japanese, join the words without spaces and replace <eos> with a period + line break
eos_id = word_to_id['<eos>']
txt = ''.join([id_to_word[i] if i != eos_id else '。\n' for i in word_ids])
txt = txt.replace('\n。\n', '\n')  # Remove blank lines
txt = txt.replace('」。\n', '」\n')  # Drop the period after a closing quotation mark
print(txt)

Below are the results of generating sentences starting from "you". I ran it several times.

Perplexity 105.19 version


After a while, you guys are going on, and after a while, you're the gentleman in that woman's train, so when you hear a little more about Edo, you still can't see it as BC. ..
He wouldn't cross quietly as it was, so I had to mess it up, and I just gave him a literary story about Pamijin.
The young ladies try to be disappointed

Perplexity 105.19 version


I interrupted you and sent it to the pigeon's mouth.
"That's the way back, this house is pomponing." ""
"Isn't it?"
"That's right"
"Then I can't help.
」
"How is it?" Giovanni inadvertently flew through the hairy straw on the head of the instrument.
Also, seaweed is like gold.

Perplexity 105.19 version


I caught up with your shield.
The two did not return.
However, I was always sleeping because of the difficulty.
However, outside the mirror, the mad neck may be meaningless, so the sound of silence depends on what it was, and in what way it was the very original of white intractable liquor. This is a long and narrow one, which is too big for the main gate, and Iisaki Tako is irresistible.
Therefore

Somehow these are starting to look like sentences. Since the corpus consists of novels, some of the output reads like a novel.

In the second result, the opening and closing quotation marks mostly line up, so it seems the model has properly learned the relationship between the start and end of a quote.

Even so, the flow and meaning of the second result, from "I interrupted you and sent it to the pigeon's mouth" to "Giovanni inadvertently flew through the hairy straw on the head of the instrument", worry me a little :grin:

Next, I tried the improved version, which reached a perplexity of 73.66 in the previous chapter. Below is the modified ch07/generate_better_text.py. The parts marked with ★ are the changes.

ch07/generate_better_text.py


# coding: utf-8
import sys
sys.path.append('..')
from common.np import *
from rnnlm_gen import BetterRnnlmGen
from dataset import aozorabunko  #★ Changed to use the corpus of Aozora Bunko


corpus, word_to_id, id_to_word = aozorabunko.load_data('train')  #★ Change corpus
vocab_size = len(word_to_id)
corpus_size = len(corpus)

model = BetterRnnlmGen(vocab_size=vocab_size)  # ★ Specify the vocabulary size (BetterRnnlm's default is the PTB value)
model.load_params('../ch06/BetterRnnlm.pkl')

# Set the start word and the skip words
start_word = 'you'  # ★ start from "you"
start_id = word_to_id[start_word]
skip_words = []  # ★ No skip words, since the corpus was not preprocessed
skip_ids = [word_to_id[w] for w in skip_words]
# Sentence generation
word_ids = model.generate(start_id, skip_ids)
# ★ Since the text is Japanese, join the words without spaces and replace <eos> with a period + line break
eos_id = word_to_id['<eos>']
txt = ''.join([id_to_word[i] if i != eos_id else '。\n' for i in word_ids])
txt = txt.replace('\n。\n', '\n')  # Remove blank lines
txt = txt.replace('」。\n', '」\n')  # Drop the period after a closing quotation mark
print(txt)


model.reset_state()

start_words = 'The meaning of life is'  # ★ space-separated start words
start_ids = [word_to_id[w] for w in start_words.split(' ')]

for x in start_ids[:-1]:
    x = np.array(x).reshape(1, 1)
    model.predict(x)

word_ids = model.generate(start_ids[-1], skip_ids)
word_ids = start_ids[:-1] + word_ids
# ★ Since the text is Japanese, join the words without spaces and replace <eos> with a period + line break
txt = ''.join([id_to_word[i] if i != eos_id else '。\n' for i in word_ids])
txt = txt.replace('\n。\n', '\n')  # Remove blank lines
txt = txt.replace('」。\n', '」\n')  # Drop the period after a closing quotation mark
print('-' * 50)
print(txt)

The following is the result of generating sentences starting with "you".

Perplexity 73.66 version


I took it off without even knowing you.
However, I wonder if that person will be interested in me from now on, and when I get to work, I can't help myself.
slip'It will be the death of a family called the royal family.
It would be fun to read the history of household races and see the four or two letters if people are left in the water like the world and shoulders with tools to throw away money and pleasure.
Belly representative

Perplexity 73.66 version


You told me to talk to you.
(I'm such a go.
At one point, I came with Ueno with a gate entrance lantern.
I will not go on my own because it will come out in the middle of the night of this year.
It's decreasing, my hands are crying, and now it's not so easy.
I'm sick now, so the teacher has broken the way of holding it rather than the best means. "
"why"
"No still

Perplexity 73.66 version


I wonder if you came so dizzy because you have the fact that you went.
I don't know if I'm still sick.
No matter how much you can swim, it's definitely coming.
This twenty-four.
I'm spinning around on the floor, looking for customers with anxious faces.
Both teachers had sex near noon and started walking while returning to the tatami room.
When we came to Tokyo in the back, it was a little lively next time

Somehow the output feels more like proper Japanese than the previous results.

Also, the token "slip'" suddenly appears in the first result. When I searched the corpus, I found that "I Am a Cat" contains a sentence along the lines of "you probably know the Western proverb 'many a slip' twixt the cup and the lip'". It only appears in that one place, so rare words like this really need to be preprocessed, as is done for the PTB corpus :sweat:
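
For reference, here is a minimal sketch of one way such preprocessing could be done (my own snippet, not the book's code): count word frequencies and replace anything that appears only rarely with an <unk> token. The threshold and the token name are arbitrary choices for illustration.


# coding: utf-8
from collections import Counter


def replace_rare_words(words, min_count=2, unk_token='<unk>'):
    # Replace words appearing fewer than min_count times with unk_token
    counts = Counter(words)
    return [w if counts[w] >= min_count else unk_token for w in words]


words = ['many', 'a', "slip'", 'twixt', 'the', 'cup', 'and', 'the', 'lip']
print(replace_rare_words(words))
# only 'the' appears twice in this toy list, so every other word becomes <unk>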

Then, following the book, I had the model continue the phrase "The meaning of life is".

Perplexity 73.66 version


The garden was restless to the end because the meaning of life seemed to be the same as the history of the mind.
Something is related to the famous literary artist.
Is it okay for a burnt sand lens to be hit by pressure?
This isn't a big stimulus to the spiritual world, as they saw the problems every month and tied their lies to the cup.
It ’s just a belly

Perplexity 73.66 version


The meaning of life is messy because the writer's advertisement is wrong.
So I felt like I wouldn't die in the future.
Sanshiro is like this.
Even in Tokyo, I thought that the person who put in a circle early and dug up the corporal was slaughtered, and it was just like this.
Sanshiro was still sitting under the bag with his master from the lady's god and giving lessons.

Perplexity 73.66 version


The meaning of life was just that I chose the second distant place.
However, when such a face is over, that word.
I started today when I went to Noonmachi.
A man who lives as a friend does not go to the inn The singing basket is the light of stripes, so he was alive when he urged him now, so it was the face of his mother who gave him the surrounding thirteen meals.
The teacher is Matachi and Kura

Nothing particularly profound came out, but since generation is probabilistic, perhaps the trick is just to keep re-running it until something good appears.
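
As a reminder of why every run produces different text: generation picks each next word by sampling from the softmax distribution over the model's scores, so unusual continuations can turn up if you keep trying. Below is a rough sketch of that sampling loop, modeled on the book's RnnlmGen.generate (the actual implementation may differ in its details).


# coding: utf-8
import numpy as np


def softmax(x):
    # Numerically stable softmax over a 1-D array of scores
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()


def generate(model, start_id, skip_ids, sample_size=100):
    # Sample word IDs one at a time from the model's output distribution
    word_ids = [start_id]
    x = start_id
    while len(word_ids) < sample_size:
        score = model.predict(np.array(x).reshape(1, 1))       # scores for the next word
        p = softmax(score.flatten())                            # scores -> probabilities
        sampled = int(np.random.choice(len(p), size=1, p=p))    # probabilistic pick
        if sampled not in skip_ids:
            x = sampled
            word_ids.append(sampled)
    return word_ids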

Finally, I had it write a continuation of "I am a dog."

Perplexity 73.66 version


I am a dog.
While climbing the cup, shake your hands while holding your hands on your front legs.
A duster room is in the works.
I was in a bad mood.
Then it ends with a different eye.
Looking beyond the water, they are lined up.
I have a relative in the daytime while crushing from the tatami mat on the front.
But even a man who may look at him asked this question and brought him 100,000 today.

I'm not sure what it means, but it did produce something that looks like a novel.

7.2 seq2seq

This section describes seq2seq, which converts time-series data into other time-series data. The book uses addition as a toy problem, but just repeating the same thing is no fun, so I decided to have it compute square roots instead: for example, given the input "2", it should output "1.414".

The dataset I created is simple: 50,000 pairs of a number from 0 to 49,999 and its square root (4 significant digits). The digits are padded to a fixed width and the input and output are separated by _, so the book's code can train on it as is. Below is the dataset generation script dataset/create_sqroot_dataset.py. Running it in the dataset directory produces sqroot.txt.

dataset/create_sqroot_dataset.py


# coding: utf-8
import math


file_name = 'sqroot.txt'
with open(file_name, mode='w') as f:
    for i in range(50000):
        res = f'{math.sqrt(i):.4g}'      # square root, 4 significant digits
        f.write(f'{i: <5}_{res: <5}\n')  # left-align both fields to width 5, separated by '_'

The contents of the generated sqroot.txt look like this:

dataset/sqroot.txt


0    _0    
1    _1    
2    _1.414
3    _1.732
4    _2    
5    _2.236
6    _2.449
7    _2.646
8    _2.828
9    _3    
10   _3.162
11   _3.317
12   _3.464
13   _3.606
14   _3.742
15   _3.873
16   _4    
17   _4.123
18   _4.243
19   _4.359

(Omitted)

49980_223.6
49981_223.6
49982_223.6
49983_223.6
49984_223.6
49985_223.6
49986_223.6
49987_223.6
49988_223.6
49989_223.6
49990_223.6
49991_223.6
49992_223.6
49993_223.6
49994_223.6
49995_223.6
49996_223.6
49997_223.6
49998_223.6
49999_223.6

The input is 5 characters, the output is 6 characters including _, and the vocabulary size is 13, the same as the addition dataset (the '+' goes away and the decimal point '.' is added).
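
As a quick sanity check (my own snippet, not from the book), the character vocabulary can be counted directly from the generated file; it comes out to the ten digits plus the space, '_' and '.', i.e. 13 characters.


# coding: utf-8
# Count the distinct characters (excluding newlines) in the generated dataset
with open('sqroot.txt') as f:
    chars = set(f.read()) - {'\n'}
print(sorted(chars))  # [' ', '.', '0', '1', ..., '8', '9', '_']
print(len(chars))     # 13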

7.3 Implementation of seq2seq

With the model before the improvements, the accuracy did not rise as easily as it did for addition.

7.4 Improvement of seq2seq

With the two improvements, reversing the input data and peeky decoding, the model managed to learn square roots.
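
To get an intuition for the peeky idea, here is a toy NumPy illustration (my own sketch, not the book's PeekyDecoder): the encoder's final hidden state h is copied to every decoder time step and concatenated to the input there, so every step can "peek" at the encoded information.


# coding: utf-8
import numpy as np

N, T, D, H = 2, 5, 16, 192              # batch size, time steps, embedding size, hidden size
h = np.random.randn(N, H)               # encoder output: the final hidden state
xs_embedded = np.random.randn(N, T, D)  # decoder input embeddings

hs = np.repeat(h.reshape(N, 1, H), T, axis=1)           # copy h to every time step
lstm_input = np.concatenate((hs, xs_embedded), axis=2)  # shape (N, T, H + D)
print(lstm_input.shape)  # (2, 5, 208)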

Below is the source of ch07/train_seq2seq.py. The parts marked with ★ are changes from the book's code. I experimented with the hyperparameters a few times and slightly increased the hidden layer size.

ch07/train_seq2seq.py


# coding: utf-8
import sys
sys.path.append('..')
import numpy as np
import matplotlib.pyplot as plt
from dataset import sequence
from common.optimizer import Adam
from common.trainer import Trainer
from common.util import eval_seq2seq
from seq2seq import Seq2seq
from peeky_seq2seq import PeekySeq2seq


# Load the dataset
(x_train, t_train), (x_test, t_test) = sequence.load_data('sqroot.txt')   #★ Dataset changed
char_to_id, id_to_char = sequence.get_vocab()

# Reverse input? =================================================
is_reverse = True  #★ Improved version
if is_reverse:
    x_train, x_test = x_train[:, ::-1], x_test[:, ::-1]
# ================================================================

#Hyperparameter settings
vocab_size = len(char_to_id)
wordvec_size = 16
hidden_size = 192  #★ Adjustment
batch_size = 128
max_epoch = 25
max_grad = 5.0

# Normal or Peeky? ==============================================
# model = Seq2seq(vocab_size, wordvec_size, hidden_size)
model = PeekySeq2seq(vocab_size, wordvec_size, hidden_size)  #★ Improved version
# ================================================================
optimizer = Adam()
trainer = Trainer(model, optimizer)

acc_list = []
for epoch in range(max_epoch):
    trainer.fit(x_train, t_train, max_epoch=1,
                batch_size=batch_size, max_grad=max_grad)

    correct_num = 0
    for i in range(len(x_test)):
        question, correct = x_test[[i]], t_test[[i]]
        verbose = i < 10
        correct_num += eval_seq2seq(model, question, correct,
                                    id_to_char, verbose, is_reverse)

    acc = float(correct_num) / len(x_test)
    acc_list.append(acc)
    print('val acc %.3f%%' % (acc * 100))

# Plot the accuracy graph
x = np.arange(len(acc_list))
plt.plot(x, acc_list, marker='o')
plt.xlabel('epochs')
plt.ylabel('accuracy')
plt.ylim(0, 1.0)
plt.show()

Below is the last part of the execution result.

| epoch 25 |  iter 1 / 351 | time 0[s] | loss 0.08
| epoch 25 |  iter 21 / 351 | time 6[s] | loss 0.08
| epoch 25 |  iter 41 / 351 | time 13[s] | loss 0.09
| epoch 25 |  iter 61 / 351 | time 18[s] | loss 0.08
| epoch 25 |  iter 81 / 351 | time 22[s] | loss 0.09
| epoch 25 |  iter 101 / 351 | time 27[s] | loss 0.09
| epoch 25 |  iter 121 / 351 | time 32[s] | loss 0.08
| epoch 25 |  iter 141 / 351 | time 38[s] | loss 0.08
| epoch 25 |  iter 161 / 351 | time 43[s] | loss 0.09
| epoch 25 |  iter 181 / 351 | time 48[s] | loss 0.08
| epoch 25 |  iter 201 / 351 | time 52[s] | loss 0.08
| epoch 25 |  iter 221 / 351 | time 56[s] | loss 0.09
| epoch 25 |  iter 241 / 351 | time 61[s] | loss 0.08
| epoch 25 |  iter 261 / 351 | time 66[s] | loss 0.09
| epoch 25 |  iter 281 / 351 | time 72[s] | loss 0.09
| epoch 25 |  iter 301 / 351 | time 77[s] | loss 0.08
| epoch 25 |  iter 321 / 351 | time 81[s] | loss 0.09
| epoch 25 |  iter 341 / 351 | time 85[s] | loss 0.09
Q 27156
T 164.8
☑ 164.8
---
Q 41538
T 203.8
☑ 203.8
---
Q 82   
T 9.055
☒ 9.124
---
Q 40944
T 202.3
☑ 202.3
---
Q 36174
T 190.2
☑ 190.2
---
Q 13831
T 117.6
☑ 117.6
---
Q 16916
T 130.1
☑ 130.1
---
Q 1133 
T 33.66
☒ 33.63
---
Q 31131
T 176.4
☑ 176.4
---
Q 21956
T 148.2
☑ 148.2
---
val acc 79.000%

result.png (graph of validation accuracy per epoch)

I managed to get an accuracy of just under 80%. It might improve with a bit more hyperparameter tuning, but I found that simply reusing a model that works well for addition does not easily give high accuracy. Choosing and tuning a model to suit the problem at hand seems to be the hard part.

7.5 Applications using seq2seq

Seeing application examples like chatbots and image captioning really fires the imagination. It is moving to think how much trial and error by those who came before lies behind them.

7.6 Summary

That's all for this chapter. If you find any mistakes, I would be grateful if you could point them out.

(To other chapters of this memo: Chapter 1 / Chapter 2 / Chapter 3 / Chapter 4 / Chapter 5 / Chapter 6 / Chapter 7)
