Congratulations on the 20th anniversary of Evangelion :confetti_ball: And Happy Birthday, Asuka (four days late).
I tried automatically generating character lines, the kind of thing often used for Twitter bots, with a recurrent neural network (hereafter RNN), a popular kind of deep learning model.
First things first: nothing can happen without data. I was prepared to transcribe the lines myself, but thankfully there is a site that collects every line of dialogue in the anime. Many thanks.
I extracted all the lines from there. The format is like this: the character name, followed by the line in 「 」 brackets.
Broadcast "Today, 12:In the 30th minute, a special state of emergency was issued throughout the central Kanto region, centered on the Tokai region. Residents should evacuate to the designated shelter immediately. "
Broadcast "I will tell you repeatedly ..."
Misato "I didn't want to lose sight of it at this time."
Telephone "All regular lines are currently out of service due to the issuance of a special state of emergency."
Shinji "No, I didn't come ... I can't meet ... I can't help it, let's go to the shelter."
Operator "Unidentified moving objects are still in progress for the headquarters"
Shigeru "Check the target on the video. Turn it to the main monitor."
Fuyutsuki "It's been 15 years since then"
Gendou "Oh, no doubt, an apostle."
・ ・ ・
I split this data into per-character files of lines; these become the source data for automatically generating new lines.
I wrote the script in Python.
# -*- coding: utf-8 -*-
import sys
import os
import chardet

# Specify the input directory and the output directory
readdir = sys.argv[1]
outdir = sys.argv[2]
print "readdir:\t", readdir
print "outdir:\t", outdir

# Get the list of files in the input directory
txts = os.listdir(readdir)
for txt in txts:
    if not txt.endswith(".txt"):  # Ignore extensions other than .txt
        continue
    txt = os.path.join(readdir, txt)
    print txt
    fp = open(txt, "rb")
    # Detect the character encoding of the file
    f_encode = chardet.detect(fp.read())["encoding"]
    fp.seek(0)
    lines = fp.readlines()
    fp.close()
    for line in lines:
        # Convert to unicode
        line_u = unicode(line, f_encode)
        if line_u.find(u"「") == -1:  # Skip lines that contain no dialogue
            continue
        # Get the character name (everything before 「)
        char_name = line_u[:line_u.find(u"「")]
        outfname = os.path.join(outdir, char_name + ".txt")
        # Check whether a file for this character already exists
        if os.path.exists(outfname):
            # If it does, open it in append mode
            outfp = open(outfname, "a")
        else:
            # If not, create a new file
            outfp = open(outfname, "w")
        # Extract only the line (the text between 「 and 」)
        line_format = line_u[line_u.find(u"「") + 1:line_u.find(u"」")] + "\n"
        # Write the line
        outfp.write(line_format.encode("utf-8"))
        outfp.close()
The script reads the text file line by line and splits each line into the character name (before 「) and the dialogue (between 「 and 」). If a file for that character already exists, it is opened in append mode and the line is added; otherwise a new file is created.
The generated files look like this. I thought "Cessna" was a character, but looking at the contents, it was the radio lines of the person flying the Cessna.
・
・
Asuka.txt
Kaworu.txt
Keel.txt
class.txt
Cessna.txt
Naoko.txt
・
・
The contents are like this.
Asuka.txt
Hello-o, Misato! Have you been well?
That's right. And I'm getting more womanly in other places, too.
That's the viewing fee. Cheap, isn't it?
What are you doing!
So, which one is the famous Third Child? No way, don't tell me ...
Hmph, how plain.
・
・
Only the dialogue has been extracted, just as intended.
Now that the training data is ready, let's move on to the main topic: the training program.
There are several language models and sentence-generation methods; this time I use a recurrent neural network (RNN). For comparison, I also implement a method based on Markov chains.
An RNN is a general term for a neural network that contains cycles (feedback connections) inside it.
For example, as shown in this figure, the contents of the hidden layer at time t are fed back as input at the next time t+1. This structure lets an RNN temporarily hold information and pass it on along with the next input, which makes it possible to capture the "flow of time" present in the data. In this program, instead of plain RNN nodes, blocks called Long Short-Term Memory (LSTM), which can retain input values, are used.
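As a very rough illustration of this recurrence (a minimal NumPy sketch with random, made-up weights, not the Chainer/LSTM code used later):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    # The hidden state at time t depends on the current input x_t
    # and on the hidden state h_prev carried over from time t-1.
    return np.tanh(W_x.dot(x_t) + W_h.dot(h_prev) + b)

n_in, n_units = 10, 128                  # 128 matches the unit count used below
W_x = 0.01 * np.random.randn(n_units, n_in)
W_h = 0.01 * np.random.randn(n_units, n_units)
b = np.zeros(n_units)

h = np.zeros(n_units)                    # initial hidden state
for x_t in np.random.randn(5, n_in):     # a toy sequence of 5 inputs
    h = rnn_step(x_t, h, W_x, W_h, b)    # h is passed on to the next time step
```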
A proper explanation would get very long and hard to follow here, so for a detailed overview and easier-to-understand explanations please see other people's materials:
- Predict time series data with neural networks
- [Outline of Recurrent Neural Networks and Their Operating Principle (PDF)](http://wbawakate.jp/wp-content/uploads/2015/03/RNN%E3%83%95%E3%82%9A%E3%83%AC%E3%82%BB%E3%82%99%E3%83%B3.pdf)
A Markov chain is a Markov process, a kind of stochastic process, whose possible states are discrete (finite or countable). In particular, the term often refers to the discrete-time case (time represented by an index); there is also the continuous-time Markov process. In a Markov chain, future behavior is determined only by the current state and is independent of past behavior (the Markov property). (From Wikipedia)
Here I build 3-grams from the text and chain them with a Markov chain. A 3-gram is a chunk of three consecutive words (or characters) cut out of a string.
For example, suppose we have the sentence
Why are boys so stupid and lewd, I wonder!
Since each word is cut out with MeCab this time, we slide along one word at a time to make chunks of three, as in the table below (a code sketch follows after the table).
word | word | word |
---|---|---|
(BOS) | why | boy |
why | boy | What |
boy | What | 、 |
What | 、 | Ah |
、 | Ah | Stupid |
Ah | Stupid | so |
Stupid | so | Lewd |
so | Lewd | Nana |
Lewd | Nana | of |
Nana | of | I wonder |
of | I wonder | ! |
I wonder | ! | (EOS) |
BOS: abbreviation of Begin Of Sentence; EOS: abbreviation of End Of Sentence
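As mentioned above, here is a minimal sketch of building such 3-grams and chaining them (this is not the o-tomox/TextGenerator code used later; the sentences are assumed to be already split into words, e.g. by MeCab):

```python
# -*- coding: utf-8 -*-
import random
from collections import defaultdict

BOS, EOS = u"(BOS)", u"(EOS)"

def build_trigrams(sentences):
    """sentences: a list of word lists (each sentence already split by MeCab).
    Returns a table mapping a pair of consecutive words to the possible next words."""
    table = defaultdict(list)
    for words in sentences:
        seq = [BOS] + words + [EOS]
        for w1, w2, w3 in zip(seq, seq[1:], seq[2:]):
            table[(w1, w2)].append(w3)
    return table

def generate(table, max_len=30):
    # Start from a context that begins a sentence
    w1, w2 = random.choice([key for key in table if key[0] == BOS])
    out = [w2]
    for _ in range(max_len):
        w3 = random.choice(table[(w1, w2)])   # pick the next word at random
        if w3 == EOS:
            break
        out.append(w3)
        w1, w2 = w2, w3
    return u"".join(out)   # join without spaces, since the corpus is Japanese
```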
For a detailed explanation of Markov chains, please refer to other people's easy-to-understand materials:
- Introduction to the Markov Chain Monte Carlo Method - 1
- I tried Markov chaining
- Automatic sentence generation with Markov chains
There are several libraries that support RNNs. Google's TensorFlow is popular, but I deliberately chose Chainer (made in Japan).
My program was created by modifying an English sentence-generation program written with Chainer: yusuketomoto/chainer-char-rnn · GitHub.
I will briefly explain only the core part of the program.
In CharRNN.py
embed = F.EmbedID(n_vocab, n_units),
l1_x = F.Linear(n_units, 4*n_units),
l1_h = F.Linear(n_units, 4*n_units),
l2_h = F.Linear(n_units, 4*n_units),
l2_x = F.Linear(n_units, 4*n_units),
l3 = F.Linear(n_units, n_vocab),
This part defines the model. n_vocab is the number of distinct words in the text, and n_units is the number of hidden units per layer, set to 128 for this run.
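For reference, here is a minimal sketch of how the model might be built with these settings (the import path and the make_initial_state helper come from the chainer-char-rnn repository, so treat the exact names as assumptions):

```python
from CharRNN import CharRNN, make_initial_state  # from chainer-char-rnn (assumption)

n_units = 128                          # number of hidden units used in this article
model = CharRNN(len(vocab), n_units)   # n_vocab = number of distinct words (vocab comes from load_data below)

# Initial LSTM state: zero vectors for c1, h1, c2, h2
state = make_initial_state(n_units, batchsize=50)

# Note: each Linear layer above outputs 4*n_units values because F.lstm
# splits them into the four LSTM gates (input, forget, output, cell candidate).
```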
In CharRNN.py
def forward_one_step(self, x_data, y_data, state, train=True, dropout_ratio=0.5):
    x = Variable(x_data.astype(np.int32), volatile=not train)
    t = Variable(y_data.astype(np.int32), volatile=not train)
    h0 = self.embed(x)
    h1_in = self.l1_x(F.dropout(h0, ratio=dropout_ratio, train=train)) + self.l1_h(state['h1'])
    c1, h1 = F.lstm(state['c1'], h1_in)
    h2_in = self.l2_x(F.dropout(h1, ratio=dropout_ratio, train=train)) + self.l2_h(state['h2'])
    c2, h2 = F.lstm(state['c2'], h2_in)
    y = self.l3(F.dropout(h2, ratio=dropout_ratio, train=train))
    state = {'c1': c1, 'h1': h1, 'c2': c2, 'h2': h2}
    return state, F.softmax_cross_entropy(y, t)
This is the code for one step of training. x_data and y_data are given as mini-batches, the hidden layers are LSTMs, and the output is scored with the softmax cross-entropy loss.
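A rough sketch of how one pass of the training loop might use this function (loosely modeled on train.py in chainer-char-rnn; the variable names, hyperparameter values, and the Chainer 1.x optimizer calls here are assumptions, not an exact copy):

```python
import numpy as np
from chainer import Variable, optimizers

# Assumed to already exist: `model` (the CharRNN above), `dataset` (the word-ID
# array from load_data), and `state` (the initial LSTM state).
optimizer = optimizers.RMSprop()   # optimizer choice is an assumption
optimizer.setup(model)

batchsize, bprop_len, grad_clip = 50, 35, 5.0
offsets = [i * (len(dataset) // batchsize) for i in range(batchsize)]

accum_loss = Variable(np.zeros((), dtype=np.float32))
for i in range(len(dataset) // batchsize - 1):
    # Each training example is "current word -> next word"
    x_batch = np.array([dataset[(o + i) % len(dataset)] for o in offsets])
    y_batch = np.array([dataset[(o + i + 1) % len(dataset)] for o in offsets])

    state, loss_i = model.forward_one_step(x_batch, y_batch, state)
    accum_loss += loss_i

    if (i + 1) % bprop_len == 0:
        # Truncated backpropagation through time
        optimizer.zero_grads()
        accum_loss.backward()
        accum_loss.unchain_backward()   # cut the history so the graph stays small
        optimizer.clip_grads(grad_clip)
        optimizer.update()
        accum_loss = Variable(np.zeros((), dtype=np.float32))
```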
In train.py
def load_data(args):
    vocab = {}
    print('%s/input.txt' % args.data_dir)
    f_words = open('%s/input.txt' % args.data_dir, 'r')
    mt = MeCab.Tagger('-Ochasen')
    words = []
    for line in f_words:
        result = mt.parseToNode(line)
        while result:
            words.append(unicode(result.surface, 'utf-8'))
            result = result.next
    dataset = np.ndarray((len(words),), dtype=np.int32)
    for i, word in enumerate(words):
        if word not in vocab:
            vocab[word] = len(vocab)
        dataset[i] = vocab[word]
    print 'corpus length:', len(words)
    print 'vocab size:', len(vocab)
    return dataset, words, vocab
In this part, the input data is read, morphological analysis is done with MeCab, and the text is handed to the model word by word, with each distinct word mapped to an integer ID.
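The word-to-ID mapping built here works like the following toy illustration (English tokens stand in for MeCab output, just to show the indexing):

```python
# Toy illustration of the word -> ID mapping built by load_data
words = [u"I", u"hate", u"it", u"I", u"hate", u"you"]

vocab = {}
dataset = []
for word in words:
    if word not in vocab:
        vocab[word] = len(vocab)   # each new word gets the next free ID
    dataset.append(vocab[word])

# vocab   == {u"I": 0, u"hate": 1, u"it": 2, u"you": 3}
# dataset == [0, 1, 2, 0, 1, 3]   -- the corpus as a sequence of word IDs
```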
There is also an explanation of Chainer's own recurrent neural network sample code, so please refer to that as well: I tried to explain the sample code for creating a recurrent neural language model using Chainer
For the Markov chain side, I used a program created by another person as is: I made an automatic sentence generation program for Python rehabilitation (o-tomox/TextGenerator · GitHub). Thank you very much.
The flow of sentence generation is like this.
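Roughly, the idea is to feed the trained model one word, sample the next word from the softmax output, and feed that word back in until the end of a line. A minimal sketch (the predict_next helper is hypothetical shorthand for one forward step of the trained model; vocab and n_units come from the code above):

```python
import numpy as np

ivocab = dict((i, w) for w, i in vocab.items())   # ID -> word

def generate_line(model, state, start_word_id, max_len=50):
    word_id = start_word_id
    words = []
    for _ in range(max_len):
        # predict_next (hypothetical) runs one step of the trained model and
        # returns the new state and a softmax distribution over the vocabulary
        state, prob = predict_next(model, state, word_id)
        word_id = int(np.random.choice(len(prob), p=prob))   # sample the next word
        word = ivocab[word_id]
        if word == u"\n":        # a newline ends one generated line
            break
        words.append(word)
    return u"".join(words)       # Japanese text needs no spaces between words
```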
I tried to generate 20 sentences at a time automatically.
RNN
You!It's here. I like it.
No!
I'm dead on the contrary, already!Not only now!
So, close me!
I hate it
Stupid, wait.
Truthful, stupid, no!
Muu!
That's stupid!That's it.
What's wrong!
How about being in the above.
Hoon, it's dull.
Nasty!Anti-vomiting, stupid, wake-up machine is just a little person.
Hey ah ah ah!
That's a little.?I'm going to develop!
Let's go to Unit 2 each other.
Of course it doesn't work.
I won't kill you. I am. No one needs to hate it.
What is the commander, what is decided for me is to beat Eva to the stuffed animal!
**What is the commander, what is decided for me is to beat Eva to the stuffed animal!** The 'Asuka is secretly from Kansai' theory.
Let's go!
Unwilling to go?
I, Asuka, let's go, just do it comfortably without making the right arm clear?
Show me a little ... I can't solve this kind of formula!
Stop it anymore ... I wonder if the tamed man will snow tomorrow.!?
Ah, come early!
Thermal expansion?I don't need childish things anymore!
Stop saying this, mom!
I was chosen. Leave it in the cage.
I've been waiting!I've dropped the knife, I'm not relieved!
It's a sight-seeing fee to talk to you.
There is no other person.
I live alone like that!
There is no other person?
All right, I haven't heard it yet.
What did I do?
Yeah ... I can't help it anymore ...
Don't partition yourself, I'll go at full capacity from the beginning and at maximum speed.
I wonder what they are doing.
Hey, don't look ahead.
Some sentences come out exactly as they appear in the original script. Grammatically, many of the generated sentences here are correct.
This time, the Markov chain gave grammatically better results. I haven't tuned any parameters, so I'm probably not getting the most out of the RNN. If there is a better way to generate sentences with an RNN, I'd be glad to hear about it.
I think automatic sentence generation still has plenty of potential, so I'm looking forward to what comes next.
Shinji and Rei also tried it.
RNN
Asuka!
But everyone hates me.
Thank you again today and I'm scared to throw it away!I'm ... I'm not.
I ... there are people I can be proud of, my dad was staring at me on the ice. But I wonder. Will not stop!
Yup. Yeah, my smell.
Yes, I wonder if I'm already on Eva ... from ... san. I don't need me in the world. I ride, but I'm from me!
I didn't run away before I was learning. I think I can do it!
that's right. I'm scared of my goal.
Yes, I'm fine with me.
I'm me, I'm mine?What is the world!
Rain, I don't know what to remember. Good!
Would you like to ride the waves?
Hmmm, I wonder if that world is really ... Can't you understand me with Katsuragi wave??
"I" collapses Gestalt
Who?I couldn't forgive you.
I'm not worth it I think it's good?
I think I know.
So Misato!
It's a lonely city ...?
Why do I have to make a fuss?
But I couldn't help it ... I want value, but all of them are real Shinji Ikari, Rei Ayanami!
to come. Quit!
I think I ran away!
Um ... how can everyone be happy with that ... dad?!
Ayanami. I'm doing my best.
The bad thing is that weapons that are no longer on Eva are useful.
It's pitch black, I don't want to eat supper ... I don't.
What are you talking about!
I try to understand, so I often remember difficult things. Otherwise, our enemies.
Funuuuuuuuu!
RNN
why?
You are together Blood container.
It's making?
You are, and you are lonely in your heart!
Your world is alive.
After all, I'm late!I'm late on the first day
Yes
You don't lie.
that's right. With ruin, freedom disappears.
I can't see myself without other people.
What do you hate, from disappearing. I am with everyone.
You are Rei Ayanami, your mother.
I really feel with you?
Operation Yashima says you've come to peep
So, Rei Ayanami's?
**After all, I'm late! I'm late on the first day** This is output from the lines of that scene, but coming from this character it feels strange. (The wording may also differ from the original.)
... emergency call ... I'll go ahead.
It can't be helped.
It feels so good.
1730 (hito-nana-san-maru), gather at the cage. Rainy days are depressing.
But no, I want to return to nothing.
The rest is something I will protect, so everyone thinks so too.
I'm glad I'm sleeping?
for whom?
I have nothing.
Very strange.
Ikari commander now.
No, it feels the same as me.
Hmm ah ah!
However, there is nothing in you, you can't see it.
You can go home alone, so don't come in that style