I tried machine learning to convert sentences into XX style

Overview

This article is a record of an algorithm that transforms arbitrary text into the style of an arbitrary author. In recent years, a very interesting application called Prisma was released in the image-processing field: a service that converts arbitrary images into Van Gogh style or Cubist style, and a very interesting example of the latest machine learning actually being used in a service. This article was written with the aim of "converting an arbitrary sentence into XX style", the text counterpart of Prisma's "convert an arbitrary image into XX style". For example, the sentence

** "You are a fool." **

converted into Shikinami Asuka style might become

** "Are you stupid?" **

while converted into Hoshino Ruri style it might become

** "Idiot." **

To give another example, converting the sentence

"I am a cat."

into Natsume Soseki style might produce

** "I am a cat." **

Such a conversion (in the Japanese original, the plain 私は猫です becomes Soseki's archaic 吾輩は猫である; both translate as "I am a cat") is the goal. This article tried an approach to achieve it, but ** in fact it didn't work that well. ** So, although this is a record of failure, if you are interested anyway, please read on.

Previous research

As mentioned in the overview, the application called Prisma is very famous in the image field; it is presumably an extension of Deep Learning using CNNs. There is also voice quality conversion. The bow-tie-shaped voice changer that appears in Detective Conan gives an easy image: a machine that changes one person's voice into the so-called voice of another. Such conversion is achieved by extracting features of the voice called the mel cepstrum, separating them, and resynthesizing with another person's features. Turning to natural language processing, there seem to be few studies of this kind for Japanese. A famous example on the net is Monjiro (http://monjiro.net/), a site that can convert arbitrary sentences into XX style. For example, if you convert "You are a fool" into samurai style, the word "you" is replaced with an archaic equivalent and the ending becomes "de gozaru". Another site is ClalisTone (https://liplis.mine.nu/lipliswiki/webroot/?ClalisTone4.1). Its API is open to the public, and its specification contains the following description.

① name: setting name (arbitrary)
② type: conversion type (0: normal conversion, 1: end conversion)
③ befor: character string to convert from
④ after: character string to convert to

From this, we can see that this kind of algorithm divides into two parts: "conversion at an arbitrary position" and "sentence-end conversion". Taking samurai speech as the example, replacing the string "you" with its archaic equivalent corresponds to the former, and turning the sentence-ending "is" into "de gozaru" corresponds to the latter. However, these approaches have problems. The substitutions are so simple that they cannot produce a conversion like the Shikinami Asuka-style "Are you stupid?". Moreover, the styles that sentence-end conversion can express are limited to colloquial speech. Consider so-called cat language, which ends every sentence with "nya": the sentence "The train has arrived." becomes "The train has arrived, nya.", a method that clearly cannot be applied to written language such as novels. In this article, therefore, let us consider ** style conversion that can also be used in written language ** (a sketch of the simple rule-based scheme is shown below for contrast).
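To make the two rule types concrete, here is a minimal sketch of such a rule-based converter. The rules are hypothetical illustrations, not Monjiro's or ClalisTone's actual tables (the field name "befor" follows the API spec above).

# Minimal sketch of the two rule types described above.
RULES = [
    {"type": 0, "befor": "you", "after": "thou"},        # 0: normal conversion (anywhere)
    {"type": 1, "befor": ".", "after": ", de gozaru."},  # 1: end conversion (sentence tail)
]

def convert(sentence, rules):
    for rule in rules:
        if rule["type"] == 0:
            sentence = sentence.replace(rule["befor"], rule["after"])
        elif rule["type"] == 1 and sentence.endswith(rule["befor"]):
            sentence = sentence[:-len(rule["befor"])] + rule["after"]
    return sentence

print(convert("you are a fool.", RULES))  # -> "thou are a fool, de gozaru."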

Target style       Example
Colloquial         Monjiro, ClalisTone
Written language   This method

Basic concept of algorithm

Here, we describe the idea behind a style conversion method like Prisma's. Roughly speaking, a painting can be divided into its "basic elements" and its "style". Explaining only briefly: the "basic elements" are the parts related to shape and composition; the image is generated from these and then optimized to match the "style". Dividing the problem into these two elements is the important point.

(Figure: a painting separated into "basic elements" (shape, composition) and "style")

Voice quality conversion, as another example, separates like this:

(Figure: a voice separated into pitch/length and voice quality)

The pitch and length of the notes stay the same, but the voice is converted by swapping the voice quality. MIDI may be a more intuitive analogy: MIDI basically fixes the pitch and duration of the notes, and the timbre can easily be changed afterwards, so the same score can be played with a trumpet or with a piano.

The style conversion dealt with in this article can be divided in the same way.

(Figure: a sentence separated into "meaning" and "word choice / tone")

The "basic element" has "the meaning of the sentence", while the "style" has "word selection / tone". it is conceivable that. In the previous example, the sentence "You are a fool." And the sentence "You are stupid?" Have the same ** meaning **, but only ** word selection and tone **. You can see that.

Algorithm

The general voice quality conversion flow is as follows. (The content is simplified for the convenience of the figure)

(Figure: voice conversion flow: 1. separate voice quality and pitch, 2. replace the voice quality with another person's, 3. resynthesize)

As shown, the voice quality and the pitch are separated in step 1, the voice quality is replaced with another person's in step 2, and the two are resynthesized in step 3. Image style conversion takes a similar approach, but applying this method to text style conversion is difficult: methods for separating the "meaning" of a sentence from its "word choice" have hardly been studied, so the same approach looks very hard. The limited version of it is the sentence-end conversion shown earlier, which works under the assumption that the last word barely affects the meaning, and therefore simply appends "nya" to the end. Here, we change the style of text from a different angle.

(Figure: proposed flow: generate candidate sentences, compare their meanings with the original, output the closest one)

  1. Generate sentences from a sentence generation model
  2. Compare the meaning of the sentence you want to convert (the original sentence) with each sentence generated in 1.
    1. Output only the sentence whose meaning is closest

That is the mechanism. For automatically generating XX-style sentences there is generation by hidden Markov model, which was very popular for a while; we use this for step 1. For the semantic comparison we use a method called Doc2Vec, an extension of the previously popular word2vec that makes it possible to compare the meaning of whole sentences with a neural network; we use this for step 2.
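As an illustration of step 1, here is a minimal sketch of Markov-chain sentence generation. It assumes corpus is a list of word-segmented sentences; the interface is mine, not the author's actual code.

import random
from collections import defaultdict

def build_chain(corpus):
    # Record, for every word, which words followed it in the corpus.
    chain = defaultdict(list)
    for words in corpus:
        for prev, nxt in zip(["<BOS>"] + words, words + ["<EOS>"]):
            chain[prev].append(nxt)
    return chain

def generate(chain, max_len=30):
    # Walk the chain from <BOS>, sampling each next word at random.
    word, out = "<BOS>", []
    while len(out) < max_len:
        word = random.choice(chain[word])
        if word == "<EOS>":
            break
        out.append(word)
    return "".join(out)  # Japanese text is joined without spaces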

Data preparation

The corpus came from Aozora Bunko. You can download the data for each author in bulk from the following site.

Aozora Bunko batch download by writer http://keison.sakura.ne.jp/

The texts of the following 50 authors and works were then segmented into words with MeCab to create the training data for Doc2Vec (a sketch of this preprocessing follows below).

Andersen, Kafka, Grimm, Gogol, Jean Christoph, Dante, Chehoff, Doyle, Baudelaire, Po, MacLeod, Mopassan, Rilke, Loran, Victor Hugo, Ango's New Japan Geography, Ango Life Guide, Itami Mansaku, Ito Sachio, Ito Noe, Inoue Enryo, Inoue Koume, Nagai Kafu, Yokomitsu Riichi, Okamoto Kanoko, Okamoto Kido, Okino Iwasaburo, Shimomura Chiaki, Natsume Soseki, My View of Life, Unno Juzo, Akutagawa Ryunosuke, Kajii Motojiro, Kasai Zenzo, Kambara Ariaki, Kishida Kunishi, Kikuchi Hiroshi, Yoshiyuki Eisuke, Yoshikawa Eiji, Hisao Juran, Miyahara Koichiro, Miyagi Michio, Miyazawa Kenji, Miyamoto Yuriko, Chikamatsu Akie, Kuwahara Kazuzo, Harada Yoshito, Hara Tamiki, Furukawa Midori, Tosaka Jun

Some of the entries are not personal names; those texts were probably not processed correctly due to an error.
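The segmentation step itself is short. Here is a sketch, assuming one plain-text file per author (the path handling is hypothetical):

import MeCab

# -Owakati makes MeCab output surface forms separated by spaces,
# which is the input format used to train Doc2Vec.
tagger = MeCab.Tagger("-Owakati")

def to_wakati(path):
    with open(path, encoding="utf-8") as f:
        return [tagger.parse(line).strip() for line in f if line.strip()]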

Pre-experiment

We have prepared the following script.

estimate.py


#coding: utf-8
import sys
from gensim.models.doc2vec import Doc2Vec
import MeCab

class Estimator:
    def __init__(self, model):
        self.model = model
        # -Owakati segments the input into space-separated words
        self.mecab = MeCab.Tagger("-Owakati")

    def estimate(self, txt1, txt2):
        # Segment both sentences into word lists
        a1 = self.mecab.parse(txt1).split()
        b1 = self.mecab.parse(txt2).split()
        # Infer vectors for both unseen documents and return their similarity
        #return self.model.docvecs.similarity_unseen_docs(self.model, a1, b1, alpha=0.0, min_alpha=0.0)
        return self.model.docvecs.similarity_unseen_docs(self.model, a1, b1)

if __name__ == "__main__":
    model_filename = sys.argv[1]
    txt1 = sys.argv[2]
    txt2 = sys.argv[3]

    model = Doc2Vec.load(model_filename)
    estimator = Estimator(model)
    print(estimator.estimate(txt1, txt2))

Use this script in the form

$ python estimate.py [Doc2Vec model name] "[Sentence 1]" "[Sentence 2]"

to measure the similarity between sentence 1 and sentence 2. (The code for the Doc2Vec training part is omitted; a rough sketch follows.)
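For reference, a minimal sketch of how such a model might be trained on the word-segmented corpus; the file name and hyperparameters are illustrative, not the author's actual settings.

from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# One TaggedDocument per word-segmented sentence, tagged by line number.
with open("aozora_wakati.txt", encoding="utf-8") as f:
    documents = [TaggedDocument(line.split(), [i]) for i, line in enumerate(f)]

model = Doc2Vec(documents, vector_size=100, window=5, min_count=2, epochs=20)
model.save("nda6.model")

Experimenting with estimate.py then gave the following: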

kotauchisunsun-VirtualBox$ python estimate.py nda6.model "I am a cat." "I am a cat."
0.667711547669
kotauchisunsun-VirtualBox$ python estimate.py nda6.model "I am a cat." "I am a cat."
0.805627031659

(^ ω ^) ...?

After trying it several times, I realized that gensim's similarity is normalized to the range -1 to 1, and the closer to 1, the closer the meanings of the two sentences. Still, there were many things I could not make sense of: ** the similarity did not reach 1.0 even when the same sentence was entered twice **. This is probably down to the specification of similarity_unseen_docs. On top of that, the evaluation value was not stable; evaluating the same pair three times gave values that gradually decreased...
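A plausible explanation is that similarity_unseen_docs infers a fresh vector for each document, and that inference is randomized. Below is a sketch of a more deterministic comparison; it assumes a pre-4.0 gensim where infer_vector takes a steps argument and the model exposes a NumPy random state, and it is my workaround, not the author's code.

import numpy as np

def stable_similarity(model, words1, words2, steps=100):
    # Re-seed the model's RNG before each inference so that repeated
    # calls with the same input produce the same vector.
    model.random.seed(0)
    v1 = model.infer_vector(words1, steps=steps)
    model.random.seed(0)
    v2 = model.infer_vector(words2, steps=steps)
    # Cosine similarity of the two inferred vectors.
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))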

Style conversion experiment

By the preparation stage I already had a bad feeling, but let's actually run the automatic generation. The input is "I am a cat."

0.409022161426 "Most of this case is what your wife was"
0.480818568261 "How boring it became."
0.559027353215 "He lost his eyes."
0.659924672818 "That's me—that person and there"
0.746559396136 "He is a husband"
0.781097738802 "It was just this."
0.786329416341 "I'm alone."
0.802889606127 "I met today."
0.818648043016 "It's mine."

Obviously, since generation is random, a lot of effort is spent before the word "I" even gets selected. But why should the sentence "He is a husband" be judged more similar to "I am a cat." than the others? That doubt only grows. Now, let's run a controlled experiment to isolate the problem. First, I modified the HMM code so that the word "I" is always selected, and generated automatically from there. This worked relatively well, so I let it run for a long time. This time the HMM does not generate completely at random: candidates are generated 10,000 times each, the words with high average scores are fixed, and then the next word is tried in the same way; a generation method close to a Monte Carlo tree search.
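A minimal sketch of that scoring loop, reusing the Markov chain from earlier and the Estimator from estimate.py (the interface names are hypothetical):

import random

def sample_continuation(chain, word, max_len=20):
    # Randomly extend the sentence from the given word until <EOS>.
    out = []
    while word != "<EOS>" and len(out) < max_len:
        word = random.choice(chain[word])
        if word != "<EOS>":
            out.append(word)
    return "".join(out)

def choose_next_word(chain, prefix, estimator, target, n_samples=10000):
    # Try each candidate next word n_samples times and keep the one whose
    # random continuations are, on average, closest in meaning to the target.
    last = prefix[-1] if prefix else "<BOS>"
    scores = {}
    for word in set(chain[last]):
        if word == "<EOS>":
            continue
        total = sum(
            estimator.estimate("".join(prefix) + word + sample_continuation(chain, word), target)
            for _ in range(n_samples)
        )
        scores[word] = total / n_samples
    return max(scores, key=scores.get)

With this controlled generation, the results were as follows: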

0.0880615096769 Early autumn, without my feathers
0.7469903088 I was destroyed.
0.759754754423 I have one.
0.780031448973 I'm not my girlfriend.
0.79565634711 It's me.
0.834694349115 I said it was my cat.
0.838936420823 Are you a cat?
0.855492222823 Are you a cat?
0.85682326727 It's my cat.
0.86722014861 I'm not a cat
0.878538129172 I can't do it with my cat.

Ah, now that's more like it. It asks "Are you a cat?", then asserts "It's my cat.", and it seems to circle around "I'm not a cat" and "I can't do it with my cat". As far as I can tell, there is no bug around the automatic generation: sentences similar to "I am a cat." can at least be generated. The weak point is the semantic comparison of sentences, which likely keeps the conversion from working well. Increasing the number of machines would increase the number of trials and might yield more accurate answers, but looking at how the answers come out once the trials are nearly exhausted and few options remain, I feel the wise course is to improve the accuracy of the sentence-meaning comparison.

Summary

We devised and implemented an algorithm for text style conversion. However, a high-precision conversion could not be obtained; only by manually controlling some of the words could an approximation of the originally intended result be obtained.

Application example

It may be a good way to add personality to your chatbot.

[Evangelion] Try to automatically generate Asuka-like lines with Deep Learning http://qiita.com/S346/items/24e875e3c5ac58f55810

In the article above, an experiment is conducted to generate lines like Soryu Asuka Langley's using Deep Learning. As for the data used there,

Things to love-Introduction of anime lines- http://lovegundam.dtiblog.com/blog-category-7.html

there is a site that collects the lines of all the Evangelion characters. If the accuracy of this method can be improved, something interesting becomes possible: ** personal assistants that, like Siri, can only give canned answers could be retrofitted with a personality built from character script data **.

Impressions

I felt that I could have pushed the accuracy a little further, but time was up. Tuning Doc2Vec was a real headache, and there are actually few examples of its use, which was a problem: I only gradually figured out how to use gensim's Doc2Vec, and running and verifying the code step by step took a lot of time.

The original inspiration was this: [June 19 is Ōtōki, Dazai's memorial day] Osamu Dazai revived by programming (http://pdmagazine.jp/people/dazai-program/). Seeing it, I wondered whether I could do something more interesting. Personally I had wanted to build the meaning comparison on BM25, which I had adopted before, but deciding I should go with what is currently in fashion, I tried gensim's Doc2Vec instead.

MeCab's performance also seems to work against us. The texts in Aozora Bunko are works whose copyrights have expired, so the wording is more than 50 years old and rather old-fashioned. Visually checking some of the output, I found places where the segmentation was strange; such texts are tough for MeCab, whose dictionaries are based on modern language, and that too seems to affect the accuracy of the machine learning.

Recently, adversarial methods such as GAN are popular, but you still want paired data for training; without it, machine learning has trouble reaching some places. Having tried it myself, my impression is that I made something that only sort of works, and accordingly the accuracy does not improve much. That is where things ended up for me, but I hope the second and third natural-language-processing enthusiasts who follow will do their best.
