Easy generation of stylistic pakuri sentences with MeCab + gensim

Recently I noticed that one day was still open in this year's natural language processing Advent calendar, which looked interesting, so I felt I had to join in and wrote this in a hurry.

By the way, both what I did and the results are pretty terrible, so please read this only as a "this is what happened" reference.

Development intent

I had been trying one thing after another, thinking it would be convenient if text could be generated automatically.

-- Training a machine learning model takes a long time and is heavy.
-- Even after all that time, the generated sentences fall apart as Japanese.

I was frustrated by hitting two big walls.

For the former, the solution is to give up training on my own and use a pretrained model. The latter, however, is not solved just by using a pretrained model.

So I thought: when I studied English as a student, I did not just read and write, I learned grammar at the same time. Perhaps in machine learning too, rather than only having the model read sentences, it is necessary to teach it the grammar of the sentences to be generated. From there I arrived at the worst idea of all: "Couldn't I just borrow the writing of an existing text?"

Code


import re

import MeCab
import gensim

# ChaSen-style output: surface, reading, base form, part of speech, ... (tab-separated)
mecab = MeCab.Tagger("-Ochasen")
# Pretrained fastText vectors in word2vec text format
model = gensim.models.KeyedVectors.load_word2vec_format('model.vec', binary=False)

# Parts of speech to replace, assuming IPAdic-style tags
# (名詞 = noun, 形容詞 = adjective, 動詞-自立 = independent verb)
target = re.compile(r'^(名詞|形容詞)')
# target = re.compile(r'^(名詞|形容詞|動詞-自立)')

morpheme = mecab.parse("Original text of plagiarism")
word_morphemes = morpheme.split("\n")
original = []
pakuri = []
for word_morpheme in word_morphemes:
    if word_morpheme == "EOS" or word_morpheme == "":
        continue

    word_morpheme_info = word_morpheme.split("\t")

    word = word_morpheme_info[0]
    category = word_morpheme_info[3]

    original.append(word)

    if target.match(category):
        try:
            # most_similar returns the top 10 by default; take the last of them
            similars = model.most_similar(positive=[word])
            pakuri.append(similars[-1][0])
        except KeyError:
            # Word is not in the model's vocabulary: keep it unchanged
            pakuri.append(word)
    else:
        pakuri.append(word)

print("".join(original))
print("".join(pakuri))

I thought that replacing particles and adverbs would make the sentences break down more often, so I decided to target basically nouns, adjectives, and verbs.

I also had a vague feeling that leaving the verbs alone might keep the sentences meaningful even after words were swapped for "similar words", so I tried it both with and without replacing the verbs.
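The targeting rule described above can be tried out without MeCab or a real model. This is only a minimal sketch, not the code used for the results below: `replace_by_pos`, `fake_similar`, and `FAKE_TABLE` are hypothetical names, the tokens are hand-written (surface, part-of-speech) pairs with IPAdic-style tags, and the lookup table stands in for `model.most_similar`.

```python
import re

# IPAdic-style tags: 名詞 = noun, 形容詞 = adjective, 動詞-自立 = independent verb
NOUN_ADJ = re.compile(r'^(名詞|形容詞)')
NOUN_ADJ_VERB = re.compile(r'^(名詞|形容詞|動詞-自立)')


def replace_by_pos(tokens, similar, target_pos):
    """Replace the surface of every token whose tag matches target_pos.

    tokens: list of (surface, pos_tag) pairs
    similar: callable returning a replacement word, or raising KeyError
    """
    out = []
    for surface, pos in tokens:
        if target_pos.match(pos):
            try:
                out.append(similar(surface))
            except KeyError:
                out.append(surface)  # unknown word: keep the original
        else:
            out.append(surface)  # particles, adverbs, etc. stay untouched
    return "".join(out)


# Hypothetical stand-in for the word-vector lookup
FAKE_TABLE = {"女": "妾", "寝る": "横たわる"}


def fake_similar(word):
    return FAKE_TABLE[word]


tokens = [("女", "名詞-一般"), ("が", "助詞-格助詞-一般"), ("寝る", "動詞-自立")]
without_verbs = replace_by_pos(tokens, fake_similar, NOUN_ADJ)
with_verbs = replace_by_pos(tokens, fake_similar, NOUN_ADJ_VERB)
```

Passing the pattern in as an argument makes it easy to run the same text through both settings and compare the two outputs side by side.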

I borrowed the model I'm loading from the following.

The trained model of fastText has been released

The article above publishes both a model built with NEologd and one built without it, so I also compared what happens with each.

Execution result

It turned out to be something like this.

Original text (Natsume Soseki, Ten Nights of Dreams)

It is basically copied and pasted from Aozora Bunko, but some kanji have been rewritten in kana.

When she sits at the bedside with her arms folded, the woman lying on her back dies in a quiet voice. The woman has long hair laid on a pillow and a soft-contoured melon face lying in it. Warm blood color is moderately applied to the bottom of the pure white cheeks, and the color of the lips is of course red. It doesn't look like it's going to die. But in a quiet voice, she clearly stated that she would die. I definitely thought this wouldn't die. So, I asked him if he was going to die, looking into it from above. The woman opened her eyes tightly, saying that she would die. With big moisturized eyes and long eyelashes, it was all black. Behind the black fox, I can see myself vividly.

Replace nouns and adjectives

When sitting with his arms folded, the concubine who slept on his back is said to die in a relaxed mood. The concubine lays a thin tie on the bolsterless and lays a soft black melon face at that time. The ground color of the gastric distension is moderately different from that of the colorful neck, and the ground color of the eyelids is of course green. It doesn't look like it's going to die. However, the concubine was uncomfortable and muffled, and clearly stated that he would die. I thought you wouldn't even die forgiveness. So, yeah, I'm dying and wondering, but I looked into it from the proviso and asked. The concubine opened the cornea tightly, saying that he would die. It was a cornea with a large enhancement, and when it was wrapped in thin eyelashes, it was just six lines different. Your appearance is floating in silver gray on the Tataragi tree of Shukotsu Togawa.

It's incoherent in various ways, but "Your appearance is floating in silver gray on the Tataragi tree of Shukotsu Togawa." feels strangely good. It sounds like something out of a light novel. (My prejudice.)

The eyelids are green, and combined with "It doesn't look like it's going to die", I feel a science-fiction atmosphere emerging.

Replace nouns, adjectives, verbs

I have no choice but to sit down with my arms folded, and I'm a concubine in the middle of the night. It is bolsterless and laid out for a thin concubine, and because it has a black and soft face, it is hard to grasp at that time. Ayaka A variety of necks and guides with warm gastric distension and ground color but moderately Tohsen Jordan, eyelids and ground color such green. I never see it dying. In addition, there are various concubines that are relaxing and concubine, and there are certainly few who will die. Forgive you and others, and there are various feelings of death. Immediately, she died, and if anything, she was peeping into me. I'm dead, and although it's here, there's also a concubine and a tight cornea. Large enhancement and many corneas Also, there are times when thin eyelashes drift, and there are six lines of difference, Same again. Its Same Various Togawa Shukotsu and Tataragi, you and the manifestation but silver gray do not lie down.

I'll give it credit for its sense of rhythm.

My impression: is this a battle scene from some classic (that I can't read)?

Replace nouns and adjectives (using NEologd version)

When he sits awake with his arms folded, his wife, who lies on her back, dies with a peaceful cry. My wife laid a long tie on the beam and laid a lustrous melon face with gradation at that time. The orange color of the gentle bloodline is moderately applied to the surface layer of the lips, and the orange color of the narrow vowel is of course pink. It doesn't look like it's dead. But his wife screamed peacefully, clearly stating that she would die. Others also pushed forward and thought that it would not die. So, yeah, I asked him if he was going to die, so I tried to get inside the skirt. My wife opened her eyelids tightly, saying that she would die. When it was wrapped in long eyelashes with a large, rich eyelid, it was just a vena cava on line 08. Beside the vena cava second-class person, another person's sword is secretly floating.

The last sentence here also reads strangely like fine prose. I don't know what it means, though.

At "dies with a peaceful cry", I thought: is the wife a [Kudan](https://www.google.com/search?client=safari&rls=en&biw=1621&bih=829&tbm=isch&sa=1&ei=e_vpXa3PDJuRr7wP_smHqAg&q=%E3%81%8F%E3%81%A0%E3%82%93&oq=%E3%81%8F%E3%81%A0%E3%82%93&gs_l=img.3..0l2j0i4l8.0.0..3074...0.0..0.218.283.1j0j1......0......gws-wiz-img.yX0KZZBFy8s&ved=0ahUKEwjt-LLMuKDmAhWbyIsBHf7kAYUQ4dUDCAY&uact=5)?

Replace nouns, adjectives, and verbs (using NEologd version)

Arms folded I woke up and lay down, and my wife was lying down, and she said she was crying and committed suicide. Wife) A long-sized tie, a beam, and a melon-like face with gradation and shine. There is a lips called Mashiro, and there is a gentle bloodline on the surface. The orange color has a moderate difference in cracks, and there is a narrow vowel. Ma Rong Suicide) A glimpse. After that, my wife) screamed and the various theories that she committed suicide by herself. The way to live (by pushing forward with others). It happened that I was living alone, and I was watching to listen to it because I was immersing myself in the skirt. When I'm suicide, my wife) can't open my eyelids. There is a great deal of wealth, and the eyelids and long eyelashes are sometimes wrapped), or the vena cava on the 808 line. There is a second-class person called the vena cava, and there is another person, and you can swim quietly.

At last, symbols have gotten mixed into the text. This is hopeless.

Even so, the last sentence still carries the dignity of some mysterious masterpiece. (I can't make sense of it.)

Looking back on the results

-- Personally, I am satisfied with it.
-- I think it was better not to replace the verbs. Leaving the verbs as they were made it easier to keep the overall cohesion and atmosphere.
-- The original text happened to contain few proper nouns and new words, yet surprisingly the results differed quite a lot between the NEologd version and the non-NEologd version.
-- When choosing the replacement word, I feel that better sentences could be made by adding conditions such as matching the part of speech and the ending of the original word.
-- There is plenty of other room for improvement, but loading the pretrained model is **heavy no matter what**, which got in the way of trying things out. Unless I come up with something here, it will probably stay difficult, so I want to do something about it. I wonder if I could keep the loaded model on standby somewhere...
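The part-of-speech matching idea from the list above could look roughly like this. It is a sketch under assumptions, not something tested on the real data: `pick_same_pos` is a hypothetical helper, `candidates` mimics the (word, score) list shape that gensim's `most_similar` returns, and `pos_of` stands in for a MeCab lookup that is faked here with a dictionary.

```python
def pick_same_pos(word, candidates, pos_of):
    """Return the first candidate whose tag equals the tag of word.

    candidates: list of (word, similarity) pairs, best first
    pos_of: callable mapping a word to its part-of-speech tag
    Falls back to the original word when no candidate matches.
    """
    wanted = pos_of(word)
    for candidate, _score in candidates:
        if pos_of(candidate) == wanted:
            return candidate
    return word


# Hypothetical tags; a real version would ask MeCab for them
FAKE_POS = {"赤い": "形容詞-自立", "赤": "名詞-一般", "青い": "形容詞-自立"}

candidates = [("赤", 0.9), ("青い", 0.8)]
chosen = pick_same_pos("赤い", candidates, FAKE_POS.get)
unchanged = pick_same_pos("赤い", [("赤", 0.9)], FAKE_POS.get)
```

The code in this article picks the last entry of the similarity list; the sketch takes the first candidate with a matching tag instead, which is just one possible choice.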

Thank you for reading!

Postscript

-- 2019-11-09 The labels for the NEologd version and the non-NEologd version were reversed, so I corrected them. I'm sorry!
-- 2019-11-10 ...On second thought, they were not reversed after all, so I put them back. I'm really sorry.
-- 2019-11-10 I'm sorry, the regular expression for the part-of-speech check was wrong. The brackets should be () rather than [], and the backslash appears as a \ mark. With this fixed, it works a little more properly.

Lesson learned: I really shouldn't do things in a hurry.
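The difference between the two bracket styles mentioned in the postscript can be checked in isolation. This is a small sketch assuming IPAdic-style tag strings (the sample tags are chosen only for illustration): square brackets form a character class that matches any single listed character, while parentheses group whole alternatives.

```python
import re

# Character class: matches ONE character out of 名, 詞, |, 形, 容, 動, -, 自, 立
char_class = re.compile(r'^[名詞|形容詞|動詞\-自立]')
# Group: matches one of the three whole alternatives
group = re.compile(r'^(名詞|形容詞|動詞-自立)')

tags = ["名詞-一般", "形容詞-自立", "動詞-自立", "動詞-非自立", "助詞-格助詞"]

# Tags accepted by each pattern
class_hits = [t for t in tags if char_class.match(t)]
group_hits = [t for t in tags if group.match(t)]
```

With these sample tags, the character class also accepts `動詞-非自立` because it only looks at the first character, while the grouped pattern accepts only the three intended tags.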
