When I tried to generate sentences based on Kafka's "The Metamorphosis" with an LSTM, I could not become anyone.

Overview

How well can a model learn to generate sentences from a dataset? I had never tried it, so I gave it a go.

Method

I wrote it in TensorFlow, though Keras would probably be simpler to write. The model is a two-layer LSTM, trained to predict the char/word one step ahead as the teacher signal. When generating sentences, the char/word with the highest predicted probability is selected; if you want some variation, you could sample randomly from the top predictions (see the sketch below).
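As a minimal sketch of this setup, here is what a two-layer LSTM with greedy/top-k generation could look like in Keras (the actual repository uses raw TensorFlow; the hyperparameters, layer sizes, and helper names below are my assumptions, not the author's values):

```python
import numpy as np
from tensorflow import keras

seq_len, vocab_size, hidden = 40, 60, 128  # char-level example; assumed values

model = keras.Sequential([
    keras.layers.Input(shape=(seq_len,), dtype="int32"),
    keras.layers.Embedding(vocab_size, 64),
    keras.layers.LSTM(hidden, return_sequences=True),   # first LSTM layer
    keras.layers.LSTM(hidden),                          # second LSTM layer
    keras.layers.Dense(vocab_size, activation="softmax"),  # next-char distribution
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")

def generate(seed_ids, n_chars, top_k=1):
    """Greedy decoding when top_k=1; sample uniformly from the top-k otherwise."""
    ids = list(seed_ids)  # assume the seed already contains at least seq_len ids
    for _ in range(n_chars):
        x = np.array([ids[-seq_len:]])
        probs = model.predict(x, verbose=0)[0]
        top = np.argsort(probs)[-top_k:]        # indices of the top-k chars
        ids.append(int(np.random.choice(top)))  # with top_k=1 this is plain argmax
    return ids
```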

For the dataset, I chose Kafka's "The Metamorphosis" from the following URL. There is no particular reason for the choice. http://www.gutenberg.org/ebooks/5200

The preprocessing only extracts the body text and removes the chapter headings. It would probably be even cleaner if the punctuation symbols were normalized as well.
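A rough sketch of this preprocessing might look as follows (the file name and marker strings are my assumptions about the Project Gutenberg layout, not the actual code):

```python
import re

with open("pg5200.txt", encoding="utf-8") as f:  # hypothetical file name
    raw = f.read()

# Project Gutenberg ebooks wrap the body between START/END marker lines.
start = raw.index("*** START OF")
end = raw.index("*** END OF")
body = raw[raw.index("\n", start) + 1 : end]

# "The Metamorphosis" marks its chapters with bare Roman numerals on their own line.
body = re.sub(r"^(I|II|III)\s*$", "", body, flags=re.MULTILINE)
```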

The code is here: https://github.com/pigooosuke/lstm_text_generator

Results

I generate sentences starting with the seed "The".

Char level

Vocabulary size: 60; total number of chars: 116622

The same times that I've open upingoull, and his fordit the sigh of the reases,
and no tho gontoused and he pusied hers so that he was to became use to be ard
forgove and he would save with a cramous snignto and if he wolld with hiseof ball
paysfrom a lots of his beadionabthraif asting wouldnewpatreary himself asliad for
hil", and pust ho ewairsutention with anverask so stant that spert flack as home
and he would know all the bost of what had been sudfert horrikeltsubpessitratthr ,
was a giran in mupid.- Gregor had never could be seen fuc  award efuss to bong
severy decisavely atthis way warall.

It seems to have learned where the spaces go, at least. To be honest, I can't make out any meaning.

Word level

Vocabulary size: 4997; total number of words: 20252. The total word count may be too small relative to the vocabulary size: each word type appears only about four times on average.
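A quick way to check this type/token ratio, assuming the `body` string from the preprocessing sketch above (exact counts depend on the author's tokenization, so the numbers may differ):

```python
words = body.split()
print("vocabulary:", len(set(words)), "tokens:", len(words))
```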

The room for her hand and she had her her came hands often her selfcontrolled, It
was very sign and the father would have to him but she had been more for her and
she had not very indifferent to him more tried to get and difficult that she would
used to get himself than he had been before. for bed Gregor had him for him to it;
he had back and it was thought that he would have to him for the time that she had
already again in this mother and could not time about this I not me to ask I
parents to the hall, that father only to the door as his mother would always put on
the key on the door with the gold slightly in the notes and day,

It looks more decent than the char level, but I still can't read any meaning into it... By the way, here is what you get if you pass the above through Google Translate:

A room for her hands and she had her She often she restrained her hands,
It was very impressive and my dad had to him,
She was for her and she was used to get herself more than before.
Gregor had it for him for the bed.
He came back and thought he had to be with him for the time he was doing to this mother again.
I didn't make me meet my parents.
Always put a note and a little money on the day to lock the door.

Hmm. If you're a Kafka enthusiast, would you say, "Now that's a Kafkaesque phrase!"?

Very short spans do read naturally, though:

・The father would have to
・It was thought that he would have to
・His mother would always put on the key on the door

Improvements

These are only idea-level thoughts:

・Since the predicted words are not narrowed down, restrict prediction to a fixed vocabulary.
・Pad the generated sentence to a fixed length rather than a random length.
・Generate multiple candidate sentences and adopt the ones with high cosine similarity to the preceding and following sentences (see the sketch after this list).

These might make it a little more decent, but I doubt there would be a dramatic improvement. Also, in classification tasks there are cases where accuracy improves with 2-gram and 3-gram chars, so it may be interesting to try that here. The output above is 1-gram char.
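As a sketch of the cosine-similarity idea, one could average embedding vectors into a crude sentence vector and keep the candidate closest to the previous sentence (the bag-of-words averaging and all helper names here are my own assumptions, not something from the repository):

```python
import numpy as np

def sentence_vec(ids, embedding_matrix):
    """Crude sentence vector: average of the token embedding vectors."""
    return embedding_matrix[ids].mean(axis=0)

def pick_candidate(prev_ids, candidates, embedding_matrix):
    """Return the candidate sentence most similar to the preceding one."""
    prev = sentence_vec(prev_ids, embedding_matrix)
    def cos(v):
        return float(v @ prev) / (np.linalg.norm(v) * np.linalg.norm(prev) + 1e-8)
    return max(candidates, key=lambda c: cos(sentence_vec(c, embedding_matrix)))
```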

Interactive sentence generation gets a lot of attention, but even simply generating a single coherent sentence is hard.
