How well does sentence generation from a dataset actually work? I had never tried it myself, so I gave it a go.
I wrote it in TensorFlow, though I think Keras would have been simpler to write. The model is a two-layer LSTM, trained with the char/word that follows each position as the teacher signal. When generating sentences, the char/word with the highest predicted probability is selected. If you want some variation, you could instead sample randomly from the top predictions.
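The "sample randomly from the top predictions" idea can be sketched in plain NumPy (a hypothetical helper for illustration, not the repository's actual code; the probability vector below is made up):

```python
import numpy as np

def sample_from_top_k(probs, k=3, rng=None):
    """Sample the next token index from the k highest-probability predictions.

    probs: 1-D array of predicted probabilities over the vocabulary.
    """
    rng = rng or np.random.default_rng()
    top = np.argsort(probs)[-k:]           # indices of the k most likely tokens
    p = probs[top] / probs[top].sum()      # renormalize over just those k
    return int(rng.choice(top, p=p))

# Greedy decoding (what the post uses) is the k=1 special case:
probs = np.array([0.05, 0.60, 0.25, 0.10])
print(sample_from_top_k(probs, k=1))       # always index 1, the argmax
print(sample_from_top_k(probs, k=2))       # index 1 or 2, adding "fluctuation"
```

With k=1 this reduces to picking the argmax; larger k trades determinism for variety.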
For the dataset, I chose Kafka's "The Metamorphosis" from the URL below. No particular reason for the choice. http://www.gutenberg.org/ebooks/5200
The preprocessing only extracts the body text and deletes the chapter notation. I think the output would come out even cleaner if you also normalized the punctuation symbols.
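That preprocessing could look something like the following sketch. The Gutenberg start/end markers and the Roman-numeral chapter pattern are my assumptions about the file layout, not the repository's exact code:

```python
import re

def extract_body(raw):
    """Keep only the text between the Project Gutenberg start/end markers
    (the marker strings are an assumption about the file layout)."""
    start = raw.find("*** START OF")
    end = raw.find("*** END OF")
    if start != -1 and end != -1:
        raw = raw[raw.index("\n", start) + 1:end]
    return raw

def strip_chapters(text):
    """Delete chapter notations such as 'I', 'II', 'III' on their own line."""
    return re.sub(r"^\s*[IVX]+\s*$", "", text, flags=re.MULTILINE)

sample = "I\nOne morning Gregor Samsa woke.\nII\nHe could not move."
print(strip_chapters(sample))
```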
The code is here: https://github.com/pigooosuke/lstm_text_generator
First, the char-level results. Generated sentences start with "The".
Vocabulary size (chars): 60
Total number of chars: 116,622
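For reference, those two numbers are just the count of distinct characters and the total corpus length; a trivial sketch of how they would be computed (a hypothetical helper, not the repo's code):

```python
def char_stats(text):
    """Return (vocabulary size, total char count) for a char-level model."""
    return len(set(text)), len(text)

vocab_size, total_chars = char_stats("hello world")
print(vocab_size, total_chars)  # 8 distinct chars, 11 chars total
```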
The same times that I've open upingoull, and his fordit the sigh of the reases,
and no tho gontoused and he pusied hers so that he was to became use to be ard
forgove and he would save with a cramous snignto and if he wolld with hiseof ball
paysfrom a lots of his beadionabthraif asting wouldnewpatreary himself asliad for
hil", and pust ho ewairsutention with anverask so stant that spert flack as home
and he would know all the bost of what had been sudfert horrikeltsubpessitratthr ,
was a giran in mupid.- Gregor had never could be seen fuc award efuss to bong
severy decisavely atthis way warall.
It seems the model has at least learned word spacing properly. To be honest, I can't make any sense of the text itself.
Next, the word-level results.
Vocabulary size (words): 4,997
Total number of words: 20,252
(The total word count is probably too small relative to the vocabulary size...)
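The same statistics at word level, assuming naive whitespace tokenization (the repository's actual tokenization may differ):

```python
def word_stats(text):
    """Return (vocabulary size, total word count) for a word-level model,
    using naive whitespace tokenization (an assumption)."""
    words = text.split()
    return len(set(words)), len(words)

print(word_stats("the cat sat on the mat"))  # (5, 6): "the" appears twice
```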
The room for her hand and she had her her came hands often her selfcontrolled, It
was very sign and the father would have to him but she had been more for her and
she had not very indifferent to him more tried to get and difficult that she would
used to get himself than he had been before. for bed Gregor had him for him to it;
he had back and it was thought that he would have to him for the time that she had
already again in this mother and could not time about this I not me to ask I
parents to the hall, that father only to the door as his mother would always put on
the key on the door with the gold slightly in the notes and day,
It looks decent compared to the char-level output, but I still can't extract any meaning... By the way, if you pass the above through Google Translate, you get:
A room for her hands and she had her She often she restrained her hands,
It was very impressive and my dad had to him,
She was for her and she was used to get herself more than before.
Gregor had it for him for the bed.
He came back and thought he had to be with him for the time he was doing to this mother again.
I didn't make me meet my parents.
Always put a note and a little money on the day to lock the door.
Hmm. Maybe a real Kafka enthusiast would exclaim, "Now that's a Kafkaesque sentence!"
Short spans do read naturally, though:
・The father would have to
・It was thought that he would have to
・His mother would always put on the key on the door
Some ideas for improvement, just at the brainstorming level:
・Since the predicted words are not narrowed down at all, restrict prediction to a fixed, curated vocabulary.
・Generate sentences with a fixed length rather than a random length.
・Generate multiple candidate sentences and adopt the ones with high cosine similarity to the preceding and following sentences.
These might make the output a little more decent, but I doubt they would bring a dramatic improvement. Also, in classification tasks accuracy sometimes improves with 2gram-char or 3gram-char features, so that could be interesting to try here too. The output above uses 1gram-char.
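Two of the ideas above, char n-grams and cosine similarity between sentences, can be sketched like this (hypothetical helpers for illustration, not code from the repo):

```python
from collections import Counter
import math

def char_ngrams(text, n):
    """Split text into overlapping character n-grams (2gram-char, 3gram-char, ...)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def cosine_similarity(a, b):
    """Cosine similarity between two sentences' char-bigram count vectors."""
    ca, cb = Counter(char_ngrams(a, 2)), Counter(char_ngrams(b, 2))
    dot = sum(ca[g] * cb[g] for g in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * \
           math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

print(char_ngrams("char", 2))                      # ['ch', 'ha', 'ar']
print(cosine_similarity("he would", "he would"))   # ~1.0 for identical sentences
print(cosine_similarity("ab", "cd"))               # 0.0 for no shared bigrams
```

The candidate-filtering idea would then be: generate several sentences and keep the one maximizing similarity to its neighbors.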
Interactive sentence generation gets mentioned a lot, but even plain sentence generation turns out to be hard.