Introduction

You may be looking at the title and thinking, "What is this person talking about?" (laughs) I built this over a four-day holiday weekend as a way to study natural language processing. I plan to publish it somewhere on the web in the near future.

Concept and completed image

As soon as I came up with this idea, I wrote down the current situation (As-Is) → issues → the ideal state (To-Be). Spoken like a true businessperson (laughs)

When I thought about how to build it, I came up with the following mechanism: build a dataset of Mr. Children lyrics internally and convert it to vectors with Word2Vec. My own feelings are put through the same Word2Vec processing, and the most similar lyrics are retrieved by cosine similarity.
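The retrieval step described above can be sketched roughly like this. It is only an illustration, not the code from this project: the function name `rank_by_cosine` and the toy 3-dimensional vectors are made up for the example, and in the real thing the vectors would be the averaged Word2Vec vectors of the lyrics and of my feeling.

```python
import numpy as np


def rank_by_cosine(query_vec, lyric_vecs):
    """Return lyric indices sorted from most to least similar, plus the scores."""
    query = np.asarray(query_vec, dtype="float64")
    lyrics = np.asarray(lyric_vecs, dtype="float64")
    # Cosine similarity = dot product divided by the product of the norms.
    norms = np.linalg.norm(lyrics, axis=1) * np.linalg.norm(query)
    # Guard against zero vectors (e.g. a text with no known words).
    safe_norms = np.where(norms == 0.0, 1.0, norms)
    scores = np.where(norms == 0.0, 0.0, (lyrics @ query) / safe_norms)
    order = np.argsort(-scores)
    return order, scores


# Toy vectors standing in for averaged Word2Vec vectors (hypothetical values).
feeling = [1.0, 0.0, 1.0]
lyrics = [
    [0.0, 1.0, 0.0],  # orthogonal to the feeling
    [2.0, 0.0, 2.0],  # same direction as the feeling
    [1.0, 1.0, 0.0],  # partly similar
]
order, scores = rank_by_cosine(feeling, lyrics)
print(order[0], scores)
```

Because cosine similarity only looks at direction, the second lyric ranks first even though its vector is twice as long as the feeling vector.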

The PoC is done!

I quickly put together a prototype of the overall picture above. The result: for my feeling "I can't sleep", the first lyric that came back was "Become a member of society and carry the burden on my back-to the person who shines light". Whoa ... it reads that deep into me ... lol

What I used

Morphological analysis: janome.tokenizer
Word2Vec: word2vec in gensim.models

import numpy as np  # needed by the averaging function below
from janome.tokenizer import Tokenizer
from gensim.models import word2vec

The lyrics are split into words by morphological analysis, and each word is converted to a vector with Word2Vec. Finally, averaging those word vectors gives a single vector for each set of lyrics.

(Figure: results of the morphological analysis)

The Word2Vec part of the code

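# Build a skip-gram (sg=1) Word2Vec model from the Mr. Children lyrics.
# `sentences` is the list of lyrics already split into words.
# Note: `size` is the gensim 3.x parameter name; gensim 4.x renamed it to `vector_size`.
skipgram_model = word2vec.Word2Vec(sentences,
                                   sg=1,
                                   size=250,
                                   min_count=2,
                                   window=10, seed=1234)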



# Look up the Word2Vec vector of each morphologically analyzed word and average them per document.
# Open question: does the averaged vector reflect the context of the lyrics?
def avg_document_vector(data, num_features):
    document_vec = np.zeros((len(data), num_features))
    for i, doc_word_list in enumerate(data):
        feature_vec = np.zeros((num_features,), dtype="float32")
        for word in doc_word_list:
            try:
                feature_vec = np.add(feature_vec, skipgram_model.wv[word])
            except KeyError:
                # Skip words that are not in the model's vocabulary (e.g. below min_count).
                pass

        # Divide by the total word count; max(..., 1) avoids 0/0 for empty input.
        feature_vec = np.divide(feature_vec, max(len(doc_word_list), 1))
        document_vec[i] = feature_vec
    return document_vec
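One thing to note about the function above: it divides by the total number of words, including words the model skipped. Cosine similarity ignores vector length, so this should not change the ranking, but a variant that averages only over known words may be easier to reason about. Below is a rough sketch of that idea, not the code I actually ran. The name `mean_known_vector` and the 2-dimensional toy vectors are hypothetical, and a plain dict stands in for `skipgram_model.wv` (as far as I know, gensim's keyed vectors support the same `in` and `[]` lookups).

```python
import numpy as np


def mean_known_vector(words, vectors, num_features):
    """Average the vectors of the words found in `vectors`; zeros if none are."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        # No known words (or empty input): fall back to a zero vector.
        return np.zeros(num_features, dtype="float32")
    return np.mean(np.asarray(known, dtype="float32"), axis=0)


# Hypothetical 2-dimensional word vectors; a real run would look words up
# in the trained model instead of this dict.
toy_vectors = {
    "眠れ": np.array([1.0, 3.0], dtype="float32"),
    "ない": np.array([3.0, 1.0], dtype="float32"),
}
known_avg = mean_known_vector(["眠れ", "ない", "未知語"], toy_vectors, 2)
empty_avg = mean_known_vector([], toy_vectors, 2)
print(known_avg, empty_avg)
```

The unknown word is simply left out of the average, and an empty word list gives a zero vector instead of an error.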

In conclusion

I found it interesting to turn words into vectors and see how closely they match. I want to study BERT as well. To turn this bit of fun into a service, I urgently need to increase the number of songs. (As of July 29, 2020: 5 songs ... lol) I'll keep adding songs steadily.

Even so, I'm glad that I can now pull off this kind of fun project over a four-day holiday; it feels like my skills are coming along!!
