The title probably has you thinking "What is this person talking about?" (laughs). I built this over a four-day holiday weekend, partly as a way to study natural language processing. I plan to publish it somewhere on the web in the near future.
As soon as I came up with this idea, I wrote down the current situation (As is) → the issues → the ideal state (To be). Just what you'd expect from a businessman (laughs).
When I thought about how to build it, I came up with the following mechanism. ↓ Build a dataset of Mr. Children lyrics internally and convert it to vectors with Word2Vec. Apply the same Word2Vec processing to my own feelings, then pull the most similar lyrics by cosine similarity.
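The "pull similar lyrics by cosine similarity" step can be sketched roughly as follows. This is a minimal illustration, not the code from my app: the helper names `cosine_similarity` and `rank_lyrics` and the 3-dimensional toy vectors are assumptions standing in for the real averaged Word2Vec vectors.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

def rank_lyrics(query_vec, lyric_vecs, lyrics, top_n=3):
    """Return (lyric, score) pairs sorted from most to least similar."""
    scores = [cosine_similarity(query_vec, v) for v in lyric_vecs]
    order = np.argsort(scores)[::-1][:top_n]
    return [(lyrics[i], scores[i]) for i in order]

# Hypothetical toy data: placeholder lyrics and made-up 3-dimensional vectors.
lyrics = ["lyric A", "lyric B", "lyric C"]
lyric_vecs = np.array([[1.0, 0.0, 0.0],
                       [0.6, 0.8, 0.0],
                       [0.0, 0.0, 1.0]])
query_vec = np.array([0.5, 0.9, 0.0])  # stands in for the vector of "my feelings"

ranking = rank_lyrics(query_vec, lyric_vecs, lyrics)
for lyric, score in ranking:
    print(lyric, round(score, 3))
```

Because cosine similarity only looks at the direction of the vectors, a long lyric and a short one can still match as long as their averaged vectors point the same way.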
I quickly implemented the overall picture above. The result ... For my feeling of **"I can't sleep"**, the first lyric that came back was **"Become a member of society and carry the burden on my back-to the person who shines light"**. Eh ... that's reading awfully deep into it ... lol
Morphological analysis: janome.tokenizer
Word2Vec: word2vec in gensim.models
import numpy as np
from janome.tokenizer import Tokenizer
from gensim.models import word2vec
The lyrics are split into words by morphological analysis, and Word2Vec is applied to each word. Finally, averaging the word vectors gives a single vector for each set of lyrics.
↓ Result of morphological analysis
Word2Vec part (excerpt)
# Build a skip-gram Word2Vec model from Mr.Children's lyrics (sentences).
# Note: gensim 4.0 and later renamed the `size` parameter to `vector_size`.
skipgram_model = word2vec.Word2Vec(sentences,
                                   sg=1,
                                   size=250,
                                   min_count=2,
                                   window=10, seed=1234)
# Look up the Word2Vec vector of each morphologically analyzed word and average them
# => can Word2Vec reflect the context of the lyrics?
def avg_document_vector(data, num_features):
    document_vec = np.zeros((len(data), num_features))
    for i, doc_word_list in enumerate(data):
        feature_vec = np.zeros((num_features,), dtype="float32")
        for word in doc_word_list:
            try:
                feature_vec = np.add(feature_vec, skipgram_model.wv[word])
            except KeyError:
                # The word is not in the model's vocabulary (e.g. dropped by min_count)
                pass
        feature_vec = np.divide(feature_vec, len(doc_word_list))
        document_vec[i] = feature_vec
    return document_vec
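One thing to watch in `avg_document_vector`: it divides by `len(doc_word_list)`, so an empty word list produces NaN values, and words missing from the vocabulary still count toward the divisor. A possible variant, sketched here with a plain dict (`toy_vectors`, a hypothetical stand-in for `skipgram_model.wv`) and an assumed helper name `avg_known_words`, averages only the words that actually have a vector:

```python
import numpy as np

# Hypothetical stand-in for skipgram_model.wv: a plain dict of word -> vector.
toy_vectors = {
    "night": np.array([1.0, 0.0]),
    "sleep": np.array([0.0, 1.0]),
}

def avg_known_words(words, vectors, num_features):
    """Average only the words that have a vector; return zeros if none do."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return np.zeros(num_features, dtype="float32")
    return np.mean(known, axis=0)

# "unknown" has no vector, so it is skipped instead of shrinking the average.
print(avg_known_words(["night", "sleep", "unknown"], toy_vectors, 2))
# An empty word list yields a zero vector instead of NaN.
print(avg_known_words([], toy_vectors, 2))
```

Since cosine similarity ignores vector length, the choice of divisor should not change the ranking; the main practical benefit is avoiding NaN for empty input.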
I found it interesting to convert words into vectors and measure how closely they match. I want to study BERT as well. To turn this toy into a service, I urgently need to increase the number of songs (as of July 29, 2020: 5 songs ... lol). I will keep adding songs steadily.
Even so, I'm glad I've become able to play around like this over a four-day holiday; it feels like my skills are coming along!!