I want to handle rhymes: part 1

Introduction

I like rap, so I have long wanted to work with rhyme. Rhyme is an interesting and deep subject. I have tried to write lyrics several times, but they never came out right: they read like puns, and the story became a mess because I was thinking only about the rhyme. So I wondered: if I feed in the raw material for lyrics, could a program output something "lyric-like"? I will try it. ~~(Did you write love songs or lyrics in your youth? Lol.)~~

Judgment of rhyme

Rhyme is hard to pin down in a single phrase: pronunciation changes the sound (for example "so" versus "soo"), among other things. For this article, two strings are regarded as rhyming if they have the same sequence of vowels [a, i, u, e, o]. That requires conversion to romaji, which [kakasi][link] makes possible (I referred to the linked article). For the time being, the input data is a text file containing the lyrics of one song by a certain rapper. ~~(Speaking of "kakashi" (scarecrow), "Robo or Scarecrow average by KICK THE CAN CREW" comes to mind.)~~

[link]: https://crimnut.hateblo.jp/entry/2018/08/29/180455

from pykakasi import kakasi
import re

# Read the lyrics of one song
with open("gennama.txt", "r", encoding="utf-8") as f:
    data = f.read()

# Convert hiragana (H), katakana (K), and kanji (J) to romaji (a)
kks = kakasi()
kks.setMode('H', 'a')
kks.setMode('K', 'a')
kks.setMode('J', 'a')

conv = kks.getConverter()
romaji = conv.do(data)
# Drop everything except the vowels
data_re = re.sub(r"[^aeiou]+", "", romaji)

print(data_re)  # Output: oeauiaau...
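The rule described above (same vowel sequence means rhyme) can be sketched without kakasi by starting from words that are already in romaji. The helper names `vowel_sequence` and `is_rhyme` are hypothetical, the lowercase conversion is an added assumption, and the sample words are only illustrations, not taken from the lyrics.

```python
import re


def vowel_sequence(romaji: str) -> str:
    """Keep only the vowels a, i, u, e, o, in their original order."""
    # Lowercasing is an assumption added here; the original code does not do it.
    return re.sub(r"[^aeiou]+", "", romaji.lower())


def is_rhyme(romaji_a: str, romaji_b: str) -> bool:
    """Treat two words as rhyming when their vowel sequences are identical."""
    seq_a = vowel_sequence(romaji_a)
    seq_b = vowel_sequence(romaji_b)
    # Two words without any vowels are not counted as a rhyme.
    return bool(seq_a) and seq_a == seq_b


print(vowel_sequence("kakashi"))       # aai
print(is_rhyme("kakashi", "tawashi"))  # True
print(is_rhyme("kakashi", "robotto"))  # False
```

An exact match like this is strict: it only answers yes or no, which is one reason a graded score is more useful for ranking words.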

Quantification of rhyme

I converted the text to romaji with kakasi and extracted only the target vowels [a, i, u, e, o] with re.sub(). Next, I wanted to split the input data by some rule and process the parts where the vowels match, but this is where I got lost. It was unclear what I actually wanted to do and what counts as "lyric-like". I kept worrying about the shape of the input data and how to transform it without getting anywhere, so for the time being I decided to think about "quantifying the rhyme". ~~("Everyone said that the evolved rhyme is good. In the Ginza Line by YOSHI (Gaki Ranger)": the [In] sounds comfortable. I have not considered [n] this time.)~~

# Slice the shorter word; every slice that also appears in the other word
# counts as a "rhyme", and its length is added to the score
def make_score(word_a, word_b):
    # Pick the shorter word (word_a when the lengths are equal)
    if len(word_a) > len(word_b):
        shorter, longer = word_b, word_a
    else:
        shorter, longer = word_a, word_b
    score = 0
    for i in range(len(shorter)):
        # Start at i + 1: empty slices would only add 0
        for j in range(i + 1, len(shorter) + 1):
            if shorter[i:j] in longer:
                score += j - i
    return score

I think this score expresses the "quantification of rhyme". Scoring only matches at the end of the words would also have been fine, but I decided to slice the word and count every matching part. ~~(All Japanese verbs end with the "u" sound, so rhyming on verbs alone is not good.)~~
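To make the scoring concrete, here is a small sketch that compares the slicing score with an ending-only score on a few vowel strings. `make_score` is restated compactly with the same behavior as the function above; `suffix_score` is a hypothetical helper written only for this comparison, and the vowel strings are made-up examples.

```python
def make_score(word_a: str, word_b: str) -> int:
    """Sum the lengths of all slices of the shorter word found in the longer one."""
    if len(word_a) > len(word_b):
        shorter, longer = word_b, word_a
    else:
        shorter, longer = word_a, word_b
    score = 0
    for i in range(len(shorter)):
        for j in range(i + 1, len(shorter) + 1):
            if shorter[i:j] in longer:
                score += j - i
    return score


def suffix_score(word_a: str, word_b: str) -> int:
    """Hypothetical alternative: length of the common ending of both words."""
    length = 0
    for ch_a, ch_b in zip(reversed(word_a), reversed(word_b)):
        if ch_a != ch_b:
            break
        length += 1
    return length


# Identical vowel sequences: every slice matches.
print(make_score("aai", "aai"))      # 10
# Same ending "ai": both scores are positive.
print(make_score("aai", "uai"))      # 5
print(suffix_score("aai", "uai"))    # 2
# Match in the middle only: the ending-only score misses it.
print(make_score("eaa", "oeaui"))    # 5
print(suffix_score("eaa", "oeaui"))  # 0
```

The last pair shows the difference in design: slicing rewards a shared "ea" anywhere in the word, while an ending-only score sees nothing.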

What to output

The input data is still undecided, but let's think about the output with what we have so far. The output is not lyrics as such, but it is something "lyric-like". The rhyme score can be read as a similarity between words: if you enter a favorite word, the program recommends the words in the data that are most similar to it. Here the input data is the lyrics of a certain rapper, split at the spaces used in the lyric sheet, and the script below puts together everything so far under these conditions. ~~(I think catchphrases and puns are also rhymes; mostly they are homonyms. The finish of a line in rap is also pleasant. I think rhyming strengthens the feeling of having said something well.)~~

from pykakasi import kakasi
import re

with open("./gennama.txt","r", encoding="utf-8") as f:
    data = f.read()
# This time the text is separated only by spaces, like "hoge hoge hoge...", so split() is the only preprocessing
data_sp = data.split()
target_word_origin = "Gennama"

kks = kakasi()

# Convert hiragana (H), katakana (K), and kanji (J) to romaji (a)
kks.setMode('H', 'a')
kks.setMode('K', 'a')
kks.setMode('J', 'a')

conv = kks.getConverter()
# Convert to romaji word by word, so the indices stay aligned with data_sp
target_word = conv.do(target_word_origin)
text_data = [conv.do(text) for text in data_sp]
# Leave only vowels
target_word_vo = re.sub(r"[^aeiou]+", "", target_word)
vowel_data = [re.sub(r"[^aeiou]+", "", text) for text in text_data]
# Map each index of vowel_data to the original word, so the word before
# vowel conversion can be looked up by index
dic = {k: v for k, v in enumerate(data_sp)}

# Slice the shorter word; every slice that also appears in the other word
# counts as a "rhyme", and its length is added to the score
def make_score(word_a, word_b):
    # Pick the shorter word (word_a when the lengths are equal)
    if len(word_a) > len(word_b):
        shorter, longer = word_b, word_a
    else:
        shorter, longer = word_a, word_b
    score = 0
    for i in range(len(shorter)):
        # Start at i + 1: empty slices would only add 0
        for j in range(i + 1, len(shorter) + 1):
            if shorter[i:j] in longer:
                score += j - i
    return score
# Pass the vowel-only data and the vowel-only target word. Return [index, score]
# pairs, highest score first, so the original words can be looked up later.
def get_idx_score(vowel_data, target_word):
    ranking = []
    for i, word_b in enumerate(vowel_data):
        score = make_score(target_word, word_b)
        ranking.append([i, score])
    return sorted(ranking, key=lambda x:-x[1])

ranking = get_idx_score(vowel_data, target_word_vo)
print(target_word_origin)
for idx, score in ranking:
    print("Score:" + str(score))
    print("word:" + dic[idx])
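The ranking step can also be tried without a lyrics file or kakasi. The following is a minimal sketch, assuming the candidate words are already in romaji; `recommend`, `vowels`, and `top_n` are hypothetical names, and the candidate words are made-up examples rather than real lyric data.

```python
import re


def vowels(romaji: str) -> str:
    """Keep only the vowels a, i, u, e, o."""
    return re.sub(r"[^aeiou]+", "", romaji)


def make_score(word_a: str, word_b: str) -> int:
    """Same scoring idea as above: sum the lengths of shared slices."""
    if len(word_a) > len(word_b):
        shorter, longer = word_b, word_a
    else:
        shorter, longer = word_a, word_b
    score = 0
    for i in range(len(shorter)):
        for j in range(i + 1, len(shorter) + 1):
            if shorter[i:j] in longer:
                score += j - i
    return score


def recommend(target: str, candidates: list[str], top_n: int = 3) -> list[tuple[int, str]]:
    """Return the top_n (score, word) pairs, highest score first."""
    target_vowels = vowels(target)
    scored = [(make_score(target_vowels, vowels(word)), word) for word in candidates]
    # sort() is stable, so words with equal scores keep their input order.
    scored.sort(key=lambda pair: -pair[0])
    return scored[:top_n]


candidates = ["tawashi", "robotto", "watashi", "sakana"]  # illustrative words only
print(recommend("kakashi", candidates, top_n=2))
# [(10, 'tawashi'), (10, 'watashi')]
```

Returning the word together with its score avoids the separate index dictionary, at the cost of keeping the original words in memory alongside the vowel strings.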

I used finished lyrics as the input data, but even when I set target_word to a word that appears in the lyrics, the part that actually rhymes with it does not always come out on top. That is to be expected, because the way I split the input data is crude. Still, having built the overall framework, I could think more concretely about the parts that did not work, and the problems became clear.

Impressions, future development

I can now see the direction. The input will be a list of my own words, and the output will recommend, as in this article, the words from that input that rhyme with a given word. I think it would be interesting to discover the rhymes hidden in your own words. Also, if you want to write lyrics that sound like yourself, I think the core of the lyrics comes together if you first write down what you want to say without worrying about the lyrics. Going forward, I need to improve both how the input data is split into words and how the rhyme score is computed. I want the input to be free-form text in my own words (I want to keep dialect and personal turns of phrase), so I will try various things. (Simply extracting words by part of speech does not seem likely to give the result I want.)

In conclusion

* The following is a personal memo.

I'm a beginner in programming. After finishing an online course, I wanted to make something and tried to use everything I had learned: scrape the input data, run morphological analysis on it, then produce output from finished lyrics with an LSTM, and so on. I got nowhere. As mentioned in the text, the input and output were unrealistic and what I wanted to do was not concrete, so I gave up on it. I still wanted to have made something, so I looked for someone's code to copy, but I could not find anyone doing the same thing. (Someone who wants to write original rhymes was trying to plagiarize...)

What changed things this time was "content-based filtering" for recommendation, which I came across while reviewing the online course. It suddenly occurred to me that if the similarity were the "quantification of rhyme", I could make recommendations. Actually, I had already been paying attention to similarity but could not think of how to use it. It looked easy, so I tried it, and it did not produce the output I wanted. That, however, made the real problems clear. I also realized that the input data could simply be a list of words, like the lyrics used this time.

What I want to say is this: it is important to first build a rough framework like this one, the data should be something whose output you can roughly predict, and how the finished product will be used can be considered later. Things that seem unnecessary also turn out to be connected. I want to remember this when my motivation to study drops.
