I want to handle the rhyme part5

__ Content __

Since I built a graph last time, this time I will try clustering and related operations on it. I also extend the way rhymes are captured: "e i" is converted to "ee" and "o u" to "oo", and words whose vowels match after this conversion are also regarded as rhyming. This is modeled on katakana English and on the spelling mistakes that Japanese children commonly make. I do not write the rule as "ei" because "eki" must not become "ee": the "i" and "u" have to be standalone vowels. (For example, "reizouko" becomes "reezooko".)
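To illustrate why the rule cannot simply be applied to a vowels-only string, here is a minimal sketch. The helper name `vowels_only` and the romaji inputs are assumptions made for this illustration and do not appear in the original code.

```python
import re

def vowels_only(romaji):
    """Drop every character that is not one of a, i, u, e, o."""
    return re.sub(r"[^aiueo]", "", romaji)

for word in ["eki", "reizouko"]:
    vowels = vowels_only(word)
    # Naive rule applied to the vowels-only string
    naive = vowels.replace("ei", "ee").replace("ou", "oo")
    print(word, vowels, naive)
```

For "eki" the vowels-only string is "ei", so the naive rule turns it into "ee" even though the "i" belongs to "ki". This is why the conversion has to look at a standalone "i" or "u" in the kana text.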

__ Graph operation __

import networkx as nx
import matplotlib.pyplot as plt
import community  # python-louvain

# edge_list (weighted edges) and dic (node -> text) are carried over from the previous part
G = nx.Graph()
G.add_weighted_edges_from(edge_list)
# Clustering (Louvain method)
partition = community.best_partition(G, weight="weight")
# Separate the nodes of each community into a list: [[community 0 nodes], [community 1 nodes], ...]
part_sub = [[] for _ in set(partition.values())]
for node, comm in partition.items():
    part_sub[comm].append(node)
# List the node with the highest eigenvector centrality in each community
max_eig_cent_node = []
for part in part_sub:
    # Subgraph made of the edges whose endpoints are both in this community
    G_part = G.subgraph(part)
    # Compute the centrality once per community, not once per node
    cent = nx.eigenvector_centrality_numpy(G_part, weight="weight")
    max_eig_cent_node.append(max(cent, key=cent.get))

print([dic[i] for i in max_eig_cent_node])
# Modularity
print(community.modularity(partition, G))

I clustered the graph and found the node with the highest eigenvector centrality in each community. If the partition is good, setting each of these nodes as target_word should give good results. I am also considering applying a threshold to the score used for the edge weight, so that the weights differ more clearly.
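The thresholding idea could be sketched as follows. This assumes that edge_list holds (node, node, weight) tuples, which is the form add_weighted_edges_from accepts. The helper name `threshold_edges`, the cutoff value, and the toy data are hypothetical.

```python
def threshold_edges(edge_list, min_weight):
    """Keep only the edges whose weight is at least min_weight (hypothetical helper)."""
    return [(u, v, w) for u, v, w in edge_list if w >= min_weight]

# Toy data; the real edge_list comes from the previous part
edges = [(0, 1, 5), (1, 2, 1), (2, 3, 4), (0, 3, 2)]
strong_edges = threshold_edges(edges, min_weight=3)
print(strong_edges)
```

The filtered list can then be passed to add_weighted_edges_from in place of the full edge_list. Which cutoff works well would have to be checked against the modularity and the resulting communities.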

__ Extending how rhymes are captured __

from pykakasi import kakasi
import re

with open("./gennama.txt", "r", encoding="utf-8") as f:
    data = f.read()

# Convert kanji and hiragana to katakana
kks = kakasi()
kks.setMode('J', 'K')
kks.setMode('H', 'K')
conv = kks.getConverter()
text_data = conv.do(data)

# Get the text converted as e i → ee, o u → oo
def expansion(text_data):
    # Depending on the last character an extra i or u gets appended; trim to the original length to remove it
    text_data_len = len(text_data)
    # Handle runs of イ and ウ, as in イイイス ("ii isu") and ソウイウウワサ ("sou iu uwasa")
    text_data = text_data.replace("イイ", "イi").replace("ウウ", "ウu")
    text_data = text_data.split("イ")
    new_text_data = []
    # Katakana → romaji converter, used to read the vowel of the preceding character
    kks.setMode('K', 'a')
    conv = kks.getConverter()
    for i in range(len(text_data)):
        if len(text_data[i]) > 0:
            # イ after an e-row character becomes "e", otherwise "i"
            if ("e" in conv.do(text_data[i][-1])):
                new_text_data.append(text_data[i] + "e")
            else:
                new_text_data.append(text_data[i] + "i")

    text_data = "".join(new_text_data).split("ウ")
    new_text_data = []
    for i in range(len(text_data)):
        if len(text_data[i]) > 0:
            # ウ after an o-row character becomes "o", otherwise "u"
            if ("o" in conv.do(text_data[i][-1])):
                new_text_data.append(text_data[i] + "o")
            else:
                new_text_data.append(text_data[i] + "u")

    return "".join(new_text_data)[:text_data_len]

print(expansion(text_data))

My initial plan was to convert the data to katakana, split it on "i" and "u" (the katakana イ and ウ), and change the processing according to the vowel of the immediately preceding character, but this turned out to be harder than expected. Whether or not the data ends with "i" or "u", an extra "iu" is left at the end. I dealt with this by trimming the result to the same length as the argument, but when I printed it, an "i" still remained at the end. I also had not anticipated consecutive occurrences such as "イイ" and "ウウ". Things rarely go smoothly once you actually try them, and there are many problems you do not notice until you do.

__ Plan from now on __

Next, I will score the matching parts in each of three representations (the katakana-converted data, the data with only vowels left, and the expanded data) to capture three kinds of agreement (consonant match, vowel match, and sound match). I judged that matches on long vowels, the moraic nasal, and sokuon (geminate consonants) do not need to be examined. In other words, it is time to consolidate what I have done so far. N-grams and splitting on spaces should probably be taken into account, and how to evaluate the matching parts is still an open problem. I would like to settle on the current best method, prepare some input data, and verify it.
