result

```
"おばけ" can rhyme with "おまえ" [similarity: 0.24651645]
```
- ① Rhyming word search tool
  - APIs used
    - COTOHA Similarity Judgment API
- ② Search word pool generation tool
  - APIs used
    - COTOHA Parsing API
    - Qiita posted-article acquisition API
- A tool that extracts, from a CSV file, words that can rhyme with a specified word
  - Optionally judges the similarity between the words using the COTOHA API
- Converts the specified word and the words in the CSV file to romaji using pykakasi
converter.py

```python
from pykakasi import kakasi


def convert_hiragana_to_roma(self, target_word_hiragana):
    # Special-case the sokuon (small っ):
    # both っ and つ are converted to the same "tsu",
    # so use "x" as a special character for っ instead.
    if target_word_hiragana == "っ":
        return "x"
    else:
        kakasi_lib = kakasi()
        # Hiragana to romaji
        kakasi_lib.setMode('H', 'a')
        conv = kakasi_lib.getConverter()
        target_word_roma = conv.do(target_word_hiragana)
        return target_word_roma
```
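For example, feeding the method one hiragana character at a time behaves like this (a hypothetical driver, assuming the method lives on a `Converter` class):

```python
# Hypothetical usage sketch, assuming convert_hiragana_to_roma is a method of a Converter class
converter = Converter()
for char in ["お", "ば", "け", "っ"]:
    print(converter.convert_hiragana_to_roma(char))
# => o, ba, ke, x  (the sokuon っ becomes the special marker "x")
```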
- Extract the vowel pattern from the romaji and compare whether two words share the same pattern
| Condition | Original word | Before conversion | After conversion |
|---|---|---|---|
| Vowels only | おばけ | obake | oae |
| Contains a sokuon (っ) | いっぱい | ippai | ixai |
| Contains ん | さんま | sanma | ana |
| Contains ー | サンダー | sanda- | anaa |
converter.py

```python
# Convert a list of per-character romaji into a phoneme pattern
def convert_roma_to_phoneme_pattern(self, target_char_roma_list):
    pre_phoneme = None
    hit_list = []
    for target_char_roma in target_char_roma_list:
        # Vowel case: any of あ, い, う, え, お (a, i, u, e, o)
        vowel_char = self.__find_vowel_char(
            target_char_roma
        )
        specific_char = self.__find_specific_char(
            pre_phoneme,
            target_char_roma
        )
        if vowel_char:
            hit_list.append(vowel_char)
            pre_phoneme = vowel_char
        elif specific_char:
            # Not a vowel, but still a target character:
            # っ (mapped to "x"), ん, or ー
            hit_list.append(specific_char)
            pre_phoneme = specific_char
        else:
            continue
    phoneme_pattern = "".join(hit_list)
    return phoneme_pattern

def __find_vowel_char(self, char_roma):
    # Vowel case
    vowel_list = ["a", "i", "u", "e", "o"]
    for vowel in vowel_list:
        if char_roma.find(vowel) > -1:
            return vowel
        else:
            continue
    # Not a vowel
    return None

def __find_specific_char(self, pre_phoneme, char_roma):
    # ん ("n") or っ (mapped to "x")
    if char_roma == "n" or char_roma == "x":
        return char_roma
    # ー: treat it as a repeat of the previous vowel
    # Example: "da-" -> "a", "a"
    elif pre_phoneme is not None and char_roma == "-":
        return pre_phoneme
    else:
        return None
```
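Putting the two steps together, a small driver (hypothetical, assuming both methods sit on the same `Converter` class) reproduces the table above:

```python
# Hypothetical driver reproducing the conversion table above
converter = Converter()
words = [
    ["o", "ba", "ke"],        # おばけ
    ["i", "x", "pa", "i"],    # いっぱい (っ already mapped to "x")
    ["sa", "n", "ma"],        # さんま
    ["sa", "n", "da", "-"],   # サンダー
]
for char_list in words:
    print(converter.convert_roma_to_phoneme_pattern(char_list))
# => oae, ixai, ana, anaa
```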
execute

```console
$ cd src
$ python main.py おばけ
```

result

```
"おばけ" can rhyme with "こたえ"
"おばけ" can rhyme with "おまえ"
```
- After extracting the combinations of words that can rhyme, the specified word is passed as `base_word` and the word extracted from the CSV as `pool_word` for similarity analysis.
cotoha_client.py

```python
def check_score(self, base_word, pool_word, access_token):
    headers = {
        "Content-Type": COTOHA_CONTENT_TYPE,
        "charset": COTOHA_CHAR_SET,
        "Authorization": "Bearer {}".format(access_token)
    }
    data = {
        "s1": base_word,
        "s2": pool_word,
        "type": "default"
    }
    req = urllib.request.Request(
        f"{COTOHA_BASE_URL}/{COTOHA_SIMILARITY_API_NAME}",
        json.dumps(data).encode(),
        headers
    )
    # Wait between requests to stay within the API rate limit
    time.sleep(COTOHA_REQUEST_SLEEP_TIME)
    with urllib.request.urlopen(req) as res:
        body = res.read()
        return json.loads(body.decode())["result"]["score"]
```
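The similarity API responds with JSON of the form `{"result": {"score": ...}, ...}`, which is why the method indexes `["result"]["score"]`. A minimal call might look like this (hypothetical wrapper class, with an access token already fetched):

```python
# Hypothetical usage, assuming a CotohaClient class holding check_score
# and an access token already obtained from the COTOHA auth endpoint
client = CotohaClient()
score = client.check_score("おばけ", "おまえ", access_token)
print(score)  # e.g. 0.24651645, as in the result below
```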
execute

```console
$ cd src
$ python main.py おばけ
```

result

```
"おばけ" can rhyme with "こたえ" [similarity: 0.063530244]
"おばけ" can rhyme with "おまえ" [similarity: 0.24651645]
```
- Originally, during development, I used the noun list bundled with MeCab as the word pool.
- I thought it would be more interesting to have a mechanism for growing the number and variety of words, so I came up with a word pool generation tool.
- A tool that generates the CSV of search words used by the rhyming word search tool in ①
- Fetch the titles of posted articles using Qiita's posted-article acquisition API
qiita_client.py

```python
def list_articles(self):
    req = urllib.request.Request(
        f"{QIITA_BASE_URL}/{QIITA_API_NAME}?page={QIITA_PAGE_NUMBERS}&per_page={QIITA_ITEMS_PAR_PAGE}"
    )
    with urllib.request.urlopen(req) as res:
        body = res.read()
        return json.loads(body.decode())
```
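The Qiita endpoint returns a JSON array of article objects, so pulling out the titles is a one-liner (a sketch, assuming a `QiitaClient` class wraps the method above):

```python
# Hypothetical usage: collect the "title" field of each returned article
client = QiitaClient()
titles = [article["title"] for article in client.list_articles()]
```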
- Tag the fetched titles with parts of speech using COTOHA's parsing API
cotoha_client.py

```python
# Pass the Qiita article title as target_sentence
def parse(self, target_sentence, access_token):
    headers = {
        "Content-Type": COTOHA_CONTENT_TYPE,
        "charset": COTOHA_CHAR_SET,
        "Authorization": "Bearer {}".format(access_token)
    }
    data = {
        "sentence": target_sentence,
    }
    req = urllib.request.Request(
        f"{COTOHA_BASE_URL}/{COTOHA_PARSE_API_NAME}",
        json.dumps(data).encode(),
        headers
    )
    # Wait between requests to stay within the API rate limit
    time.sleep(COTOHA_REQUEST_SLEEP_TIME)
    with urllib.request.urlopen(req) as res:
        body = res.read()
        return json.loads(body.decode())["result"]
```
- Extract only the nouns and write them to a CSV file
finder.py

```python
# Extract only the nouns from the parse result and return them as a list
def find_noun(self, target_sentence_element):
    noun_list = []
    for element in target_sentence_element:
        for token in element["tokens"]:
            target_form = token["form"]
            target_kana = token["kana"]
            target_pos = token["pos"]
            # If the token is a noun, store it in the list
            if target_pos == TARGET_CLASS:
                # For English words, numbers, and symbols,
                # store the reading kana instead of the surface form
                # TODO: this judgment still has room for improvement
                if re.match(FINDER_REGEX, target_form):
                    noun_list.append(target_kana)
                else:
                    noun_list.append(target_form)
    return noun_list
```
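To make the expected input shape concrete, here is a toy parse result in the structure the code consumes (illustrative values, not real API output; assumes `TARGET_CLASS` is the noun tag `名詞` and `FINDER_REGEX` matches ASCII words):

```python
# Toy parse result in the shape find_noun expects (illustrative, not real API output)
parse_result = [
    {"tokens": [
        {"form": "Python", "kana": "パイソン", "pos": "名詞"},
        {"form": "入門", "kana": "ニュウモン", "pos": "名詞"},
        {"form": "する", "kana": "スル", "pos": "動詞語幹"},
    ]}
]
finder = Finder()
print(finder.find_noun(parse_result))
# => ['パイソン', '入門']  (the English form falls back to its reading kana)
```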
execute

```console
$ cd tool
$ python word_pool_generator.py
```

word_pool.csv

```
backup
tool
ABC
string
visual
studio
code
Note
management
Expansion
Summary
paper
Commentary
```
- Honestly, it is slow: even 40 posted articles take about 5 minutes to process.
- The number of nouns that can be extracted from one article title is roughly 2 to 5.
- It was my first time using pandas when writing the CSV file, so I think the logic can still be improved (see the sketch after this list).
- For now it is at the level of "something that works".
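As a reference for the CSV step mentioned above, a minimal pandas sketch (my assumption about the layout: one noun per row, no header or index):

```python
import pandas as pd

# Minimal sketch of the CSV output step (assumed layout: one noun per row)
noun_list = ["バックアップ", "ツール", "文字列"]
pd.DataFrame(noun_list).to_csv("word_pool.csv", header=False, index=False, encoding="utf-8")
```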
- Improve the judgment of English words
  - With the current logic, "Raspberry Pi" does not get the reading kana I intended.
  - For example, if I could pass just "Raspberry" to the parsing API and have it read correctly, the results could be improved a little by devising how words are passed.
  - Incidentally, "Google" was read correctly.
- Increase the variety of words
  - It seems words from other fields could be collected by scraping other sites.
- The reason I built this in the first place: about half a year ago I read this article and had a conversation with a friend along the lines of "Could natural language processing be used to find rhyming words?"
- However, I knew nothing about the field of natural language processing at the time (and still barely do), and when I happened to come across this project I thought I could build something close to that idea, so I decided to create this tool.