Do you know the card game Topolo Memory? Players compete to claim cards by spotting figures that are homeomorphic to each other. While playing it with a friend, I came up with a variant: searching for compound words that are homeomorphic to each other. This post works out the answers to that variant with the help of OpenCV.
OS: macOS, Python: 3.6.8, opencv: 4.0.0.21, numpy: 1.15.0
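The MeCab IPA dictionary (mecab-ipadic) serves as the word list. The search covers every entry in the noun dictionaries (228,297 words).
Open the CSV files one by one and extract only the headwords: each file is converted to a numpy.array, the first column is sliced out, and the results from all files are concatenated.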
csvnoun.py
import csv
import codecs
import os
import numpy as np
from functools import reduce

csvs = [
    "Noun.csv",
    "Noun.adjv.csv",
    "Noun.adverbal.csv",
    "Noun.demonst.csv",
    "Noun.nai.csv",
    "Noun.name.csv",
    "Noun.number.csv",
    "Noun.org.csv",
    "Noun.others.csv",
    "Noun.place.csv",
    "Noun.proper.csv",
    "Noun.verbal.csv"
]
# Directory holding the mecab-ipadic CSV files; "~" must be expanded explicitly.
filedelimitor = os.path.expanduser("~/mecab-ipadic-2.7.0-20070801/")

def csv_1(csv_file):
    # The dictionary files are EUC-JP encoded; the headword is the first column.
    with codecs.open(filedelimitor+csv_file, "r","euc_jp") as f:
        reader = csv.reader(f)
        csv_words = [k for k in reader]
        csv_words_np = np.array(csv_words)
        return(csv_words_np[:,0].tolist())

# Concatenate the headword lists of all files.
words = reduce(lambda x,y:x+y,[csv_1(k) for k in csvs])
print(words[0:10])
print("Quantity:",len(words))
On my machine this takes about 2 seconds.
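The NumPy round trip is not strictly required to take the first column. Below is a minimal sketch that does the same extraction with the standard library only, assuming the same EUC-JP CSV layout with the headword in column 0. `read_headwords` and `collect_words` are hypothetical names introduced for this example, and the directory is whatever you pass in.

```python
import csv
import os
from itertools import chain


def read_headwords(path, encoding="euc_jp"):
    """Return the first column (the headword) of every row in one CSV file."""
    with open(path, "r", encoding=encoding, newline="") as f:
        return [row[0] for row in csv.reader(f) if row]


def collect_words(directory, filenames):
    """Concatenate the headwords of several dictionary files, in order."""
    directory = os.path.expanduser(directory)  # expand a leading "~"
    paths = (os.path.join(directory, name) for name in filenames)
    return list(chain.from_iterable(read_headwords(p) for p in paths))
```

Calling `collect_words("~/mecab-ipadic-2.7.0-20070801/", csvs)` would play the role of the `reduce` expression above.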
Drawing Japanese characters with OpenCV is cumbersome, so the character images are generated with Pillow instead.
char_img.py
import cv2
from PIL import Image, ImageDraw, ImageFont
import numpy as np
img = Image.new("L",(500, 500),"white")
char = "あ"
jpfont = ImageFont.truetype("/System/Library/Fonts/ヒラギノ角ゴシック W4.ttc",500)
draw = ImageDraw.Draw(img)
draw.text((0,0),char,font=jpfont,fill="black")
img_cv = np.array(img,dtype=np.uint8)
To run this on a platform other than macOS, point jpfont at a suitable font file that contains Japanese glyphs.
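One way to handle several platforms is to try a list of candidate font files and use the first one that exists. This is a small sketch only: `first_existing` is a hypothetical helper, and the candidate paths are examples that may not match your system.

```python
import os


def first_existing(paths):
    """Return the first path that exists on disk, or None if none do."""
    for path in paths:
        if os.path.exists(path):
            return path
    return None


# Example candidates only; adjust to wherever a Japanese font lives locally.
CANDIDATE_FONTS = [
    "/System/Library/Fonts/ヒラギノ角ゴシック W4.ttc",  # macOS
    "/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc",  # some Linux distros
    "C:/Windows/Fonts/meiryo.ttc",  # Windows
]

font_path = first_existing(CANDIDATE_FONTS)
```

If `font_path` is not None, it can be passed to `ImageFont.truetype(font_path, 500)` in place of the hard-coded path.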
OpenCV provides cv2.findContours, a function that detects the contours of a binary image, so that is what is used here. (Reference: http://labs.eecs.tottori-u.ac.jp/sd/Member/oyamada/OpenCV/html/py_tutorials/py_imgproc/py_contours/py_contours_begin/py_contours_begin.html)
When the retrieval mode (the second argument) is set to cv2.RETR_TREE, the returned hierarchy keeps the full nesting structure of the contours, and the analysis relies on it. Each hierarchy entry is stored as [Next, Previous, First_Child, Parent]. Only Parent is used, because the whole structure can be recovered by examining every contour's parent. Walking First_Child or Next would take less computation, but even complicated characters do not exceed 20 parts, so every contour is simply scanned.
cv2.findContours treats 0 as the background. The glyph is drawn in black on white, so contour 0 is the white area surrounding the glyph, and the contours that are children of contour 0 are the outermost outlines of the strokes. The code takes a stroke outline to be a contour whose parent index is even. For each such contour it counts how many contours name it as their parent, that is, how many holes it encloses.
Finally, string_topology converts a string one character at a time, concatenates the per-character lists, and returns the sorted result so that it can be compared during the search.
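To make the bookkeeping concrete, here is a sketch that reads hole counts from a hand-written hierarchy instead of real cv2.findContours output. It is a variant, not the code used in this post: it decides what counts as a stroke by nesting depth rather than by the parity of the parent index, so it does not depend on how the contours happen to be numbered. `stroke_hole_counts` and the `HIERARCHY` rows are made up for illustration, and the sketch assumes contour 0 is the surrounding white area at depth 0.

```python
# Rows are [Next, Previous, First_Child, Parent]; -1 means "none".
# Hand-written example: 0 = surrounding area, 1 = stroke with two holes (2, 3),
# 4 = stroke with one hole (5).
HIERARCHY = [
    [-1, -1, 1, -1],
    [4, -1, 2, 0],
    [3, -1, -1, 1],
    [-1, 2, -1, 1],
    [-1, 1, 5, 0],
    [-1, -1, -1, 4],
]


def stroke_hole_counts(hierarchy_rows):
    """Return the sorted hole counts of all contours at odd nesting depth."""
    parents = [row[3] for row in hierarchy_rows]

    def depth(i):
        # Number of steps needed to walk from contour i up to a root.
        d = 0
        while parents[i] != -1:
            i = parents[i]
            d += 1
        return d

    # Odd depth = stroke outline; its children are the holes it encloses.
    counts = [parents.count(i) for i in range(len(parents)) if depth(i) % 2 == 1]
    return sorted(counts)


print(stroke_hole_counts(HIERARCHY))  # [1, 2]
```

With this particular numbering, the even-parent test would also select contour 5, because its parent index (4) is even, and would contribute an extra 0 to the list. Whether real output is ever numbered this way depends on the glyph, so treat this as something to check rather than a confirmed problem.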
string_topology.py
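import cv2
import numpy as np
from functools import reduce
from PIL import Image, ImageDraw, ImageFont

topology_dic = {}  # cache: character -> list of hole counts
jpfont = ImageFont.truetype("/System/Library/Fonts/ヒラギノ角ゴシック W4.ttc",500)

def char_topology(char):
    if char in topology_dic:
        return topology_dic[char]
    else:
        # Render the character in black on a white 500x500 canvas.
        img = Image.new("L",(500, 500),"white")
        draw = ImageDraw.Draw(img)
        draw.text((0,0),char,font=jpfont,fill="black")
        img_cv = np.array(img,dtype=np.uint8)
        ret,thresh = cv2.threshold(img_cv,127,255,0)
        contours, hierarchy = cv2.findContours(thresh,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)
        # Contour image for visual checking only; not used in the result.
        img_cv_con = np.zeros((500,500,3),np.uint8)
        cv2.drawContours(img_cv_con,contours,-1,(0,255,0),3)
        # Column 3 of each hierarchy row is the parent index.
        parent = [k[3] for k in hierarchy[0]]
        # For each contour with an even parent index, count its children.
        topology = [parent.count(k)
                    for k in range(len(parent)) if parent[k]%2 == 0]
        topology_dic[char] = topology
        return topology

def string_topology(string):
    # Concatenate the per-character lists and sort them for comparison.
    topology = reduce(lambda x,y:x+y,[char_topology(k) for k in string])
    topology.sort()
    return topology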
Convert each word in the list to its topology and print the words whose topology matches that of the input.
search.py
import sys

# words and string_topology come from the listings above.
in_topology = string_topology(sys.argv[1])
print(in_topology)
for k in words:
    if in_topology == string_topology(k):
        print(k)
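If several queries are run against the same word list, the signatures can be grouped once up front instead of being compared in a loop each time. The following is a sketch under that assumption. `build_index` is a hypothetical helper, and `toy_topology` is a stand-in for string_topology that uses made-up hole counts so the example runs without a font or OpenCV.

```python
from collections import defaultdict

# Made-up hole counts per character, for illustration only.
TOY_HOLES = {"A": [1], "B": [2], "C": [0], "D": [1], "O": [1], "i": [0, 0]}


def toy_topology(word):
    """Stand-in for string_topology: sorted hole counts of all characters."""
    return sorted(h for ch in word for h in TOY_HOLES[ch])


def build_index(words, topology=toy_topology):
    """Group words by signature; lists are unhashable, so the key is a tuple."""
    index = defaultdict(list)
    for word in words:
        index[tuple(topology(word))].append(word)
    return index


index = build_index(["AB", "BD", "CC", "i", "OB"])
print(index[tuple(toy_topology("DB"))])  # ['AB', 'BD', 'OB']
```

The design choice here is to trade memory for lookup time: building the index still visits every word once, but each later query is a single dictionary lookup.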
Putting it all together gives the following script.
same_topology.py
import cv2
import os
from PIL import Image, ImageDraw, ImageFont
import numpy as np
from functools import reduce
import csv
import codecs
import sys
csvs = [
"Noun.csv",
"Noun.adjv.csv",
"Noun.adverbal.csv",
"Noun.demonst.csv",
"Noun.nai.csv",
"Noun.name.csv",
"Noun.number.csv",
"Noun.org.csv",
"Noun.others.csv",
"Noun.place.csv",
"Noun.proper.csv",
"Noun.verbal.csv"
]
# Directory holding the mecab-ipadic CSV files; "~" must be expanded explicitly.
filedelimitor = os.path.expanduser("~/mecab-ipadic-2.7.0-20070801/")

def csv_1(csv_file):
    # The dictionary files are EUC-JP encoded; the headword is the first column.
    with codecs.open(filedelimitor+csv_file, "r","euc_jp") as f:
        reader = csv.reader(f)
        csv_words = [k for k in reader]
        csv_words_np = np.array(csv_words)
        return(csv_words_np[:,0].tolist())

words = reduce(lambda x,y:x+y,[csv_1(k) for k in csvs])

topology_dic = {}  # cache: character -> list of hole counts
jpfont = ImageFont.truetype("/System/Library/Fonts/ヒラギノ角ゴシック W4.ttc",500)

def char_topology(char):
    if char in topology_dic:
        return topology_dic[char]
    else:
        # Render the character in black on a white 500x500 canvas.
        img = Image.new("L",(500, 500),"white")
        draw = ImageDraw.Draw(img)
        draw.text((0,0),char,font=jpfont,fill="black")
        img_cv = np.array(img,dtype=np.uint8)
        ret,thresh = cv2.threshold(img_cv,127,255,0)
        contours, hierarchy = cv2.findContours(thresh,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)
        # Contour image for visual checking only; not used in the result.
        img_cv_con = np.zeros((500,500,3),np.uint8)
        cv2.drawContours(img_cv_con,contours,-1,(0,255,0), 3)
        # Column 3 of each hierarchy row is the parent index.
        parent = [k[3] for k in hierarchy[0]]
        # For each contour with an even parent index, count its children.
        topology = [parent.count(k)
                    for k in range(len(parent)) if parent[k]%2 == 0]
        topology_dic[char] = topology
        return topology

def string_topology(string):
    # Concatenate the per-character lists and sort them for comparison.
    topology = reduce(lambda x,y:x+y,[char_topology(k) for k in string])
    topology.sort()
    return topology

# Print the signature of the query, then every word with the same signature.
in_topology = string_topology(sys.argv[1])
print(in_topology)
for k in words:
    if in_topology == string_topology(k):
        print(k)
Output result
$ python same_topology.py 東京
[0, 0, 0, 1, 4]
Grilled skewers
Lotus stand
Shiritori
Gorenshi
Extraction
Wasan
Beautiful child
To reach
future affairs
Case
coition
Fire car
Fineness
Slush fund
clasp
huge projectile
...
Compound words that are homeomorphic to the first argument are printed. On my machine the search finishes in about 30 seconds.
This time the tofu characters (glyphs missing from the font, which render as blank boxes) are not handled, though they should be (I expect the words in the IPA dictionary can be rendered without turning into tofu...). It was pointed out that "回" and "ロロ" are not homeomorphic, but in terms of continuity I believe they should be. ~~Maybe.~~ (My knowledge of mathematics is limited, so I would appreciate comments from an expert.)
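One way to frame the question, offered as a sketch rather than a settled answer: assume each connected piece of a rendered glyph is modelled as a filled planar region $X_i$ with $h(X_i)$ holes, which is what the contour hierarchy counts. That modelling choice is an assumption. By the classification of compact surfaces with boundary, a connected region of this kind is determined up to homeomorphism by its number of holes, so for two words $X$ and $Y$:

```latex
X = \bigsqcup_{i=1}^{m} X_i, \qquad
Y = \bigsqcup_{j=1}^{n} Y_j, \qquad
X \cong Y \iff
\{\!\{\, h(X_1), \dots, h(X_m) \,\}\!\} = \{\!\{\, h(Y_1), \dots, h(Y_n) \,\}\!\}
```

Here $\{\!\{\cdot\}\!\}$ denotes a multiset, which is what the sorted list in the script represents. Under this reading, one ring nested inside another and two rings side by side both give $\{\!\{1, 1\}\!\}$. The nesting is a property of how the pieces sit in the plane, not of the pieces themselves, and the sorted list does not record it.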