I decided to try OCR in multiple languages, but I didn't have a dataset I could use freely, so I wrote a script to create my own.
It uses Pillow (PIL), Python's image processing library. http://pillow.readthedocs.org/en/3.0.x/index.html
It generates one image per character.
The core of the generation code is as follows.
```python
from PIL import Image
from PIL import ImageDraw
from PIL import ImageFont


def generate_char_img(char, fontname='Osaka', size=(64, 64)):
    img = Image.new('L', size, 'white')
    draw = ImageDraw.Draw(img)
    fontsize = int(size[0] * 0.8)
    font = ImageFont.truetype(fontname, fontsize)
    # Adjust the character position so it is centered in the image.
    char_displaysize = font.getsize(char)
    offset = tuple((si - sc) // 2 for si, sc in zip(size, char_displaysize))
    assert all(o >= 0 for o in offset)
    # Adjust the offset: half the value gives a better position on the vertical axis.
    draw.text((offset[0], offset[1] // 2), char, font=font, fill='#000')
    return img


def save_img(img, filepath):
    img.save(filepath, 'png')
```
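For example, the two functions can be combined like this (a minimal usage sketch; the font and output file name are just examples):

```python
# Generate a 64x64 grayscale image of the letter 'A' and save it as a PNG.
# 'Osaka' is a macOS font; substitute any font available on your system.
img = generate_char_img('A', fontname='Osaka')
save_img(img, 'A.png')
```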
I put the whole executable code in a gist: https://gist.github.com/lazykyama/dabe526246d60fa937d1 **(Addendum 2015/10/18 23:47: it seems that, whether because of the `Image.save()` specification or the file system, uppercase and lowercase letters in file names are not distinguished, so be careful.)**
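One way around that caveat is to name the output files by Unicode code point instead of by the character itself, so that 'A' and 'a' never map to the same file on a case-insensitive file system. A minimal sketch under that assumption (the output directory is hypothetical):

```python
import os
import string

out_dir = 'dataset/eng'  # hypothetical output directory
os.makedirs(out_dir, exist_ok=True)

for char in string.digits + string.ascii_letters:
    img = generate_char_img(char, fontname='Osaka')
    # Use the code point in the file name (e.g. u0041.png for 'A', u0061.png for 'a'),
    # so uppercase and lowercase letters cannot collide.
    filepath = os.path.join(out_dir, 'u{:04x}.png'.format(ord(char)))
    save_img(img, filepath)
```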
To generate a character list for a given language, you can do something like the following.

```python
import string

eng_char_list = list(string.digits + string.ascii_letters)
```

(Reference for the string module: http://docs.python.jp/3.3/library/string.html)
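For a language like Japanese, a similar list can be built directly from Unicode code-point ranges. A rough sketch covering only hiragana and katakana (which ranges you actually need is up to you):

```python
# Hiragana (U+3041..U+3096) and katakana (U+30A1..U+30FA) as a character list.
jpn_kana_list = ([chr(cp) for cp in range(0x3041, 0x3097)] +
                 [chr(cp) for cp in range(0x30A1, 0x30FB)])
```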
For languages other than English, I'll have to do my best and pull the character sets out of Wikipedia.
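As a rough idea of what that could look like, assuming the article text has already been dumped to a local file (the file name here is hypothetical), the set of characters that actually appear could be collected like this:

```python
# Collect the distinct non-whitespace characters appearing in a text dump.
with open('wikipedia_dump.txt', encoding='utf-8') as f:
    text = f.read()
char_list = sorted({c for c in text if not c.isspace()})
```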
As for which fonts to use for those languages... (゜⊿゜) I have no idea; it seems I need to track down the appropriate `*.ttf` file for each one. I got stuck on this right away because of my own lack of study.