Generate many single-character images with Pillow (PIL)

Motivation

For various reasons, I decided to try OCR in multiple languages, but I didn't have a dataset that I could use freely. I had to create my own, so I wrote a script to do it.

What I used

I used Pillow (PIL), an image processing library for Python. http://pillow.readthedocs.org/en/3.0.x/index.html

Images to generate

The script generates one image per character.

Script

The core of the image-generation code is as follows.

from PIL import Image
from PIL import ImageDraw
from PIL import ImageFont

def generate_char_img(char, fontname='Osaka', size=(64, 64)):
    # Grayscale ('L') canvas with a white background.
    img = Image.new('L', size, 'white')
    draw = ImageDraw.Draw(img)
    # Use 80% of the image width as the font size.
    fontsize = int(size[0] * 0.8)
    font = ImageFont.truetype(fontname, fontsize)

    # Adjust the character position so that it is centered.
    char_displaysize = font.getsize(char)
    offset = tuple((si - sc) // 2 for si, sc in zip(size, char_displaysize))
    assert all(o >= 0 for o in offset)

    # Adjust the offset: half of the value gives the right position on the height axis.
    draw.text((offset[0], offset[1] // 2), char, font=font, fill='#000')
    return img

def save_img(img, filepath):
    img.save(filepath, 'png')

I put the whole executable script in a gist: https://gist.github.com/lazykyama/dabe526246d60fa937d1 **(Addendum, 2015/10/18 23:47: Perhaps because of how `Image.save()` is specified, file names that differ only in uppercase versus lowercase do not seem to be distinguished, so please be careful.)**
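One way to sidestep the uppercase/lowercase collision mentioned in the addendum is to name each file by the character's Unicode code point instead of by the character itself. The sketch below assumes that naming scheme. `char_to_filename`, `build_filepath`, and the `uXXXX.png` pattern are hypothetical choices for illustration and are not taken from the gist.

```python
import os


def char_to_filename(char, ext='png'):
    """Build a file name from the code point, so 'A' and 'a' never collide."""
    return 'u{:04x}.{}'.format(ord(char), ext)


def build_filepath(outdir, char):
    """Join the output directory and the code-point-based file name."""
    return os.path.join(outdir, char_to_filename(char))
```

Because the names contain only digits and lowercase hexadecimal letters, they stay distinct even on a file system that ignores case.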

To generate a character list for each language, do the following.

English (uppercase and lowercase letters + digits)

import string

eng_char_list = list(string.digits + string.ascii_letters)

(Reference for the string module → http://docs.python.jp/3.3/library/string.html)

Japanese

Do your best and extract the characters from Wikipedia.
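Once some Japanese text is in hand (for example, copied from a Wikipedia article), the distinct characters can be filtered out by Unicode range. This is only a rough sketch: `extract_japanese_chars` is a hypothetical helper, fetching the text is left out, and the ranges cover just the hiragana letters, the katakana letters, and the basic CJK Unified Ideographs block, which I am assuming is enough for a first dataset.

```python
# Inclusive code point ranges (an assumption; extend them as needed).
JAPANESE_RANGES = (
    (0x3041, 0x3096),  # hiragana letters
    (0x30A1, 0x30FA),  # katakana letters
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (kanji)
)


def is_japanese_char(char):
    """Return True if the character falls inside one of the ranges above."""
    code = ord(char)
    return any(lo <= code <= hi for lo, hi in JAPANESE_RANGES)


def extract_japanese_chars(text):
    """Return the distinct Japanese characters in `text`, sorted by code point."""
    return sorted({c for c in text if is_japanese_char(c)})


# Stand-in for text pulled from an article.
sample_text = '画像を生成する。Pillow で OCR 用の画像を作る。'
jpn_char_list = extract_japanese_chars(sample_text)
```

Punctuation such as `。` and the Latin letters are dropped, and each character appears only once even if the text repeats it.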

Other languages

(゜⊿゜) No idea.

Caution

Reference
