Blowing out the background of text images to make them easier to read

I often photograph documents with my smartphone. It's convenient, but the results are less clear than a scan. To make such low-contrast images easier to read, it works best to brighten the background toward pure white while keeping the characters black. Whitening the background requires distinguishing the text regions of the image from the plain white background, and it turned out that this works well if you gather pixel statistics for each local block of the image and use the standard deviation of the pixel values for the decision.

As an example, let's process the following image. tarama36p.jpg Naoki Hatano, "Tara Majima Visionary Line", p. 36

Histogram equalization

Histogram equalization is often used to sharpen an image. When the pixel values of an image fall within a narrow range, stretching them across the full range of the format (0 to 255 for a grayscale image) increases the differences between pixels and makes the image clearer. OpenCV has a built-in function for this, explained in detail at the link below. [Histograms in OpenCV > Histograms Part 2: Histogram Equalization](http://labs.eecs.tottori-u.ac.jp/sd/Member/oyamada/OpenCV/html/py_tutorials/py_imgproc/py_histograms/py_histogram_equalization/py_histogram_equalization.html) The following program converts the original image to grayscale and equalizes its histogram, producing the image shown after it.

    import cv2

    bookimg = cv2.imread('tarama36p.jpg')
    img_gray = cv2.cvtColor(bookimg, cv2.COLOR_BGR2GRAY)

    equ = cv2.equalizeHist(img_gray)
    cv2.imwrite('tarama36pcv2.jpg', equ)

tarama36pcv2.jpg

The result does not make the characters feel particularly clearer. It was not obvious in the original image, but the right page seems to be brighter than the left, and the reflection off the metal clip at the top has been emphasized. The histogram of this image is shown below, with red lines marking the minimum and maximum pixel values. Since those values already span nearly the full range of the format, simple histogram equalization has little effect.

tarama36p_hist_maxmin.png

OpenCV also provides adaptive histogram equalization (CLAHE), which divides the image into small blocks and equalizes the histogram of each block. The result of that processing is the following image.

    import cv2

    bookimg = cv2.imread('tarama36p.jpg')
    img_gray = cv2.cvtColor(bookimg, cv2.COLOR_BGR2GRAY)

    # clipLimit and tileGridSize here follow the values in the OpenCV tutorial
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    cl1 = clahe.apply(img_gray)
    cv2.imwrite('tarama36pcv2cl1.jpg', cl1)

tarama36pcv2cl1.jpg This is easier to read than cv2.equalizeHist(), but still not as clear as a scan.

Make all white pixels pure white

General-purpose contrast enhancement tries to preserve some differences among the pixel values of the white areas. For a text image, detailed information in the white background is not needed, so every pixel above a certain threshold can simply be made completely white by rewriting its value to 255. The dark side carries the shape of the characters, so those pixel values are instead multiplied by a factor smaller than 1, pushing them toward black while preserving their original gradations. Based on the median of the histogram, the threshold was set to 140. The program and its result are as follows.

    for y in range(img_gray.shape[0]):
        for x in range(img_gray.shape[1]):
            if img_gray[y][x] > 140:
                img_gray[y][x] = 255
            else:
                img_gray[y][x] = img_gray[y][x] * 0.5

    cv2.imwrite('tarama36p140.jpg', img_gray)
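The per-pixel double loop above is slow in Python; the same operation can be written as a single vectorized numpy expression. A minimal sketch on a made-up 4x4 patch (the array values are purely illustrative):

```python
import numpy as np

# Hypothetical 4x4 grayscale patch standing in for img_gray
img_gray = np.array([[200, 120, 150,  90],
                     [230,  60, 141, 140],
                     [255,  10,   0, 180],
                     [100, 145, 139,  30]], dtype=np.uint8)

# Pixels above the threshold become pure white; the rest are halved
# (integer division matches the truncation of assigning 0.5*v into uint8)
out = np.where(img_gray > 140, 255, img_gray // 2).astype(np.uint8)
print(out)
```
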

tarama36p140.jpg On the right-hand page, the background turned white and the characters are crisp, as intended. The unwanted reflection from the metal clip is also gone. However, the lower left of the image is dark overall, so even the background there was treated as black and emphasized. If the threshold is instead tuned for the lower left, the characters on the right page get blown out to white. In other words, the appropriate threshold depends on the location within the image.

Process the image in blocks

As with adaptive histogram equalization, the original image is divided into 64x64-dot blocks, and an appropriate threshold is obtained and applied for each block. How to determine the threshold is the key question; here the threshold is set to the median of the lower half of each block's pixels (the pixels below the block median), so that roughly a quarter of the block's pixels are treated as black. Written as a Python function that takes the block image img and returns the threshold, it looks like the following. It is a rough heuristic, but it seems to return reasonably sensible values. It will not work, however, for white text on a black background.

    import numpy as np

    def getBWThrsh(img):
        med = np.median(img)
        fild = img[img < med]
        return np.median(fild)
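To see what this returns, here is the same function applied to a made-up block (the pixel values are purely illustrative). The result is the median of the lower half, i.e. roughly the 25th-percentile brightness of the block:

```python
import numpy as np

def getBWThrsh(img):
    # Threshold = median of the pixels below the block median
    med = np.median(img)
    fild = img[img < med]
    return np.median(fild)

# Toy block: a few dark "ink" pixels plus a bright background
block = np.array([10, 20, 30, 40, 200, 210, 220, 230], dtype=np.uint8)
print(getBWThrsh(block))  # → 25.0
```
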

The processed result is the following image. Histogram equalization is also applied per block, and instead of simply assigning 255 to the white areas, each bright pixel is multiplied by a coefficient chosen so that the majority of the background exceeds 255. Most of the background is blown out to white, while the darker strokes of the characters are preserved.

tarama36p_s0.00b64.jpg

The background behind the characters is nicely white, but in the blank areas after line breaks, the text of the reverse page shows through faintly. The following image is an enlargement.

a3bcf44c70f2e1a919809ebccd290c54-png.png

The show-through from the reverse page differs only very slightly in shade from the background, yet the per-block histogram equalization makes those back-side letters stand out vividly.
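The whitening coefficient mentioned above is computed by getWbias() in the full source at the end of the article: it finds the peak (mode) of the brightness histogram of the pixels classified as background, then scales so that pixels at the peak map above 255. A small sketch with made-up pixel values:

```python
import numpy as np

def getWbias(img, bwthr):
    # Peak (mode) of the brightness histogram of the background pixels
    wimg = img[img > bwthr]
    hist = np.histogram(wimg, bins=16)
    agm = np.argmax(hist[0])
    return hist[1][agm]

# Made-up block: background around 180-200, a little dark ink at 40
block = np.concatenate([np.full(60, 200),
                        np.full(30, 180),
                        np.full(10, 40)]).astype(np.uint8)
wb = getWbias(block, 140)   # brightness of the background peak
wbias = 256 / wb            # pixels at the peak * wbias now exceed 255
```
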

Distinguishing characters from white background

Since the reverse page shows through, I want to avoid histogram equalization on blocks that are pure white background. How can text blocks be distinguished from white-background blocks? So far the per-block statistics computed with numpy have been medians, but it occurred to me that the standard deviation of the pixel values could separate the two: a white background has little variation in pixel values, so its standard deviation is small, while a block containing characters has a larger one. So I computed the standard deviation of each block's pixels and built a histogram of the values to see how they are distributed.

tarama36p_stdhist.png

There is a peak at small values on the left, which is presumably the white-background blocks. A white background can therefore be detected by setting a standard-deviation threshold just above this peak. If the threshold is too small, dust on the white background remains; if it is too large, blocks of white background containing a few small characters are treated as background and the characters get erased, so choosing an appropriate threshold is genuinely difficult. In any case, the image below shows the result of distinguishing characters from white background and making the background completely white.

tarama36p_s6.00b64.jpg

Some garbage remains outside the text area, but the text itself came out cleanly. Since white-background and non-white-background blocks can now be distinguished block by block, this should be applicable to further judgments: runs of white-background blocks indicate the surrounding margins, characters inside a margin suggest page numbers, and so on.
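The per-block standard deviation that drives this decision can also be computed in one shot with a numpy reshape trick, rather than looping over blocks. A sketch (blockStd is a hypothetical helper, not part of the original source; it assumes the image is cropped to a multiple of the block size):

```python
import numpy as np

def blockStd(img, bs):
    # Crop to a multiple of bs, then view as a grid of bs x bs blocks
    h, w = img.shape
    blocks = img[:h - h % bs, :w - w % bs].reshape(h // bs, bs, w // bs, bs)
    return blocks.std(axis=(1, 3))  # one std value per block

# Flat block (white background) next to a high-contrast block (text-like)
flat = np.full((64, 64), 240, dtype=np.uint8)
busy = np.zeros((64, 64), dtype=np.uint8)
busy[::2] = 255                     # alternating black/white rows
img = np.hstack([flat, busy])
print(blockStd(img, 64))            # → [[  0.  127.5]]
```
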

The full source for the above is as follows. Pass the file name you want to convert to sharpenImg() on the last line, and a file with a whitened background will be created. At the moment the conversion takes a few tens of seconds, but rewritten in C or similar it should reach a practical processing speed.

import cv2
from matplotlib import pyplot as plt
import numpy as np

def getStdThrsh(img, Blocksize):
    stds = []
    for y in range( 0, img.shape[0], Blocksize ):
        for x in range( 0, img.shape[1], Blocksize ):
            pimg = img[y:y+Blocksize, x:x+Blocksize]
            stds.append( np.std( pimg ) )

    hist = np.histogram( stds, bins=64 )
    peaki = np.argmax(hist[0])   

    #plt.hist( stds, bins=64 )
    #plt.show()

    slim = 6.0
    for n in range(peaki,len(hist[0])-1):
        if hist[0][n] < hist[0][n+1]:
            slim = hist[1][n+1]
            break

    if slim > 6.0:
        slim = 6.0
    
    return slim

def getBWThrsh(img):
    med = np.median(img)
    fild = img[img < med]
    return np.median(fild)

def getWbias( img, bwthr ):
    wimg = img[ img > bwthr ]
    hist = np.histogram( wimg, bins=16 )
    agm = np.argmax(hist[0])
    return hist[1][agm]

def getOutputName( title, slim ):
    return title + "_s{:04.2f}.jpg".format( slim )

def sharpenImg(imgfile):
    Testimagefile = imgfile
    TestimageTitle = Testimagefile.split('.')[0]
    Blocksize = 64
    Bbias = 0.2

    bookimg = cv2.imread( Testimagefile )
    img_gray = cv2.cvtColor(bookimg, cv2.COLOR_BGR2GRAY)
    outimage = img_gray.copy()

    slim = getStdThrsh(img_gray, Blocksize)
    for y in range( 0, img_gray.shape[0], Blocksize ):
        s = ""
        for x in range( 0, img_gray.shape[1], Blocksize ):
            pimg = img_gray[y:y+Blocksize, x:x+Blocksize]
            std = np.std( pimg )
            minv = np.min( pimg )
            maxv = np.max( pimg )
            pimg -= minv

            cimg = pimg.copy()
            if maxv != minv:
                for sy in range (cimg.shape[0]):
                    for sx in range( cimg.shape[1] ):
                        cimg[sy][sx] = (cimg[sy][sx]*255.0)/(maxv - minv)

            bwthrsh = getBWThrsh( pimg )
            wb = getWbias( cimg, bwthrsh )
            if wb == 0:
                wbias = 1.5
            else:
                wbias = 256 / wb
            
            if std < slim:
                s = s + "B"
                for sy in range (pimg.shape[0]):
                    for sx in range( pimg.shape[1] ):
                        outimage[y+sy][x+sx] = 255
            else:
                s = s + "_"
                for sy in range (cimg.shape[0]):
                    for sx in range( cimg.shape[1] ):
                        if cimg[sy][sx] > bwthrsh:
                            v = cimg[sy][sx]
                            v = v * wbias
                            if v > 255:
                                v = 255
                            outimage[y+sy][x+sx] = v
                        else:
                            outimage[y+sy][x+sx] = cimg[sy][sx] * Bbias
        print( "{:4d} {:s}".format( y, s ) )

    cv2.imwrite(getOutputName(TestimageTitle, slim), outimage )

if __name__ == '__main__':
    sharpenImg('tarama36p.jpg')

https://github.com/pie-xx/TextImageViewer
