I often photograph documents with my smartphone instead of scanning them. It is convenient, but the result is less legible than a scan. To make such low-contrast images easier to read, it works best to brighten the background to pure white while keeping the characters black. Whitening the background requires distinguishing the character regions of the image from the plain white background, and I found that this works well if you compute pixel statistics for each local block of the image and use the standard deviation of the pixel values as the criterion.
As an example, let's process the following image (Naoki Hatano, "Tara Majima Visionary Line", p. 36).
Histogram equalization is often used to sharpen an image. When the pixel intensities of an image fall within a narrow range, stretching them over the full range of the format (0 to 255 for a grayscale image) increases the differences between pixels and makes the image clearer. OpenCV has a dedicated function, which is explained in detail at the link below. [Histograms in OpenCV > Histograms Part 2: Histogram Equalization](http://labs.eecs.tottori-u.ac.jp/sd/Member/oyamada/OpenCV/html/py_tutorials/py_imgproc/py_histograms/py_histogram_equalization/py_histogram_equalization.html) Using this, the original image was converted to grayscale and then histogram-equalized, as shown in the following image and program.
bookimg = cv2.imread('tarama36p.jpg')                  # load the photographed page
img_gray = cv2.cvtColor(bookimg, cv2.COLOR_BGR2GRAY)   # convert to grayscale
equ = cv2.equalizeHist(img_gray)                       # global histogram equalization
cv2.imwrite('tarama36pcv2.jpg', equ)
The result does not make the characters feel particularly clearer. It was not obvious in the original image, but the right page turns out to be brighter than the left, and the reflection of the metal fitting at the top is emphasized. In fact, the histogram of this image is as follows; the red lines mark the minimum and maximum pixel intensities. Since these already span the full range of the format, simple histogram equalization has little effect. OpenCV also provides adaptive histogram equalization (CLAHE), which divides the image into small blocks and equalizes the histogram of each block separately. The result of that processing is the following image.
bookimg = cv2.imread('tarama36p.jpg')
img_gray = cv2.cvtColor(bookimg, cv2.COLOR_BGR2GRAY)
# adaptive histogram equalization; the clipLimit/tileGridSize values are illustrative
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
equ = clahe.apply(img_gray)
cv2.imwrite('tarama36pclahe.jpg', equ)
It is easier to read than with cv2.equalizeHist(), but still not as clear as a scanner's output.
General-purpose contrast enhancement tries to preserve some differences between pixel values even in the white areas. For a text image, detailed information in the white background is not needed, so every pixel above a certain threshold can simply be set to 255, completely white. The black side carries the shape information of the characters, so those pixel values are multiplied by a factor smaller than 1: they are pushed toward black while their original gradations are retained. The threshold was set to 140, based on the median of the histogram. The program and processing results are as follows.
# whiten everything above the threshold; darken everything below it
for y in range(img_gray.shape[0]):
    for x in range(img_gray.shape[1]):
        if img_gray[y][x] > 140:
            img_gray[y][x] = 255                      # background: make it pure white
        else:
            img_gray[y][x] = img_gray[y][x] * 0.5     # text: push toward black
cv2.imwrite('tarama36p140.jpg', img_gray)
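Incidentally, the same per-pixel loop can be written as a single vectorized NumPy expression, which runs far faster. A minimal equivalent sketch, using the same threshold 140 and factor 0.5:

import numpy as np

# vectorized equivalent of the loop above
img_gray = np.where(img_gray > 140, 255, img_gray * 0.5).astype(np.uint8)
cv2.imwrite('tarama36p140.jpg', img_gray)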
On the right side, the background becomes white and the characters are clear, as intended. The unwanted reflection of the metal fitting has also disappeared. However, the lower left is dark as a whole, so even its background is treated as black and emphasized. If instead the threshold is tuned to the lower left, the character parts on the right side blow out to white. In other words, the appropriate threshold depends on the location within the image.
So, as with adaptive histogram equalization, the original image is divided into blocks of 64 x 64 pixels, and an appropriate threshold is found and applied for each block. How to determine the threshold is the important part. Here the threshold is set to the median of the lower half of each block's pixels, i.e. the pixel value at the first quartile, so that the darkest quarter of the block's pixels is regarded as black. Written as a Python function that takes the block image img as an argument and returns the threshold, it looks as follows. It is a rough heuristic, but it seems to return reasonably sensible values. However, it will not work for white text on a black background.
import numpy as np

def getBWThrsh(img):
    # threshold = median of the pixels below the block median,
    # i.e. roughly the first quartile of the block's pixel values
    med = np.median(img)
    fild = img[img < med]
    return np.median(fild)
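For instance, it can be applied to a single 64 x 64 block like this (a usage sketch; the top-left block is chosen arbitrarily for illustration):

Blocksize = 64
block = img_gray[0:Blocksize, 0:Blocksize]  # an arbitrary block for illustration
print(getBWThrsh(block))                    # black/white threshold for this block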
The processed result is the following image. In addition, histogram equalization is performed per block, and instead of simply assigning 255 to the white areas, the original pixels are multiplied by a coefficient chosen so that the bulk of the background exceeds 255 and saturates to white (a concrete example of this coefficient follows below). Most of the background is thus blown out to white, while the darker strokes of the characters are preserved. The background of the text is nicely white, but in the white areas after line breaks, where there are no characters, the text of the reverse page shows through faintly. The following image is an enlargement. The show-through from the reverse page differs only very slightly in shade, but the per-block histogram equalization makes those back-side letters emerge vividly.
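To make the background coefficient concrete with a hypothetical number (the full source below implements this in getWbias):

wb = 200          # hypothetical mode of a block's background pixels
wbias = 256 / wb  # = 1.28; every pixel >= wb scales past 255 and clamps to white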
Since the reverse page shows through, I want to avoid histogram equalization on the white background. How can the text areas be distinguished from the plain white areas? So far we have used NumPy to compute block statistics such as the median pixel value, and it occurred to me that the standard deviation of the pixel values could be used to tell characters from white background: a white background has little variation in pixel values and thus a small standard deviation, while a block containing characters should have a larger one. So I computed the standard deviation of the pixels of each block and built a histogram of those values to see how they are distributed; a minimal sketch of this inspection appears below. There is a peak at small values on the left side, which is presumably the white-background blocks. A white background can therefore be detected by setting a standard-deviation threshold just above this peak. If the threshold is too small, dust on the white background survives; if it is too large, white-background blocks that contain a few small characters are also treated as white background and the characters get clipped away, so choosing an appropriate threshold is genuinely difficult.

In any case, the image below shows the result of distinguishing characters from white background and making the white background completely white. Some noise remains outside the text area, but I think the text itself came out cleanly. And since white-background blocks can now be told apart from the rest block by block, this should be applicable to various other judgments: runs of white blocks are the surrounding margins, characters inside a margin are page numbers, and so on.
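A minimal sketch of that inspection step, assuming img_gray from the earlier steps and the 64-pixel block size used throughout:

import numpy as np
from matplotlib import pyplot as plt

# standard deviation of every 64x64 block of img_gray
stds = [np.std(img_gray[y:y+64, x:x+64])
        for y in range(0, img_gray.shape[0], 64)
        for x in range(0, img_gray.shape[1], 64)]
plt.hist(stds, bins=64)  # the left-most peak corresponds to white-background blocks
plt.show()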
The full source is below. If you pass the name of the file you want to convert to sharpenImg() on the bottom line, a file with a whitened background will be created. At the moment the conversion takes several tens of seconds, but I think rewriting it in C or the like would bring it to a practical processing speed.
import cv2
from matplotlib import pyplot as plt
import numpy as np

def getStdThrsh(img, Blocksize):
    # collect the standard deviation of every block
    stds = []
    for y in range(0, img.shape[0], Blocksize):
        for x in range(0, img.shape[1], Blocksize):
            pimg = img[y:y+Blocksize, x:x+Blocksize]
            stds.append(np.std(pimg))
    # histogram of the standard deviations; the left-most peak
    # corresponds to the white-background blocks
    hist = np.histogram(stds, bins=64)
    peaki = np.argmax(hist[0])
    #plt.hist(stds, bins=64)
    #plt.show()
    # walk right from the peak to the first valley and take that as
    # the threshold, capped at 6.0
    slim = 6.0
    for n in range(peaki, len(hist[0]) - 1):
        if hist[0][n] < hist[0][n+1]:
            slim = hist[1][n+1]
            break
    if slim > 6.0:
        slim = 6.0
    return slim

def getBWThrsh(img):
    # threshold between text and background: median of the pixels
    # below the block median (roughly the first quartile)
    med = np.median(img)
    fild = img[img < med]
    return np.median(fild)

def getWbias(img, bwthr):
    # most frequent value among the "white" pixels; used to scale
    # the background so that it saturates to pure white
    wimg = img[img > bwthr]
    hist = np.histogram(wimg, bins=16)
    agm = np.argmax(hist[0])
    return hist[1][agm]

def getOutputName(title, slim):
    return title + "_s{:04.2f}.jpg".format(slim)

def sharpenImg(imgfile):
    Testimagefile = imgfile
    TestimageTitle = Testimagefile.split('.')[0]
    Blocksize = 64
    Bbias = 0.2        # darkening factor for the text pixels
    bookimg = cv2.imread(Testimagefile)
    img_gray = cv2.cvtColor(bookimg, cv2.COLOR_BGR2GRAY)
    outimage = img_gray.copy()
    slim = getStdThrsh(img_gray, Blocksize)
    for y in range(0, img_gray.shape[0], Blocksize):
        s = ""
        for x in range(0, img_gray.shape[1], Blocksize):
            pimg = img_gray[y:y+Blocksize, x:x+Blocksize]
            std = np.std(pimg)
            minv = np.min(pimg)
            maxv = np.max(pimg)
            pimg -= minv                      # shift the block so its minimum is 0
            cimg = pimg.copy()
            if maxv != minv:
                # stretch the block to the full 0-255 range
                for sy in range(cimg.shape[0]):
                    for sx in range(cimg.shape[1]):
                        cimg[sy][sx] = (cimg[sy][sx] * 255.0) / (maxv - minv)
            bwthrsh = getBWThrsh(pimg)
            wb = getWbias(cimg, bwthrsh)
            if wb == 0:
                wbias = 1.5
            else:
                wbias = 256 / wb
            if std < slim:
                # low variation: white-background block, wipe it to pure white
                s = s + "B"
                for sy in range(pimg.shape[0]):
                    for sx in range(pimg.shape[1]):
                        outimage[y+sy][x+sx] = 255
            else:
                # text block: brighten the background, darken the strokes
                s = s + "_"
                for sy in range(cimg.shape[0]):
                    for sx in range(cimg.shape[1]):
                        if cimg[sy][sx] > bwthrsh:
                            v = cimg[sy][sx] * wbias
                            if v > 255:
                                v = 255
                            outimage[y+sy][x+sx] = v
                        else:
                            outimage[y+sy][x+sx] = cimg[sy][sx] * Bbias
        print("{:4d} {:s}".format(y, s))
    cv2.imwrite(getOutputName(TestimageTitle, slim), outimage)

if __name__ == '__main__':
    sharpenImg('tarama36p.jpg')
The source code is available at https://github.com/pie-xx/TextImageViewer