Plans
First, detect the individual characters. (Using a Haar cascade or YOLO?)
Cover each detected character with a bounding rectangle and apply the following processing to it.
Inside each rectangle, plot the brightness values as a 3D surface and build a 3D model from it, so that each character can still be recognized accurately from this map when it is rotated.
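The brightness-surface idea in the plan can be sketched as follows. This is only a minimal illustration, not the planned 3D model itself: the helper names `brightness_surface` and `plot_surface`, the `(x, y, w, h)` rectangle layout, and the synthetic test image are all assumptions introduced here.

```python
import numpy as np

def brightness_surface(gray, rect):
    """Return X, Y, Z grids for the brightness inside rect = (x, y, w, h)."""
    x, y, w, h = rect
    patch = gray[y:y + h, x:x + w].astype(float)
    # Column index -> X, row index -> Y, brightness -> Z
    X, Y = np.meshgrid(np.arange(x, x + w), np.arange(y, y + h))
    return X, Y, patch

def plot_surface(X, Y, Z):
    """Show the brightness values as a 3D surface."""
    # Imported lazily so that brightness_surface works without matplotlib.
    import matplotlib.pyplot as plt
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.plot_surface(X, Y, Z, cmap="gray")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_zlabel("brightness")
    plt.show()

# Synthetic example (hypothetical data): a dark block on a bright background
demo = np.full((40, 40), 255, dtype=np.uint8)
demo[10:30, 15:25] = 0
X, Y, Z = brightness_surface(demo, (10, 5, 20, 30))
# plot_surface(X, Y, Z)  # uncomment to open the 3D view
```

Dark strokes appear as valleys in the surface, so a rotated character would produce a rotated valley pattern of the same shape.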
# Modules
from pathlib import Path
from skimage import io
import matplotlib.pyplot as plt
import cv2
import numpy as np
# Put some JPEG images of documents under ../dataset beforehand.
p = Path("../dataset")
paths = list(p.glob("**/*.jpg"))
data1 = io.imread(paths[0])
# Tile strategy: cut one 200x200 tile (first channel only) out of the page to evaluate the recognition accuracy.
mini = data1[1200:1400, 800:1000, 0]
plt.imshow(mini)
print("Showing data to be processed...")
plt.show()
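The cell above cuts out a single hard-coded tile. A rough sketch of how the tile strategy might be extended to a whole page is shown below; the generator name `iter_tiles`, the non-overlapping layout, and the dummy page size are assumptions made for illustration only.

```python
import numpy as np

def iter_tiles(image, tile_h=200, tile_w=200):
    """Yield (row, col, tile) for non-overlapping tiles; edge tiles may be smaller."""
    height, width = image.shape[:2]
    for row in range(0, height, tile_h):
        for col in range(0, width, tile_w):
            yield row, col, image[row:row + tile_h, col:col + tile_w]

# Dummy page (hypothetical size) standing in for data1[..., 0]
page = np.zeros((450, 600), dtype=np.uint8)
tiles = list(iter_tiles(page))
```

Each tile carries its top-left coordinate, so results found inside a tile can be mapped back to page coordinates.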
Switch1
# Highlighting the target
## Lazy normalization: scale data wider than 8 bits (assumed 16-bit) down to 0-255
if mini.max() > 255:
    subject = np.true_divide(mini, 256).astype("uint8")
else:
    subject = mini.astype("uint8")
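## Mask that keeps only dark pixels (< 200); the bright background and noise become 0.
mask = (subject < 200)
masked = mask * subject
## Distance transform (L1 distance, 3x3 mask): distance of each non-zero pixel to the nearest zero pixel
distmap = cv2.distanceTransform(masked, cv2.DIST_L1, 3)
## All-zero feature map with the same shape and dtype as distmap
featuremap = np.zeros_like(distmap)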
## Half-width of the window used to search for local maxima.
ksize = 20
## Detecting peaks of the distance map with a sliding window...
for x in range(ksize, distmap.shape[0] - ksize * 2):
    for y in range(ksize, distmap.shape[1] - ksize * 2):
        ### A pixel that holds the largest value inside its window is output as 1 in the feature map.
        ### This is close to max pooling: pooling and normalization to 0/1 are done at the same time.
        if distmap[x, y] > 0 and distmap[x, y] == np.max(distmap[x - ksize:x + ksize, y - ksize:y + ksize]):
            featuremap[x, y] = 1
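### Dilate the isolated peaks so they are visible in imshow.
### cv2.dilate expects a structuring-element array, not a size tuple.
feature_dilated = cv2.dilate(featuremap, np.ones((50, 50), np.uint8))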
print("Masked image is ... : ")
plt.imshow(masked)
plt.show()
plt.imshow(feature_dilated)
print("Feature shape is ... : " + str(feature_dilated.shape))
plt.show()
Feature_mask = (feature_dilated > 0)
Cropped = masked * Feature_mask
plt.imshow(Cropped)
plt.show()
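The nested peak-search loop in Switch1 is slow on large tiles. A possible vectorized equivalent is sketched below, assuming NumPy 1.20 or later for `sliding_window_view`; the function name `local_maxima` is hypothetical, and the index ranges are chosen to mirror the loop bounds used above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_maxima(distmap, ksize):
    """Mark pixels that are positive and equal to the maximum of their window."""
    h, w = distmap.shape
    featuremap = np.zeros_like(distmap)
    # Number of centers per axis, matching range(ksize, shape - 2 * ksize)
    n_x, n_y = h - 3 * ksize, w - 3 * ksize
    if n_x <= 0 or n_y <= 0:
        return featuremap  # image too small, same as an empty loop
    # windows[i, j] is distmap[i:i + 2*ksize, j:j + 2*ksize],
    # i.e. the loop's window for the center x = i + ksize, y = j + ksize.
    windows = sliding_window_view(distmap, (2 * ksize, 2 * ksize))
    window_max = windows.max(axis=(2, 3))[:n_x, :n_y]
    center = distmap[ksize:ksize + n_x, ksize:ksize + n_y]
    featuremap[ksize:ksize + n_x, ksize:ksize + n_y] = (center > 0) & (center == window_max)
    return featuremap

# Small synthetic distance map (hypothetical data) with one dominant peak
demo = np.zeros((12, 12), dtype=np.float32)
demo[5, 5] = 3.0
demo[5, 6] = 2.0
peaks = local_maxima(demo, ksize=2)
```

With the notebook's variables this would be called as `featuremap = local_maxima(distmap, ksize)`.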
Switch2
Send the image tiles cut out by the tile strategy to a public OCR server.
For single-character recognition, send each cropped image to a server running a single-character classifier (for example, one trained on MNIST). (After one-hot encoding the feature map, crop exactly one character around each marked coordinate.)
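Cropping one character per marked coordinate could look like the sketch below. It assumes a 0/1 feature map like the one built in Switch1 and a fixed 28x28 patch (the MNIST image size); the function name `crop_characters` and the zero padding at the borders are assumptions, and a real character may not fit this fixed size.

```python
import numpy as np

def crop_characters(image, featuremap, size=28):
    """Return one size x size crop centered on every non-zero featuremap pixel."""
    half = size // 2
    # Zero-pad so that crops near the border keep the full size
    padded = np.pad(image, half, mode="constant", constant_values=0)
    crops = []
    for row, col in np.argwhere(featuremap > 0):
        # Pixel (row, col) of image sits at (row + half, col + half) in padded
        crops.append(padded[row:row + size, col:col + size])
    return crops

# Hypothetical data: a gradient image with two marked coordinates
image = (np.arange(2500).reshape(50, 50) % 251).astype(np.uint8)
featuremap = np.zeros((50, 50), dtype=np.float32)
featuremap[0, 0] = 1
featuremap[25, 30] = 1
crops = crop_characters(image, featuremap)
```

Each crop could then be sent to the recognizer on its own, for example `crop_characters(masked, featuremap)` with the notebook's variables.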