- **I want to read license information from a photo!**
- **I want to enjoy image processing!**
- Detect the outline of a card the same size as a driver's license (a nanaco card) with **OpenCV**, then apply a **projective transformation** so the card's contents are easier to read
- This gets the image ready for reading with OCR (reading the contents will be covered in the next article)
- Since I'm not doing OCR this time, a nanaco card of the same size stands in for the license
- A card photographed **from diagonally above** → can now be displayed with its **angle corrected**, like this →
- Logic for **[dynamically determining the binarization threshold](#determining-the-binarization-threshold)** used in card detection
  - (The accuracy seems slightly better, but I haven't verified this properly, so it's just a gut feeling)
- **People who want to detect contours (edge detection) with OpenCV**
- People who want to read card information from photos
I use Pipenv:

```shell
brew install pipenv
pipenv install numpy matplotlib opencv-contrib-python pyocr
pipenv install jupyterlab --dev
```
(Note that the package installed here is `opencv-contrib-python`, not `opencv-python`; see the footnote for the difference.[^opencv])

I'll try the following three images:

- nanaco.jpeg (taken in the simplest way possible)
- nanaco_skew.jpeg (taken from an angle so the card's shape is distorted)
- nanaco_in_hand.jpeg (taken held in a hand against a white background)
The source code lives in a Jupyter notebook, `card.ipynb`:

```
.
├── Pipfile
├── Pipfile.lock
├── images
│   ├── nanaco.jpeg
│   ├── nanaco_in_hand.jpeg
│   └── nanaco_skew.jpeg
└── notebooks
    └── card.ipynb
```
Start Jupyter Lab:

```shell
pipenv run jupyter lab
```

Open `notebooks/card.ipynb` and execute the following in a cell (all later scripts are also executed in cells):

```python
%matplotlib inline
import cv2
import matplotlib.pyplot as plt

img = cv2.imread('../images/nanaco_skew.jpeg')
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
```
The matplotlib axis ticks bother me a little, but they actually make the coordinates easy to read, so I'll leave them as they are this time.
```python
# Convert to grayscale
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
plt.imshow(gray_img)
plt.gray()
```
In many tutorials and articles, the threshold for binarization[^binarization] is hard-coded to a value around 200, as if it had been determined by hand. In this article, I've added **logic that determines the threshold dynamically (automatically)**.
```python
import numpy as np

# Around 0.2 looks good for nanaco. It may need retuning for a driver's license
card_luminance_percentage = 0.2

# TODO: review performance
def luminance_threshold(gray_img):
    """
    Compute the largest threshold `x` such that the number of pixels whose
    grayscale value (luminance) exceeds `x` is at least 20% of all pixels,
    constrained to `100 <= x <= 200`
    """
    number_threshold = gray_img.size * card_luminance_percentage
    flat = gray_img.flatten()
    # Scan x downward from 200 to 100
    for diff_luminance in range(100):
        if np.count_nonzero(flat > 200 - diff_luminance) >= number_threshold:
            return 200 - diff_luminance
    return 100

threshold = luminance_threshold(gray_img)
print(f'threshold: {threshold}')
```
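As an aside, the scan in `luminance_threshold` can also be done without a loop: the brightness exceeded by 20% of the pixels is simply the k-th brightest pixel in sorted order. A minimal sketch of that idea (a hypothetical variant, not the article's function; the result can differ from the loop by one because the loop uses a strict `>` comparison):

```python
import numpy as np

def luminance_threshold_sorted(gray_img, percentage=0.2):
    """Vectorized variant: take the k-th brightest pixel value, where
    k = percentage * total pixels, then clamp it to [100, 200]."""
    k = max(1, int(gray_img.size * percentage))
    kth_brightest = np.sort(gray_img.ravel())[-k]
    return int(np.clip(kth_brightest, 100, 200))

# Synthetic example: 80 dark pixels (value 50) and 20 bright pixels (value 180)
gray = np.array([50] * 80 + [180] * 20, dtype=np.uint8).reshape(10, 10)
print(luminance_threshold_sorted(gray))  # the 20th-brightest pixel is 180
```

The `np.clip` at the end reproduces the `100 <= x <= 200` constraint of the loop version.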
The thresholds for the three images were calculated as follows. For `nanaco_skew.jpeg`, for example, the commonly used threshold of 200 didn't work, probably because of the amount of reflected light; with the value of 138 calculated by the code above, the card outline can be extracted in the later steps.
| | nanaco.jpeg | nanaco_skew.jpeg | nanaco_in_hand.jpeg |
|---|---|---|---|
| Binarization threshold | 200 | 138 | 199 |
| Image | | | |
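For reference, a standard alternative for choosing a threshold automatically is Otsu's method, which OpenCV exposes via the `cv2.THRESH_OTSU` flag. The idea, picking the threshold that maximizes the between-class variance of the histogram, can be sketched in plain NumPy (an illustrative sketch, not the method used in this article):

```python
import numpy as np

def otsu_threshold(gray_img):
    """Pick the threshold t that maximizes the between-class variance
    w0 * w1 * (mean0 - mean1)^2 of the two classes it creates."""
    hist = np.bincount(gray_img.ravel(), minlength=256).astype(float)
    total = hist.sum()
    cum_count = np.cumsum(hist)                 # number of pixels with value <= v
    cum_sum = np.cumsum(hist * np.arange(256))  # sum of values of those pixels
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = cum_count[t - 1], total - cum_count[t - 1]
        if w0 == 0 or w1 == 0:
            continue  # all pixels on one side; no class split
        mean0 = cum_sum[t - 1] / w0
        mean1 = (cum_sum[-1] - cum_sum[t - 1]) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t

# Bimodal synthetic image: dark background (50) and a bright card region (200)
gray = np.array([50] * 70 + [200] * 30, dtype=np.uint8).reshape(10, 10)
print(otsu_threshold(gray))  # some value separating 50 from 200
```

Otsu works well on clearly bimodal histograms like a bright card on a dark background, but the simple percentage-based logic above is easier to reason about for this specific task.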
```python
_, binarized = cv2.threshold(gray_img, threshold, 255, cv2.THRESH_BINARY)
# binarized is single-channel, so display it directly with a gray colormap
plt.imshow(binarized, cmap='gray')
```
```python
contours, _ = cv2.findContours(binarized, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

# Select the contour with the largest area
card_cnt = max(contours, key=cv2.contourArea)

# Draw the contour on the image
line_color = (0, 255, 0)
thickness = 30
cv2.drawContours(img, [card_cnt], -1, line_color, thickness)
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
```
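Picking the card with `max(contours, key=cv2.contourArea)` works because `cv2.contourArea` returns the area of the polygon described by the contour (the shoelace formula). A small NumPy illustration of the same formula, using a hypothetical helper that is not part of the pipeline above:

```python
import numpy as np

def polygon_area(points):
    """Shoelace formula: area of a simple polygon given its vertices in order."""
    x, y = np.asarray(points, dtype=float).T
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

# A 10x7 rectangle, like a card outline, has area 70
rect = [(0, 0), (10, 0), (10, 7), (0, 7)]
print(polygon_area(rect))  # 70.0
```

Since the card fills a large part of the frame, its contour area dwarfs that of any noise contours, which is why taking the maximum is enough here.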
Based on the contour obtained above, we now apply a projective transformation (angle correction).
```python
# Approximate the contour with a convex quadrilateral
# A fixed coefficient of 0.1 times the contour's perimeter is good enough;
# tuning the coefficient seems mostly unnecessary as long as the card is
# captured reasonably well (it may become necessary when adjusting for OCR)
epsilon = 0.1 * cv2.arcLength(card_cnt, True)
approx = cv2.approxPolyDP(card_cnt, epsilon, True)

# Card width (the card is vertical in the image, so width and height are
# swapped in the projective transformation)
card_img_width = 2400  # arbitrary value
card_img_height = round(card_img_width * (5.4 / 8.56))  # from the license aspect ratio (= nanaco ratio)

src = np.float32(list(map(lambda x: x[0], approx)))
dst = np.float32([[0, 0], [0, card_img_width], [card_img_height, card_img_width], [card_img_height, 0]])
projectMatrix = cv2.getPerspectiveTransform(src, dst)

# The contour line was drawn onto `img` earlier, so reload the original image
img = cv2.imread('../images/nanaco_skew.jpeg')
transformed = cv2.warpPerspective(img, projectMatrix, (card_img_height, card_img_width))
plt.imshow(cv2.cvtColor(transformed, cv2.COLOR_BGR2RGB))
```
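Under the hood, `cv2.getPerspectiveTransform` solves for the eight unknowns of a 3×3 homography (with the bottom-right entry fixed to 1) from the four point correspondences. A minimal NumPy sketch of that computation, as an illustrative re-derivation rather than OpenCV's actual code:

```python
import numpy as np

def perspective_matrix(src, dst):
    """Solve the 8 unknowns of the homography H (h33 fixed to 1) that maps
    each src point (x, y) to its dst point (u, v):
        u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), similarly for v."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Multiply out the denominator to get two linear equations per pair
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

# Mapping the unit square to a square twice the size is just a scaling
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(0, 0), (2, 0), (2, 2), (0, 2)]
H = perspective_matrix(src, dst)
print(H)  # ≈ [[2, 0, 0], [0, 2, 0], [0, 0, 1]]
```

`cv2.warpPerspective` then applies the inverse mapping per output pixel and interpolates, which is why four well-placed corner points are all that is needed to flatten the card.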
**It worked!** The slanted letters are now straight!
After this, I plan to try reading the contents with OCR using an actual driver's license, but I also experimented a little with the current nanaco card. Constraints on which regions to read are still needed, but it roughly works.

Using `nanaco_in_hand.jpeg`, I applied OCR with `pyocr` to the entire image obtained at the end. Running the same script as above on `nanaco_in_hand.jpeg` produces this (slightly diagonal...).

For this image, I converted it to text using `pyocr` + `tesseract`, following the tutorial.[^ocr]
```
Within this plan for using the nanaco card
For details on the use of the chin card, please refer to the member agreement. Five
Ako's card is a member store with the nanaco mark on the right, and you can use the electronic money and the electronic manager in the card.
You will be able to confirm your balance.
Do not bend the card, give it a great impact, or leave it at high temperature or when it is magnetized.
Ako's card and the electronic money in the card are not cashable.
The upper limit of the charge for Ako's card is 50,000 yen.
Ako's card can only be used by the member who has approved the member agreement and has signed the member office name field.
The ownership of Ako's card belongs to the stock company Seven Card Service, and it is not possible to lend or transfer it to another person.
```
Considering how rough the approach was, it's not bad, I'd say. I'll continue improving the accuracy with a driver's license in the next article. (It's quite amusing that "●" gets recognized as "A".)
[^binarization]: Converting an RGB image, where each pixel carries one of 256×256×256 values, into a binary image, where each pixel carries one of 2 values (1/0). This makes contour extraction easier.
[^opencv]: `opencv-contrib-python` includes the contrib modules on top of what `opencv-python` provides. Plain `opencv-python` would probably have been fine for the `cv2` usage here, but as far as I can tell from the official documentation, this package seems to be the recommended choice for new projects. (Reference: https://pypi.org/project/opencv-python/)
[^ocr]: This article was also helpful! Installing tesseract and related tools is required.