Determine the numbers in the image taken with the webcam

After working through the MNIST example in an introduction to deep learning, some people may want to try something more applied but can't think of a good project.

This time, to help those people, I will try building something that recognizes the digits shown to a webcam.

Displaying the webcam video

First, let's display the video from the webcam. It can be done easily with OpenCV.

This time, I'm using Logitech's "HD Webcam C270".

#!/usr/bin/python
#coding: utf-8

import cv2

def main():
    #Open the webcam
    capture = cv2.VideoCapture(0)
    if not capture.isOpened():
        raise IOError("Failed to open the webcam")
    while True:
        #Capture a frame from the webcam
        ret, image = capture.read()
        if not ret:
            continue
        #Display the webcam image
        cv2.imshow("Capture", image)
        k = cv2.waitKey(10)
        #Close the capture window with the ESC key
        if k == 27:
            break
    capture.release()
    cv2.destroyAllWindows()


if __name__ == '__main__':
    main()

If your security software asks at runtime whether to allow access to the webcam, allow it.

Reference article: Venus ☆ Channel: Get webcam image with Python version OpenCV

Get and process images

Rather than classifying digits continuously, I want to capture the current frame and process it only when a specific key is pressed. Eventually this frame will be passed to the digit recognition step, but for now I'll just print a message to confirm that it works.

Refer to the reference article below for which key code corresponds to which key.

#!/usr/bin/python
#coding: utf-8

import cv2

def main():
    #Open the webcam
    capture = cv2.VideoCapture(0)
    if not capture.isOpened():
        raise IOError("Failed to open the webcam")
    while True:
        #Capture a frame from the webcam
        ret, image = capture.read()
        if not ret:
            continue
        #Display the webcam image
        cv2.imshow("Capture", image)
        k = cv2.waitKey(10)
        #Execute processing with the E key
        if k == 101:
            print("Execute processing")
        #Close the capture window with the ESC key
        if k == 27:
            break
    capture.release()
    cv2.destroyAllWindows()


if __name__ == '__main__':
    main()

Reference article: List of highgui key codes --Wasting Note
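
If that page is unavailable, a quick way to check a key code yourself is to print the return value of cv2.waitKey inside the capture loop. This small snippet is my addition, not from the article; on my environment waitKey returns -1 when no key is pressed.

k = cv2.waitKey(10)
#Print the code of any key pressed while the capture window has focus
if k != -1:
    print(k)   #for example, pressing E prints 101 and ESC prints 27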

Preprocessing of acquired image

Passing the entire captured image as input seemed too large, so I'll crop out the 100x100 region at the center.

First, check the size of the image acquired by the webcam.

if k == 101:
    print(image.shape)

We change the E-key handling part like this. In OpenCV an image is a NumPy array, so its dimensions can be obtained with image.shape. In this example, (480, 640, 3) is printed, so the frame is 480 pixels tall by 640 pixels wide, with 3 color channels.

Now that we know the size, cutting out the center 100x100 looks like the following. Let's save the image to check that the crop works as intended.

if k == 101:
    img = image[190:290,270:370]
    cv2.imwrite("img.jpg", img)

All that remains is to convert this cropped image into the same input format as the MNIST data. Concretely, the processing is as follows.

  1. The webcam image is in color, so first convert it to grayscale.
  2. Resize the image to 28x28.
  3. Apply the same normalization that was used on the MNIST data during training.

Combined with the center crop from earlier, this becomes the following preprocessing function.

import numpy as np

def preprocessing(img):
    #Cutout in the center
    img = img[190:290,270:370]
    #Conversion to grayscale
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    #Reduce image to 28x28
    img = cv2.resize(img, (28, 28))
    #Below, the same processing as during learning is performed
    img = 255 - img                          #Invert so the digit is white on black, like MNIST
    img = img.astype(np.float32)
    img /= 255                               #Scale pixel values to [0, 1]
    img = np.array(img).reshape(1, 784)      #Flatten to a 1x784 row vector
    return img
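
As a quick sanity check (my addition, assuming a captured frame named image and that chainer is installed), the output of preprocessing can be compared with a sample from chainer's MNIST loader; both should be float32 arrays scaled to [0, 1].

#Sanity check (not in the original article)
import chainer

x = preprocessing(image)
print(x.shape, x.dtype, x.min(), x.max())    #expect (1, 784) float32 with values in [0, 1]
train, _ = chainer.datasets.get_mnist()
sample, label = train[0]
print(sample.shape, sample.dtype)            #(784,) float32, also scaled to [0, 1]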

MLP settings used for digit recognition

For the digit recognition, I will use the same simple MLP that appears in the introduction to Chainer.

from chainer import Chain, serializers
import chainer.functions  as F
import chainer.links as L

#Multilayer perceptron model settings
class MyMLP(Chain):
    #Input 784, middle layer 500, output 10 dimensions
    def __init__(self, n_in=784, n_units=500, n_out=10):
        super(MyMLP, self).__init__(
            l1=L.Linear(n_in, n_units),
            l2=L.Linear(n_units, n_units),
            l3=L.Linear(n_units, n_out),
        )
    #Neural network structure
    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        y = self.l3(h2)
        return y

Since we will be recognizing digits repeatedly, we first load the trained model (my.model2 in this example).

  • Added on July 10, 2017: Here I am using the result (my.model2) of training the MLP defined above on the MNIST data (a rough training sketch follows below). If you use a model you trained yourself, things will go smoothly if you rewrite the contents of class MyMLP to match the one used during training.
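
For reference, here is a minimal sketch of how a file like my.model2 could be produced by training MyMLP on MNIST and saving only the MLP weights with serializers.save_npz. This is my assumption of the workflow; the original article does not show its training script.

#Minimal training sketch (assumption, not the article's actual training script)
import numpy as np
import chainer
from chainer import optimizers, serializers
import chainer.functions as F

def train_and_save(path='my.model2', n_epoch=5, batchsize=100):
    #MNIST images come back as float32 vectors of length 784, scaled to [0, 1]
    train, _ = chainer.datasets.get_mnist()
    xs = np.array([x for x, _ in train], dtype=np.float32)
    ts = np.array([t for _, t in train], dtype=np.int32)
    net = MyMLP()
    optimizer = optimizers.Adam()
    optimizer.setup(net)
    for epoch in range(n_epoch):
        perm = np.random.permutation(len(train))
        for i in range(0, len(train), batchsize):
            x = xs[perm[i:i + batchsize]]
            t = ts[perm[i:i + batchsize]]
            loss = F.softmax_cross_entropy(net(x), t)
            net.cleargrads()
            loss.backward()
            optimizer.update()
    #Save only the MLP weights so they can be loaded with serializers.load_npz('my.model2', MyMLP())
    serializers.save_npz(path, net)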

Then, let's add processing so that when the E key is pressed, the digit is determined with the trained model and the result is displayed.

def main():
    #Load the trained model
    net = MyMLP()###Additional part###
    serializers.load_npz('my.model2', net)###Additional part###
    #Open the webcam
    capture = cv2.VideoCapture(0)
    if not capture.isOpened():
        raise IOError("Failed to open the webcam")
    while True:
        #Capture a frame from the webcam
        ret, image = capture.read()
        if not ret:
            continue
        #Display the webcam image
        cv2.imshow("Capture", image)
        k = cv2.waitKey(10)
        #Execute processing with the E key
        if k == 101:
            img = preprocessing(image)
            num = net(img)###Additional part###
            print(num.data)###Additional part###
            print(np.argmax(num.data))###Additional part###
        #Close the capture window with the ESC key
        if k == 27:
            break
    capture.release()
    cv2.destroyAllWindows()

You should now have everything you need.

Displaying the region to be cropped

The processing works fine without this, but when operating the webcam it is convenient to see where the crop region is. So let's draw the region to be cropped on the webcam image as a red frame.

In the webcam display part of main(), change this line

cv2.imshow("Capture", image)

to these two lines:

cv2.rectangle(image,(270,190),(370,290),(0,0,255),3)
cv2.imshow("Capture", image)

That's all it takes.

Trying it out

When you run it, a red frame appears in the center of the webcam view like this, so place a digit inside the frame and press the E key.

(Screenshot: the capture window with the red frame drawn in the center)

However, it is not recognized as a 2!

Looking at what happens to the image during preprocessing,

def preprocessing(img):
    img = img[190:290,270:370]
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img = cv2.GaussianBlur(img, (3, 3), 0)
    img = cv2.resize(img, (28, 28))
    img = 255 - img
    img = img.astype(np.float32)
    cv2.imwrite("img.jpg ",img)###State during preprocessing###
    img /= 255
    img = np.array(img).reshape(1,784)
    return img

img.jpg (the image during preprocessing)

It seems that the digit is not extracted well because of the dark background.

Let's try setting a threshold and extracting only the dark, black parts.

def preprocessing(img):
    img = img[190:290,270:370]
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img = cv2.GaussianBlur(img, (3, 3), 0)
    img = cv2.resize(img, (28, 28))
    res, img = cv2.threshold(img, 70, 255, cv2.THRESH_BINARY)###Added threshold processing###
    img = 255 - img
    img = img.astype(np.float32)
    cv2.imwrite("img.jpg", img)
    img /= 255
    img = np.array(img).reshape(1,784)
    return img

Reference: [Image Threshold Processing — OpenCV-Python Tutorials 1 documentation](http://labs.eecs.tottori-u.ac.jp/sd/Member/oyamada/OpenCV/html/py_tutorials/py_imgproc/py_thresholding/py_thresholding.html)

By adding the threshold processing, the digit part is extracted properly and recognized correctly.

img.jpg (the image during preprocessing, after adding threshold processing)


I tried writing other digits as well, but some of them were hard to recognize unless I adjusted their position and size. Thinking about how to handle this better could be an interesting next step.

Also, connecting things to a webcam opens up a lot of possibilities, so I hope this gives you an excuse to try something yourself.

Finally, here is the entire code for what I made this time.

#!/usr/bin/python
#coding: utf-8

import cv2
import numpy as np
from chainer import Chain, serializers
import chainer.functions  as F
import chainer.links as L

#Multilayer perceptron model settings
class MyMLP(Chain):
    #Input 784, middle layer 500, output 10 dimensions
    def __init__(self, n_in=784, n_units=500, n_out=10):
        super(MyMLP, self).__init__(
            l1=L.Linear(n_in, n_units),
            l2=L.Linear(n_units, n_units),
            l3=L.Linear(n_units, n_out),
        )
    #Neural network structure
    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        y = self.l3(h2)
        return y

def preprocessing(img):
    img = img[190:290,270:370]
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img = cv2.GaussianBlur(img, (3, 3), 0)
    img = cv2.resize(img, (28, 28))
    res, img = cv2.threshold(img, 70, 255, cv2.THRESH_BINARY)
    img = 255 - img
    img = img.astype(np.float32)
    cv2.imwrite("img.jpg", img)
    img /= 255
    img = np.array(img).reshape(1,784)
    return img

def main():
    #Load the trained model
    net = MyMLP()
    serializers.load_npz('my.model2', net)
    #Open the webcam
    capture = cv2.VideoCapture(0)
    if not capture.isOpened():
        raise IOError("Failed to open the webcam")
    while True:
        #Capture a frame from the webcam
        ret, image = capture.read()
        if not ret:
            continue
        #Draw the crop region and display the webcam image
        cv2.rectangle(image,(270,190),(370,290),(0,0,255),3)
        cv2.imshow("Capture", image)
        k = cv2.waitKey(10)
        #Execute processing with the E key
        if k == 101:
            img = preprocessing(image)
            num = net(img)
            print(num.data)
            print(np.argmax(num.data))
        #Close the capture window with the ESC key
        if k == 27:
            break
    capture.release()
    cv2.destroyAllWindows()


if __name__ == '__main__':
    main()
