After working through the MNIST example in an introduction to deep learning, you may want to try something more applied but struggle to come up with a good project.
To help with that, this time I will build a small program that recognizes digits shown to a webcam.
First, let's display the video from the webcam. It seems easy to do with OpenCV.
This time, I'm using Logitech's "HD Webcam C270".
#!/usr/bin/python
#coding: utf-8
import cv2

def main():
    # Display the webcam image
    capture = cv2.VideoCapture(0)
    if not capture.isOpened():
        raise IOError("IO Error")
    while True:
        # Capture a frame from the webcam
        ret, image = capture.read()
        if not ret:
            continue
        # Display the webcam image
        cv2.imshow("Capture", image)
        k = cv2.waitKey(10)
        # Close the capture window with the ESC key
        if k == 27:
            break
    cv2.destroyAllWindows()

if __name__ == '__main__':
    main()
If your security software asks at runtime whether to allow access to the webcam, allow it.
Reference article: Venus ☆ Channel: Get webcam image with Python version OpenCV
Rather than classifying digits continuously, I want to grab the current frame and process it when a key is pressed. Eventually this frame will be passed to the digit recognition step, but for now we just print a message to confirm that the key handling works.
See the reference article below for which key code corresponds to which key.
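As a quick sanity check on the key codes (a small sketch of my own, not from the reference article): for ordinary printable keys, the value `cv2.waitKey` returns is the key's ASCII code, so `ord()` can replace the magic numbers; ESC has no printable character, so its code 27 stays hard-coded.

```python
# For printable keys, the value returned by cv2.waitKey is the ASCII
# code of the character, so ord() avoids magic numbers in comparisons.
E_KEY = ord('e')   # the key that triggers recognition in this article
ESC_KEY = 27       # ESC has no printable character, so 27 is hard-coded

print(E_KEY, ESC_KEY)  # 101 27
```

With this, the branches below could read `if k == E_KEY:` and `if k == ESC_KEY:` instead of bare numbers.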
#!/usr/bin/python
#coding: utf-8
import cv2

def main():
    # Display the webcam image
    capture = cv2.VideoCapture(0)
    if not capture.isOpened():
        raise IOError("IO Error")
    while True:
        # Capture a frame from the webcam
        ret, image = capture.read()
        if not ret:
            continue
        # Display the webcam image
        cv2.imshow("Capture", image)
        k = cv2.waitKey(10)
        # Run processing with the E key
        if k == 101:
            print("Execute processing")
        # Close the capture window with the ESC key
        if k == 27:
            break
    cv2.destroyAllWindows()

if __name__ == '__main__':
    main()
Reference article: List of highgui key codes - Wasting Note
Passing the entire captured image as input seemed too large, so I will crop out the 100x100 region in the center.
First, check the size of the image acquired by the webcam.
if k == 101:
    print(image.shape)
We change the handler for the E key. In OpenCV, an image is a NumPy array, so its dimensions can be read from the `.shape` attribute. In this example, `(480, 640, 3)` is printed, so the frame is 480 pixels tall by 640 pixels wide with 3 color channels.
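Incidentally, the hard-coded crop bounds used below (`190:290, 270:370`) can be derived from the frame size. A small helper of my own (not from the article) that computes a centered 100x100 crop for any resolution:

```python
def center_crop_bounds(shape, size=100):
    """Return (y0, y1, x0, x1) for a size x size crop centered in the frame."""
    h, w = shape[:2]
    y0 = (h - size) // 2
    x0 = (w - size) // 2
    return y0, y0 + size, x0, x0 + size

# For the (480, 640, 3) frame reported above:
y0, y1, x0, x1 = center_crop_bounds((480, 640, 3))
print(y0, y1, x0, x1)  # 190 290 270 370
# image[y0:y1, x0:x1] is then the same crop as image[190:290, 270:370]
```

This would keep the crop centered even if you switch to a camera with a different resolution.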
Now that we know the size, the process of cutting out the center 100 x 100 is as follows. Let's save the image and see if the crop is working.
if k == 101:
    img = image[190:290, 270:370]
    cv2.imwrite("img.jpg", img)
All that remains is to convert this cropped image into the same input format as MNIST. Concretely, the processing is as follows; it is combined with the center crop from earlier into a single preprocessing function.
import numpy as np

def preprocessing(img):
    # Crop out the center
    img = img[190:290, 270:370]
    # Convert to grayscale
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Shrink the image to 28x28
    img = cv2.resize(img, (28, 28))
    # Below, the same processing as during training
    img = 255 - img
    img = img.astype(np.float32)
    img /= 255
    img = np.array(img).reshape(1, 784)
    return img
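The MNIST-style normalization at the end can be checked without a camera. A quick sketch on a dummy array (my own check, assuming only NumPy) confirms the shape and value range the MLP will receive:

```python
import numpy as np

# Dummy grayscale patch standing in for the resized 28x28 webcam crop.
img = np.full((28, 28), 200, dtype=np.uint8)

# Same final steps as preprocessing(): invert, scale to [0, 1],
# flatten to the (1, 784) row vector the MLP expects.
img = 255 - img
img = img.astype(np.float32)
img /= 255
img = img.reshape(1, 784)

print(img.shape)         # (1, 784)
print(float(img[0, 0]))  # 0.2156... (= 55 / 255)
```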
For the digit recognition itself, I will use the same simple MLP that appears in introductions to Chainer.
from chainer import Chain, serializers
import chainer.functions as F
import chainer.links as L

# Multilayer perceptron model definition
class MyMLP(Chain):
    # Input 784, hidden layers 500, output 10 dimensions
    def __init__(self, n_in=784, n_units=500, n_out=10):
        super(MyMLP, self).__init__(
            l1=L.Linear(n_in, n_units),
            l2=L.Linear(n_units, n_units),
            l3=L.Linear(n_units, n_out),
        )
    # Neural network structure
    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        y = self.l3(h2)
        return y
Since we will classify digits repeatedly, we load the trained model (my.model2 in this example) once at startup.
- Added on July 10, 2017: here I use the weights (my.model2) obtained by training the MLP defined above on MNIST data. If you use trained weights of your own, things will go smoothly if you rewrite the MyMLP class to match the one used during training.
Then let's add a step that, when the E key is pressed, classifies the digit with the trained model and displays the result.
def main():
    # Load the trained model
    net = MyMLP()                           ### added ###
    serializers.load_npz('my.model2', net)  ### added ###
    # Display the webcam image
    capture = cv2.VideoCapture(0)
    if not capture.isOpened():
        raise IOError("IO Error")
    while True:
        # Capture a frame from the webcam
        ret, image = capture.read()
        if not ret:
            continue
        # Display the webcam image
        cv2.imshow("Capture", image)
        k = cv2.waitKey(10)
        # Run recognition with the E key
        if k == 101:
            img = preprocessing(image)
            num = net(img)                  ### added ###
            print(num.data)                 ### added ###
            print(np.argmax(num.data))      ### added ###
        # Close the capture window with the ESC key
        if k == 27:
            break
    cv2.destroyAllWindows()
You should now have everything you need.
The processing works as is, but when operating the webcam it is convenient to see where the cropped region is. So let's draw the region to be cropped as a red rectangle on the webcam image.
In the video display part of main(), change

cv2.imshow("Capture", image)

to

cv2.rectangle(image, (270, 190), (370, 290), (0, 0, 255), 3)
cv2.imshow("Capture", image)

That's all it takes.
When you run it, a red frame appears in the center of the webcam view like this, so put a digit inside the frame and press the E key.
However, it is not recognized as a 2!
Let's look at what happens to the image during preprocessing:
def preprocessing(img):
    img = img[190:290, 270:370]
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img = cv2.GaussianBlur(img, (3, 3), 0)
    img = cv2.resize(img, (28, 28))
    img = 255 - img
    img = img.astype(np.float32)
    cv2.imwrite("img.jpg", img)  ### state during preprocessing ###
    img /= 255
    img = np.array(img).reshape(1, 784)
    return img
State during preprocessing
It seems that the digit is not extracted well because of the dark background.
Let's set a threshold and extract only the dark, black parts.
def preprocessing(img):
    img = img[190:290, 270:370]
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img = cv2.GaussianBlur(img, (3, 3), 0)
    img = cv2.resize(img, (28, 28))
    res, img = cv2.threshold(img, 70, 255, cv2.THRESH_BINARY)  ### added thresholding ###
    img = 255 - img
    img = img.astype(np.float32)
    cv2.imwrite("img.jpg", img)
    img /= 255
    img = np.array(img).reshape(1, 784)
    return img
Reference: [Image Thresholding — OpenCV-Python Tutorials](http://labs.eecs.tottori-u.ac.jp/sd/Member/oyamada/OpenCV/html/py_tutorials/py_imgproc/py_thresholding/py_thresholding.html)
With thresholding added, the digit is extracted cleanly and recognized correctly.
State during preprocessing (after adding thresholding)
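What `cv2.threshold(img, 70, 255, cv2.THRESH_BINARY)` computes can be sketched with plain NumPy (a toy example of my own; if lighting changes make the fixed threshold of 70 brittle, OpenCV's `cv2.THRESH_OTSU` flag can pick the threshold automatically):

```python
import numpy as np

# Simulated 28x28 grayscale patch: dark "digit" strokes (value 40)
# on a mid-gray background (value 120) that defeats plain inversion.
patch = np.full((28, 28), 120, dtype=np.uint8)
patch[10:18, 6:22] = 40

# THRESH_BINARY with threshold 70: pixels above 70 become 255,
# the rest become 0 -- background white, digit black.
binary = np.where(patch > 70, 255, 0).astype(np.uint8)

# After the article's inversion, only the digit pixels remain at 255.
inverted = 255 - binary
print(sorted(np.unique(inverted).tolist()))  # [0, 255]
print(int(inverted[12, 10]))                 # 255 (a digit pixel)
```

The mid-gray background is forced all the way to 0 instead of lingering at an intermediate value, which is why the digit separates cleanly.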
I tried writing other digits as well, but some were hard to recognize without adjusting their position and size. Thinking about how to handle this better could be an interesting next step.
Connecting a model to a webcam also opens up a lot of room to play, so I hope this gives you an idea for something to try yourself.
Finally, here is the entire code.
#!/usr/bin/python
#coding: utf-8
import cv2
import numpy as np
from chainer import Chain, serializers
import chainer.functions as F
import chainer.links as L

# Multilayer perceptron model definition
class MyMLP(Chain):
    # Input 784, hidden layers 500, output 10 dimensions
    def __init__(self, n_in=784, n_units=500, n_out=10):
        super(MyMLP, self).__init__(
            l1=L.Linear(n_in, n_units),
            l2=L.Linear(n_units, n_units),
            l3=L.Linear(n_units, n_out),
        )
    # Neural network structure
    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        y = self.l3(h2)
        return y

def preprocessing(img):
    img = img[190:290, 270:370]
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img = cv2.GaussianBlur(img, (3, 3), 0)
    img = cv2.resize(img, (28, 28))
    res, img = cv2.threshold(img, 70, 255, cv2.THRESH_BINARY)
    img = 255 - img
    img = img.astype(np.float32)
    cv2.imwrite("img.jpg", img)
    img /= 255
    img = np.array(img).reshape(1, 784)
    return img

def main():
    # Load the trained model
    net = MyMLP()
    serializers.load_npz('my.model2', net)
    # Display the webcam image
    capture = cv2.VideoCapture(0)
    if not capture.isOpened():
        raise IOError("IO Error")
    while True:
        # Capture a frame from the webcam
        ret, image = capture.read()
        if not ret:
            continue
        # Display the webcam image with the crop region marked in red
        cv2.rectangle(image, (270, 190), (370, 290), (0, 0, 255), 3)
        cv2.imshow("Capture", image)
        k = cv2.waitKey(10)
        # Run recognition with the E key
        if k == 101:
            img = preprocessing(image)
            num = net(img)
            #cv2.imwrite("img.jpg", img)
            print(num.data)
            print(np.argmax(num.data))
        # Close the capture window with the ESC key
        if k == 27:
            break
    cv2.destroyAllWindows()

if __name__ == '__main__':
    main()