After working through the MNIST example in an introduction to deep learning, you may want to try something more applied but struggle to come up with a good project.
To help with that, this time I will build a small program that recognizes digits shown to a webcam.
First, let's display the video from the webcam. It seems easy to do with OpenCV.
This time, I'm using Logitech's "HD Webcam C270".
#!/usr/bin/python
#coding: utf-8
import cv2

def main():
    # Display the webcam image
    capture = cv2.VideoCapture(0)
    if not capture.isOpened():
        raise IOError("IO Error")
    while True:
        # Capture a frame from the webcam
        ret, image = capture.read()
        if not ret:
            continue
        # Display the webcam image
        cv2.imshow("Capture", image)
        k = cv2.waitKey(10)
        # Close the capture window with the ESC key
        if k == 27:
            break
    cv2.destroyAllWindows()

if __name__ == '__main__':
    main()
If your security software asks at runtime whether to allow access to the webcam, allow it.
Reference article: Venus ☆ Channel: Get webcam image with Python version OpenCV
Rather than classifying digits continuously, I want to grab the current frame and process it when a key is pressed. Eventually this frame will be passed to the digit recognition step, but for now we just print a message to confirm that the key handling works.
See the reference article below for which key code corresponds to which key.
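As a quick sanity check on the key codes (a small sketch of my own, not from the reference article): for ordinary printable keys, the value `cv2.waitKey` returns is the key's ASCII code, so `ord()` can replace the magic numbers; ESC has no printable character, so its code 27 stays hard-coded.

```python
# For printable keys, the value returned by cv2.waitKey is the ASCII
# code of the character, so ord() avoids magic numbers in comparisons.
E_KEY = ord('e')   # the key that triggers recognition in this article
ESC_KEY = 27       # ESC has no printable character, so 27 is hard-coded

print(E_KEY, ESC_KEY)  # 101 27
```

With this, the branches below could read `if k == E_KEY:` and `if k == ESC_KEY:` instead of bare numbers.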
#!/usr/bin/python
#coding: utf-8
import cv2

def main():
    # Display the webcam image
    capture = cv2.VideoCapture(0)
    if not capture.isOpened():
        raise IOError("IO Error")
    while True:
        # Capture a frame from the webcam
        ret, image = capture.read()
        if not ret:
            continue
        # Display the webcam image
        cv2.imshow("Capture", image)
        k = cv2.waitKey(10)
        # Run processing with the E key
        if k == 101:
            print("Execute processing")
        # Close the capture window with the ESC key
        if k == 27:
            break
    cv2.destroyAllWindows()

if __name__ == '__main__':
    main()
Reference article: List of highgui key codes - Wasting Note
Passing the entire captured image as input seemed too large, so I will crop out the 100x100 region in the center.
First, check the size of the image acquired by the webcam.
if k == 101:
    print(image.shape)
We change the handler for the E key. In OpenCV, an image is a NumPy array, so its dimensions can be read from the `.shape` attribute. In this example, `(480, 640, 3)` is printed, so the frame is 480 pixels tall by 640 pixels wide with 3 color channels.
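Incidentally, the hard-coded crop bounds used below (`190:290, 270:370`) can be derived from the frame size. A small helper of my own (not from the article) that computes a centered 100x100 crop for any resolution:

```python
def center_crop_bounds(shape, size=100):
    """Return (y0, y1, x0, x1) for a size x size crop centered in the frame."""
    h, w = shape[:2]
    y0 = (h - size) // 2
    x0 = (w - size) // 2
    return y0, y0 + size, x0, x0 + size

# For the (480, 640, 3) frame reported above:
y0, y1, x0, x1 = center_crop_bounds((480, 640, 3))
print(y0, y1, x0, x1)  # 190 290 270 370
# image[y0:y1, x0:x1] is then the same crop as image[190:290, 270:370]
```

This would keep the crop centered even if you switch to a camera with a different resolution.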
Now that we know the size, the process of cutting out the center 100 x 100 is as follows. Let's save the image and see if the crop is working.
if k == 101:
    img = image[190:290, 270:370]
    cv2.imwrite("img.jpg", img)
All that remains is to convert this cropped image into the same input format as MNIST. Concretely, the processing is as follows; it is combined with the center crop from earlier into a single preprocessing function.
import numpy as np

def preprocessing(img):
    # Crop out the center
    img = img[190:290, 270:370]
    # Convert to grayscale
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Shrink the image to 28x28
    img = cv2.resize(img, (28, 28))
    # Below, the same processing as during training
    img = 255 - img
    img = img.astype(np.float32)
    img /= 255
    img = np.array(img).reshape(1, 784)
    return img
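The MNIST-style normalization at the end can be checked without a camera. A quick sketch on a dummy array (my own check, assuming only NumPy) confirms the shape and value range the MLP will receive:

```python
import numpy as np

# Dummy grayscale patch standing in for the resized 28x28 webcam crop.
img = np.full((28, 28), 200, dtype=np.uint8)

# Same final steps as preprocessing(): invert, scale to [0, 1],
# flatten to the (1, 784) row vector the MLP expects.
img = 255 - img
img = img.astype(np.float32)
img /= 255
img = img.reshape(1, 784)

print(img.shape)         # (1, 784)
print(float(img[0, 0]))  # 0.2156... (= 55 / 255)
```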
For the digit recognition itself, I will use the same simple MLP that appears in introductions to Chainer.
from chainer import Chain, serializers
import chainer.functions as F
import chainer.links as L

# Multilayer perceptron model definition
class MyMLP(Chain):
    # Input 784, hidden layers 500, output 10 dimensions
    def __init__(self, n_in=784, n_units=500, n_out=10):
        super(MyMLP, self).__init__(
            l1=L.Linear(n_in, n_units),
            l2=L.Linear(n_units, n_units),
            l3=L.Linear(n_units, n_out),
        )
    # Neural network structure
    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        y = self.l3(h2)
        return y
Since we will classify digits repeatedly, we load the trained model (my.model2 in this example) once at startup.
- Added on July 10, 2017: here I use the weights (my.model2) obtained by training the MLP defined above on MNIST data. If you use trained weights of your own, things will go smoothly if you rewrite the MyMLP class to match the one used during training.
Then let's add a step that, when the E key is pressed, classifies the digit with the trained model and displays the result.
def main():
    # Load the trained model
    net = MyMLP()                           ### added ###
    serializers.load_npz('my.model2', net)  ### added ###
    # Display the webcam image
    capture = cv2.VideoCapture(0)
    if not capture.isOpened():
        raise IOError("IO Error")
    while True:
        # Capture a frame from the webcam
        ret, image = capture.read()
        if not ret:
            continue
        # Display the webcam image
        cv2.imshow("Capture", image)
        k = cv2.waitKey(10)
        # Run recognition with the E key
        if k == 101:
            img = preprocessing(image)
            num = net(img)                  ### added ###
            print(num.data)                 ### added ###
            print(np.argmax(num.data))      ### added ###
        # Close the capture window with the ESC key
        if k == 27:
            break
    cv2.destroyAllWindows()
You should now have everything you need.
The processing works as is, but when operating the webcam it is convenient to see where the cropped region is. So let's draw the region to be cropped as a red rectangle on the webcam image.
In the video display part of main(), change

cv2.imshow("Capture", image)

to

cv2.rectangle(image, (270, 190), (370, 290), (0, 0, 255), 3)
cv2.imshow("Capture", image)

That's all it takes.
When you run it, a red frame appears in the center of the webcam view like this, so put a digit inside the frame and press the E key.
However, it is not recognized as a 2!
Let's look at what happens to the image during preprocessing:
def preprocessing(img):
    img = img[190:290, 270:370]
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img = cv2.GaussianBlur(img, (3, 3), 0)
    img = cv2.resize(img, (28, 28))
    img = 255 - img
    img = img.astype(np.float32)
    cv2.imwrite("img.jpg", img)  ### state during preprocessing ###
    img /= 255
    img = np.array(img).reshape(1, 784)
    return img
State during preprocessing
It seems that the digit is not extracted well because of the dark background.
Let's set a threshold and extract only the dark, black parts.
def preprocessing(img):
    img = img[190:290, 270:370]
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img = cv2.GaussianBlur(img, (3, 3), 0)
    img = cv2.resize(img, (28, 28))
    res, img = cv2.threshold(img, 70, 255, cv2.THRESH_BINARY)  ### added thresholding ###
    img = 255 - img
    img = img.astype(np.float32)
    cv2.imwrite("img.jpg", img)
    img /= 255
    img = np.array(img).reshape(1, 784)
    return img
Reference: [Image Thresholding — OpenCV-Python Tutorials](http://labs.eecs.tottori-u.ac.jp/sd/Member/oyamada/OpenCV/html/py_tutorials/py_imgproc/py_thresholding/py_thresholding.html)
With thresholding added, the digit is extracted cleanly and recognized correctly.
State during preprocessing (after adding thresholding)
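What `cv2.threshold(img, 70, 255, cv2.THRESH_BINARY)` computes can be sketched with plain NumPy (a toy example of my own; if lighting changes make the fixed threshold of 70 brittle, OpenCV's `cv2.THRESH_OTSU` flag can pick the threshold automatically):

```python
import numpy as np

# Simulated 28x28 grayscale patch: dark "digit" strokes (value 40)
# on a mid-gray background (value 120) that defeats plain inversion.
patch = np.full((28, 28), 120, dtype=np.uint8)
patch[10:18, 6:22] = 40

# THRESH_BINARY with threshold 70: pixels above 70 become 255,
# the rest become 0 -- background white, digit black.
binary = np.where(patch > 70, 255, 0).astype(np.uint8)

# After the article's inversion, only the digit pixels remain at 255.
inverted = 255 - binary
print(sorted(np.unique(inverted).tolist()))  # [0, 255]
print(int(inverted[12, 10]))                 # 255 (a digit pixel)
```

The mid-gray background is forced all the way to 0 instead of lingering at an intermediate value, which is why the digit separates cleanly.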
I tried writing other digits as well, but some were hard to recognize without adjusting their position and size. Thinking about how to handle this better could be an interesting next step.
Connecting a model to a webcam also opens up a lot of room to play, so I hope this gives you an idea for something to try yourself.
Finally, here is the entire code.
#!/usr/bin/python
#coding: utf-8
import cv2
import numpy as np
from chainer import Chain, serializers
import chainer.functions as F
import chainer.links as L

# Multilayer perceptron model definition
class MyMLP(Chain):
    # Input 784, hidden layers 500, output 10 dimensions
    def __init__(self, n_in=784, n_units=500, n_out=10):
        super(MyMLP, self).__init__(
            l1=L.Linear(n_in, n_units),
            l2=L.Linear(n_units, n_units),
            l3=L.Linear(n_units, n_out),
        )
    # Neural network structure
    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        y = self.l3(h2)
        return y

def preprocessing(img):
    img = img[190:290, 270:370]
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img = cv2.GaussianBlur(img, (3, 3), 0)
    img = cv2.resize(img, (28, 28))
    res, img = cv2.threshold(img, 70, 255, cv2.THRESH_BINARY)
    img = 255 - img
    img = img.astype(np.float32)
    cv2.imwrite("img.jpg", img)
    img /= 255
    img = np.array(img).reshape(1, 784)
    return img

def main():
    # Load the trained model
    net = MyMLP()
    serializers.load_npz('my.model2', net)
    # Display the webcam image
    capture = cv2.VideoCapture(0)
    if not capture.isOpened():
        raise IOError("IO Error")
    while True:
        # Capture a frame from the webcam
        ret, image = capture.read()
        if not ret:
            continue
        # Display the webcam image with the crop region marked in red
        cv2.rectangle(image, (270, 190), (370, 290), (0, 0, 255), 3)
        cv2.imshow("Capture", image)
        k = cv2.waitKey(10)
        # Run recognition with the E key
        if k == 101:
            img = preprocessing(image)
            num = net(img)
            #cv2.imwrite("img.jpg", img)
            print(num.data)
            print(np.argmax(num.data))
        # Close the capture window with the ESC key
        if k == 27:
            break
    cv2.destroyAllWindows()

if __name__ == '__main__':
    main()