This article is the day-2 entry of the DoCoMo Advent Calendar.
Do you know drawing quizzes? In this kind of quiz, one person draws a picture and the others guess what is being drawn. There are many apps for it.
If the app itself recognizes the image drawn by the user and guesses it, even one person should be able to play. So, following on from last year, let's build one by combining the machine learning library [PyTorch](https://pytorch.org/) with Kivy, a library for creating apps.
Among Python GUI libraries, Kivy has the most momentum and is the easiest to use (in my subjective opinion).
There was a time when its barrier to entry was said to be high because there was no information in Japanese, but there is now a great deal of it. Thanks to volunteers, there is also Japanese documentation. This article is recommended as a tutorial for first-time users.
A drawing quiz should be achievable with the following features.
- Display of the theme to be drawn
- Button to change the theme
- Canvas part
- Erase buttons for the canvas
  - Erase all
  - Erase the previous line (Undo)
- Display of image recognition results
With kviewer you can check the UI without the main Python file, so let's make only the layout first.
ui_test.kv
BoxLayout:
    orientation: 'vertical' # Arrange four child widgets vertically
    BoxLayout:
        size_hint_y: 0.1 # Occupy 10% of the height
        Label: # Show the theme
            size_hint_x: 0.7 # Occupy 70% of the width
            text: 'Draw "XXX"'
        Button: # Button to change the theme
            size_hint_x: 0.3 # Occupy 30% of the width
            text: 'Change'
    Label: # Canvas part (provisional)
        text: 'CANVAS'
        size_hint_y: 0.7 # Occupy 70% of the height
    BoxLayout:
        size_hint_y: 0.1 # Occupy 10% of the height
        Button: # Undo button
            text: 'Undo'
        Button: # Erase-all button
            text: 'Clear'
    Label: # Display the AI's guess at the picture
        size_hint_y: 0.1 # Occupy 10% of the height
        text: 'I guess it is "YYY"'
When displayed with the command below, the screen appeared as intended. The layout is now decided.
python -m kivy.tools.kviewer ui_test.kv
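As a rough check of the proportions in the layout above, the following sketch computes how a 700-pixel-tall window (the height set in paintquiz.py) would be divided among the four rows. It assumes no spacing or padding, and dividing by the sum of the hints is an assumption of this sketch (the hints here add up to 1.0 anyway). `row_heights` is a hypothetical helper, not part of Kivy.

```python
def row_heights(total_height, hints):
    """Split total_height among rows in proportion to their size_hint_y values."""
    hint_sum = sum(hints)
    return [total_height * h / hint_sum for h in hints]


# size_hint_y values of the four rows in ui_test.kv, top to bottom
hints = [0.1, 0.7, 0.1, 0.1]
heights = row_heights(700, hints)

for name, h in zip(["theme row", "canvas", "button row", "result row"], heights):
    print(f"{name}: about {h:.0f} px")
```

The canvas gets by far the largest share, which is what we want for a drawing app.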
We need a model to guess the picture drawn by the user, and this time we will use the following trained model (MIT License). In exchange for being lightweight, it covers only 20 categories, so if you want to recognize all 345 categories in the original dataset, please train a model yourself.
The model used this time does not use time-series information (the order in which lines are drawn), but the original dataset does include it, so training with it taken into account should improve recognition accuracy while a drawing is still in progress. The folder structure is as follows. Use the two folders "src" and "trained_models" from the above GitHub repository.
├─paintquiz.py #Python file to be implemented
├─paintquiz.kv #Kivy file to be implemented
├─src #In the above repository
└─trained_models #In the above repository
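To picture the time-series information mentioned above, here is an illustrative sketch only. It assumes each drawing is stored as a list of strokes, where a stroke is a list of x values and a list of y values in drawing order, and converts it into (dx, dy, pen_lifted) steps that a sequence model could consume. The function name `strokes_to_steps` and the sample data are hypothetical and are not taken from the repository used in this article.

```python
def strokes_to_steps(strokes):
    """Convert strokes [[xs, ys], ...] into (dx, dy, pen_lifted) steps.

    pen_lifted is 1 on the last point of each stroke and 0 otherwise.
    """
    steps = []
    prev_x, prev_y = 0, 0
    for xs, ys in strokes:
        last = len(xs) - 1
        for i, (x, y) in enumerate(zip(xs, ys)):
            pen_lifted = 1 if i == last else 0
            steps.append((x - prev_x, y - prev_y, pen_lifted))
            prev_x, prev_y = x, y
    return steps


# Two made-up strokes: a small "roof" shape, then a vertical line
sample = [
    [[0, 10, 20], [0, 5, 0]],
    [[20, 20], [10, 30]],
]
steps = strokes_to_steps(sample)
for step in steps:
    print(step)
```

A representation like this keeps the drawing order, which a 28x28 image of the finished canvas throws away.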
We will implement the necessary functions in Python and adjust the Kivy side slightly to match. The Python side came to about 120 lines and the Kivy side to about 60, for a total of about 180 lines. The canvas is recognized once every second and the provisional recognition result is displayed. The implemented functions are as follows.
- Functions of each button defined in the layout
  - Change button (change the theme)
  - Undo button (erase the previous line)
  - Clear button (erase the whole canvas)
- Drawing function on the canvas
- Ability to recognize what is drawn on the canvas in real time
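The Undo and Clear behaviour in this list amounts to keeping the finished strokes on a stack. The following is a minimal, Kivy-free sketch of that idea; `StrokeHistory` is a hypothetical class used only for illustration, and plain strings stand in for the Line objects drawn on the canvas.

```python
class StrokeHistory:
    """Keeps finished strokes in drawing order so they can be undone."""

    def __init__(self):
        self.lines = []

    def add(self, line):
        """Record a finished stroke."""
        self.lines.append(line)

    def undo(self):
        """Remove and return the most recent stroke, or None if there is none."""
        return self.lines.pop() if self.lines else None

    def clear(self):
        """Forget all strokes and return them so the caller can erase them."""
        removed, self.lines = self.lines, []
        return removed


history = StrokeHistory()
for name in ["stroke 1", "stroke 2", "stroke 3"]:
    history.add(name)

last = history.undo()            # the newest stroke comes off first
remaining = list(history.lines)  # what is still on the canvas
cleared = history.clear()        # everything that Clear would erase
```

In the app, whatever `undo` or `clear` returns is what has to be removed from the widget's canvas.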
paintquiz.py
import random

from kivy.config import Config
# Window size at startup (set before the other Kivy imports so it takes effect)
Config.set('graphics', 'width', '600')
Config.set('graphics', 'height', '700')

from kivy.app import App
from kivy.clock import Clock
from kivy.properties import StringProperty
from kivy.uix.widget import Widget
from kivy.uix.boxlayout import BoxLayout
from kivy.uix.label import Label
from kivy.graphics import Line, Color
import numpy as np
import cv2
import torch
import torch.nn as nn

# List of recognizable classes
classes = ["apple", "book", "bowtie", "candle", "cloud", "cup", "door", "envelope", "eyeglasses", "guitar", "hammer",
           "hat", "ice cream", "leaf", "scissors", "star", "t-shirt", "pants", "lightning", "tree"]


class Net():
    # Load the recognition model
    def __init__(self):
        # Read the trained model file onto the CPU.
        # Note: on PyTorch 2.6 or later, torch.load may also need weights_only=False
        # to load a whole pickled model like this one.
        self.model = torch.load("./trained_models/whole_model_quickdraw", map_location=lambda storage, loc: storage)
        self.model.eval()  # Inference only; no training this time
        self.sm = nn.Softmax(dim=1)  # Normalize with softmax so unreliable results can be cut off by score

    # Take an image file name and return the recognition result
    def predict(self, fn, th=.5):
        image = cv2.imread(fn, cv2.IMREAD_UNCHANGED)[:, :, -1]  # Use the alpha channel as the image of the strokes
        image = cv2.resize(image, (28, 28))
        image = np.array(image, dtype=np.float32)[None, None, :, :]  # Shape (1, 1, 28, 28)
        image = torch.from_numpy(image)
        pred = self.model(image)
        pred = self.sm(pred)
        return torch.max(pred[0]), torch.argmax(pred[0])  # Return the recognition score and the recognized class


class Paint(Widget):
    pred_word = StringProperty()  # Recognition result text

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.line_width = 10  # Line thickness
        self.lines = []  # Finished lines, kept for undo
        self.in_drawing = False  # Whether a line is currently being drawn
        self.canvas.add(Color(0, 0, 0))
        self.model = Net()
        Clock.schedule_interval(self.predict, 1.0)  # Run recognition once per second

    def calc_pos(self, bbox):
        xmin = min(bbox[0], bbox[2])
        ymin = min(bbox[1], bbox[3])
        xmax = max(bbox[0], bbox[2])
        ymax = max(bbox[1], bbox[3])
        return xmin, ymin, xmax, ymax

    # Check whether the touch is inside this widget
    def is_inside(self, touch):
        return (self.pos[0] < touch.x < self.pos[0] + self.size[0]
                and self.pos[1] < touch.y < self.pos[1] + self.size[1])

    # Behavior while the button is held down (drawing)
    def on_touch_move(self, touch):
        if not self.in_drawing:
            if self.is_inside(touch):
                self.in_drawing = True
                with self.canvas:
                    touch.ud['line'] = Line(points=(touch.x, touch.y), width=self.line_width)
        elif 'line' in touch.ud:
            if self.is_inside(touch):
                touch.ud['line'].points += [touch.x, touch.y]

    # Behavior when the click ends
    def on_touch_up(self, touch):
        if self.in_drawing and 'line' in touch.ud:
            self.lines.append(touch.ud['line'])
            self.in_drawing = False

    # Erase the previous line
    def undo(self):
        if len(self.lines) > 0:
            line = self.lines.pop(-1)
            self.canvas.remove(line)

    # Erase the whole canvas
    def clear_canvas(self):
        for line in self.lines:
            self.canvas.remove(line)
        self.lines = []

    # Recognize the image every dt seconds
    def predict(self, dt):
        self.export_to_png('image.png')
        with torch.no_grad():
            score, label = self.model.predict('./image.png')
        # Show "no idea" when the recognition score is below the threshold
        if score < 0.5:
            self.pred_word = "CPU: I have no idea"
        else:
            self.pred_word = 'CPU: I guess it is "{}"'.format(classes[label].upper())


class PaintQuiz(BoxLayout):
    word = StringProperty('Draw "{}"'.format(random.choice(classes).upper()))  # Theme word

    def __init__(self, **kwargs):
        super(PaintQuiz, self).__init__(**kwargs)

    def reset(self):
        self.word = 'Draw "{}"'.format(random.choice(classes).upper())


class PaintQuizApp(App):
    def __init__(self, **kwargs):
        super(PaintQuizApp, self).__init__(**kwargs)
        self.title = 'PAINT QUIZ'

    def build(self):
        return PaintQuiz()


if __name__ == '__main__':
    app = PaintQuizApp()
    app.run()
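The score cutoff used in the recognition step can be illustrated without PyTorch. The sketch below applies softmax to some made-up raw scores and answers "no idea" when the best probability is under 0.5, as the app does. `softmax` and `guess` are hypothetical helpers written for this illustration, and the numbers are invented.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def guess(logits, labels, threshold=0.5):
    """Return the message the app would show for these raw scores."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "CPU: I have no idea"
    return 'CPU: I guess it is "{}"'.format(labels[best].upper())


labels = ["apple", "hammer", "leaf"]
confident = guess([4.0, 1.0, 0.5], labels)  # one score clearly dominates
unsure = guess([1.0, 0.9, 0.8], labels)     # scores are nearly tied
print(confident)
print(unsure)
```

Without the softmax step the raw scores would have no fixed scale, so a single threshold such as 0.5 would not mean much.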
paintquiz.kv
<PaintQuiz>:
    canvas:
        Color:
            rgb: .9,.9,.9
        Rectangle:
            pos: self.pos
            size: self.size
    BoxLayout:
        size: root.size
        orientation: 'vertical'
        BoxLayout:
            size_hint_y: 0.1
            orientation: 'horizontal'
            Label:
                canvas.before:
                    Color:
                        rgb: 1.,.3,.3
                    Rectangle:
                        pos: self.pos
                        size: self.size
                size_hint_x: 0.7
                text: root.word
                font_size: 18
            Button:
                id: button_reset
                size_hint_x: 0.3
                text: 'Change'
                on_release: root.reset()
        Paint:
            size_hint_y: 0.7
            id: paint_area
            allow_stretch: True
        BoxLayout:
            size_hint_y: 0.1
            Button:
                id: button_undo
                text: 'Undo'
                on_release: paint_area.undo()
            Button:
                id: button_clear
                text: 'Clear'
                on_release: paint_area.clear_canvas()
        Label:
            canvas.before:
                Color:
                    rgb: 1.,.3,.3
                Rectangle:
                    pos: self.pos
                    size: self.size
            size_hint_y: 0.1
            text: paint_area.pred_word
            font_size: 18
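Because the Change button calls reset, which picks a word with random.choice, pressing it can occasionally show the same theme again. If that is undesirable, one possible variation (not something the app above does) is to exclude the current word when choosing. `next_theme` is a hypothetical helper, and the class list is shortened for the example.

```python
import random

# Shortened list for the example; the app uses all 20 classes
classes = ["apple", "book", "bowtie", "candle", "cloud"]


def next_theme(current, candidates, rng=random):
    """Pick a theme different from the current one when possible."""
    others = [c for c in candidates if c != current]
    return rng.choice(others) if others else current


theme = "apple"
new_theme = next_theme(theme, classes)
print('Draw "{}"'.format(new_theme.upper()))
```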
The current theme is displayed in the upper left, and the recognition result (what the CPU thinks you are drawing) is displayed at the bottom.
I drew an apple and scissors, and both were recognized correctly. An apple looks like a hammer or a leaf when only the stem has been drawn, and at that point it is indeed recognized as a hammer or a leaf. If anything, "leaf" is the right answer, since that part is the apple's leaf.
In data analysis circles, some people present results at meetings in a Jupyter Notebook instead of PowerPoint. In the same spirit, I recommend building an app like this: it is easy to make and it gets people interested.
Actually, I wanted to make a collaborative drawing app (where you and the AI take turns drawing), but I gave up because it looked like it would become an article only for experienced Kivy users. Someone, please make it and let me play.