I tried to detect Mario with pytorch + yolov3

Background

In my earlier post, "[OpenCV] [C++] I tried to detect multiple using template matching", I used OpenCV's template matching to detect Goomba. Template matching slides a prepared Goomba image over the frame from top to bottom and scores how similar each region is. The clouds at the top of the screen apparently resemble Goomba, so when a cloud and a Goomba appeared together, the cloud was detected first. It should be possible to crop the high-similarity regions roughly and then decide using the template image together with a histogram comparison or background subtraction, but this time I changed approach and tried object detection with yolov3 instead.

Device

Environment

When I went to install the necessary libraries and looked at the official PyTorch site, the instructions assumed Anaconda. Until now I created a virtual environment with python3 -m venv [envname] and installed the necessary packages into it with pip, but deep learning libraries tend to depend on many other packages, so this time I use Anaconda.

conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.1 -c pytorch

The CUDA installed on my machine is still 10.0, but training ran without any problem even with cudatoolkit set to 10.1. Small mismatches in the GPU setup can still cause errors at run time, so strictly speaking it is better to match the versions.
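
To see which CUDA version the installed PyTorch build targets and whether it can actually reach the GPU, a quick check like the following may help. This is a minimal sketch: the helper name describe_cuda is hypothetical, and it only reports what PyTorch itself exposes through torch.version.cuda and torch.cuda.is_available().

```python
def describe_cuda():
    """Report the CUDA version PyTorch was built with and whether a GPU is usable."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment
        return {"torch": None, "built_with_cuda": None, "gpu_available": False}
    return {
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "gpu_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_cuda())
```

If gpu_available is False although a GPU is present, the driver and toolkit versions are the first thing to compare.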

pytorch + yolov3

I use PyTorch-YOLOv3. Clone the repository with git clone, then download the weight files and the dataset. The dataset is COCO, so the download is quite large.

git clone https://github.com/eriklindernoren/PyTorch-YOLOv3
cd PyTorch-YOLOv3/
sudo pip3 install -r requirements.txt

cd weights/
bash download_weights.sh

cd data/
bash get_coco_dataset.sh

Then return to the repository root and check that it works.

python3 test.py --weights_path weights/yolov3.weights

If that runs without problems, check that objects can be detected with the default settings. Put the images you want to run detection on in data/samples and run detect.py.

python3 detect.py --image_folder data/samples/

An output folder is created directly under the working folder. It contains the images with the detected objects marked.

Annotation

By default, the repository provides trained weights that use COCO as the dataset and can detect 80 kinds of objects. Here I describe how to train the model when you have your own objects that you want to detect.

The flow is: prepare about 100 images that contain the target objects, annotate (label) which object is where in each image, and finally run train.py to train the model.

As written in Background, the dataset is a video of Super Mario Bros (NES) Level 1-1. Choosing data for machine learning is always a worry, but one conclusion I reached is that game footage, which is probably copyrighted, seems to be a good choice. Compared with images taken in the real world, the same fixed characters appear over and over in a game and their shapes have only a few variations, so I think training results are easy to obtain quickly. Also, because it is a video, about 30 to 60 images can be captured per second, so the data is easy to collect.

So, use the labelme tool to annotate.

Since my work PC is a Mac, I install Qt5 with Homebrew and labelme itself with pip.

brew install pyqt
pip install labelme

Before starting labelme, prepare the training data. This time, I extracted seconds 23 to 28 of the Super Mario Bros (NES) Level 1-1 video and converted them into sequentially numbered images as follows.

Training data

output.gif

wget https://raw.githubusercontent.com/wkentaro/dotfiles/f3c5ad1f47834818d4f123c36ed59a5943709518/local/bin/video_to_images
pip install imageio imageio-ffmpeg tqdm
python video_to_images your_video.mp4

When executed, a your_video folder is created and the sequentially numbered images are stored in it.
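
As a rough sanity check on how many images to expect, the clip length can be multiplied by the frame rate. The sketch below assumes a constant 30 fps, which the article does not state for this particular video, and the function name is hypothetical.

```python
def expected_frame_count(start_s, end_s, fps):
    """Number of frames in the clip from start_s to end_s at a constant frame rate."""
    if end_s < start_s:
        raise ValueError("end_s must not be earlier than start_s")
    return round((end_s - start_s) * fps)


# Seconds 23 to 28 at an assumed 30 fps
print(expected_frame_count(23, 28, 30))  # 150
```

Under that assumption the 5-second clip yields 150 frames, which is in line with the roughly 150 images annotated in this article.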

Now that the training data is ready, specify the folder path and start labelme.

labelme ./your_video

After that, specify the label and region of each object. There were about 150 images. For training with yolov3 it is enough to mark each object with a rectangular bounding box, but since I may later train a segmentation DNN on the same data, I went to the (for now unnecessary) trouble of outlining the exact regions.

The label and area attached to the object are generated as a json file in the folder where the image is loaded.
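
Before moving on, it can be useful to check how many shapes of each label were annotated. The following is a minimal sketch; it assumes each labelme JSON file has a top-level "shapes" list whose items carry a "label" field (the same fields the conversion script later in this article reads), and the function name count_labels is hypothetical.

```python
import glob
import json
import os
from collections import Counter


def count_labels(json_dir):
    """Count annotated shapes per label across all labelme JSON files in a folder."""
    counts = Counter()
    for path in sorted(glob.glob(os.path.join(json_dir, "*.json"))):
        with open(path, "r") as fin:
            data = json.load(fin)
        for shape in data.get("shapes", []):
            counts[shape["label"]] += 1
    return counts


if __name__ == "__main__":
    print(count_labels("./your_video"))
```

A label that appears only a handful of times, or a misspelled label, shows up immediately in the counts.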

When you're done, pack the folder into a compressed archive.

tar -zcvf output.tar.gz ./your_video

Training

Now that the training data is ready, let's try machine learning.

First, create the model config file. Counting __ignore__, there are 3 classes here, so pass 3 as <num-classes>. Running the script generates config/yolov3-custom.cfg.

cd config/                                
bash create_custom_model.sh <num-classes> 
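
The class count matters because it fixes the size of the detection layers in the generated cfg. As far as I understand YOLOv3, each of the 3 anchors at a scale predicts 4 box coordinates, 1 objectness score and one score per class, so the convolution just before each yolo layer needs 3 * (classes + 5) filters. The sketch below only reproduces that arithmetic; the function name is hypothetical.

```python
def yolo_head_filters(num_classes, anchors_per_scale=3):
    """Filters of the convolution feeding a YOLOv3 detection layer.

    Each anchor predicts 4 box values, 1 objectness score and num_classes class scores.
    """
    return anchors_per_scale * (num_classes + 5)


print(yolo_head_filters(3))   # 24  (__ignore__, mario, kuribo)
print(yolo_head_filters(80))  # 255 (COCO)
```

If the cfg and the class list disagree, the shapes of the loaded model and the labels will not match, so it is worth keeping this number in mind when editing classes later.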

Open config/custom.data and set the number of classes.

config/custom.data


classes=3
train=data/custom/train.txt
valid=data/custom/valid.txt
names=data/custom/classes.names

Open data/custom/classes.names and list the object names, one per line. I think only train is written there by default. (kuribo is Goomba's Japanese name.)

data/custom/classes.names


__ignore__
mario
kuribo
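
Since the class count now lives in two places, a small consistency check between config/custom.data and classes.names can catch mistakes early. This is a sketch under the assumption that custom.data consists of simple key=value lines as shown above; the helper names are hypothetical.

```python
def read_data_config(path):
    """Parse key=value lines (as in config/custom.data) into a dict."""
    options = {}
    with open(path, "r") as fin:
        for line in fin:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, value = line.split("=", 1)
            options[key.strip()] = value.strip()
    return options


def read_class_names(path):
    """Read one class name per non-empty line."""
    with open(path, "r") as fin:
        return [line.strip() for line in fin if line.strip()]


def class_count_matches(data_config_path, names_path):
    """True when classes= in the data config equals the number of listed names."""
    declared = int(read_data_config(data_config_path)["classes"])
    return declared == len(read_class_names(names_path))


if __name__ == "__main__":
    print(class_count_matches("config/custom.data", "data/custom/classes.names"))
```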

Next, place the compressed training data archive directly under the repository root and extract it.

tar -zxvf output.tar.gz

Move the images from the extracted folder to data/custom/images/. By default, data/custom/images/ contains a sample image named train.jpg, so delete it.

rm data/custom/images/train.jpg
mv ./your_video/*.jpg data/custom/images/

Next, convert the JSON files that hold the label information into the format [class ID] [object center x] [object center y] [object width] [object height]. The coordinates and sizes are written as ratios in [0, 1.0] of the whole image rather than as raw pixel values. The converted files are written to data/custom/labels/.

import os
import json
import numpy as np

def treat(filepath, classes):
  """Convert one labelme JSON file into a list of YOLO label rows.

  classes is a list of (class ID, class name) pairs.
  """
  with open(filepath, "r") as fin:
    src = json.load(fin)

  dst = []
  for item in src["shapes"]:
    txt = item["label"]
    points = np.array(item["points"])

    # Bounding box that encloses all annotated points
    min_x, min_y = np.min(points, axis=0)
    max_x, max_y = np.max(points, axis=0)

    # Box center, normalized so that the full image width/height is 1.0.
    # The bounding-box center is used here, because for polygons the mean
    # of the vertices can be off-center.
    cx_norm = (min_x + max_x) / 2 / src["imageWidth"]
    cy_norm = (min_y + max_y) / 2 / src["imageHeight"]

    # Width and height of the object, normalized the same way
    rect_width = (max_x - min_x) / src["imageWidth"]
    rect_height = (max_y - min_y) / src["imageHeight"]

    # Look up the class ID for this label
    idx = next(i for i, name in classes if name == txt)

    # Collect one row per object
    dst.append([idx, cx_norm, cy_norm, rect_width, rect_height])
  return dst
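
The function above only returns the rows, so something still has to call it for every JSON file and write the results to data/custom/labels/. The sketch below shows one way this could look. The names load_classes and write_labels are hypothetical, the converter is passed in as an argument so that the article's treat function can be used as-is, and writing one .txt file per image with the same base name is my understanding of what PyTorch-YOLOv3 expects.

```python
import glob
import os


def load_classes(names_path):
    """Read classes.names and return [(class_id, class_name), ...]."""
    with open(names_path, "r") as fin:
        names = [line.strip() for line in fin if line.strip()]
    return list(enumerate(names))


def write_labels(json_dir, label_dir, classes, convert):
    """Run convert(json_path, classes) on each JSON file and write one .txt per image."""
    os.makedirs(label_dir, exist_ok=True)
    written = []
    for json_path in sorted(glob.glob(os.path.join(json_dir, "*.json"))):
        rows = convert(json_path, classes)
        stem = os.path.splitext(os.path.basename(json_path))[0]
        out_path = os.path.join(label_dir, stem + ".txt")
        with open(out_path, "w") as fout:
            for idx, cx, cy, w, h in rows:
                # One object per line: class ID, then normalized center and size
                fout.write(f"{idx} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}\n")
        written.append(out_path)
    return written
```

With the article's function this would be called as write_labels("./your_video", "data/custom/labels", load_classes("data/custom/classes.names"), treat).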

Finally, write the paths of the files stored in data/custom/images/ to data/custom/train.txt (for training) and data/custom/valid.txt (for evaluation). I think a ratio of about 8:2 between training and evaluation is just right.

train.txt


data/custom/images/00000000.jpg
data/custom/images/00000001.jpg
data/custom/images/00000002.jpg
data/custom/images/00000003.jpg
data/custom/images/00000004.jpg
data/custom/images/00000005.jpg
data/custom/images/00000006.jpg
data/custom/images/00000007.jpg
data/custom/images/00000008.jpg
data/custom/images/00000009.jpg
data/custom/images/00000010.jpg
data/custom/images/00000011.jpg
data/custom/images/00000012.jpg
data/custom/images/00000013.jpg
data/custom/images/00000014.jpg
data/custom/images/00000015.jpg
...
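
Writing these lists by hand is tedious, so the 8:2 split can be scripted. The following is a minimal sketch, not the method used in the article: the function names are hypothetical, and shuffling with a fixed seed before splitting is my own choice so that both files draw frames from the whole clip.

```python
import random


def split_paths(paths, train_ratio=0.8, seed=0):
    """Shuffle reproducibly, then split into (train, valid) lists."""
    shuffled = sorted(paths)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_ratio)
    return sorted(shuffled[:n_train]), sorted(shuffled[n_train:])


def write_list(paths, out_path):
    """Write one image path per line."""
    with open(out_path, "w") as fout:
        fout.write("\n".join(paths) + "\n")


if __name__ == "__main__":
    import glob
    images = glob.glob("data/custom/images/*.jpg")
    train, valid = split_paths(images)
    write_list(train, "data/custom/train.txt")
    write_list(valid, "data/custom/valid.txt")
```

Because consecutive video frames are nearly identical, a random split like this makes the evaluation set very similar to the training set. That is fine for a quick experiment, but the evaluation numbers should be read with that in mind.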

Now that the settings are complete, run train.py.

python3 train.py \
--model_def config/yolov3-custom.cfg \
--data_config config/custom.data \
--batch_size 2 \
--img_size 32 \
--epochs 200 \
--pretrained_weights weights/darknet53.conv.74

The default batch_size is 8 and the default img_size is 416, but on a low-spec machine these cause an out-of-memory error. My PC failed this way because its GPU has only 4 GB of memory. In that case, lowering these values lets training proceed normally.
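
Lowering img_size saves memory, but it also makes the detection grid coarser. As far as I know, YOLOv3 predicts at three scales with strides 32, 16 and 8, so the grid at each scale is the image size divided by the stride. The sketch below just illustrates that arithmetic; the function name is hypothetical, and the multiple-of-32 requirement is stated as an assumption.

```python
def yolo_grid_sizes(img_size, strides=(32, 16, 8)):
    """Grid cells per side at each YOLOv3 detection scale."""
    if img_size % 32 != 0:
        raise ValueError("img_size is assumed to be a multiple of 32")
    return [img_size // s for s in strides]


print(yolo_grid_sizes(416))  # [13, 26, 52]
print(yolo_grid_sizes(32))   # [1, 2, 4]
```

With img_size 32 the coarsest scale is a single cell, so if memory allows, reducing batch_size first and keeping img_size larger may be worth trying.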

As for the training results, a file named yolov3_ckpt_{epoch number}.pth is written to checkpoints/ for each epoch.
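
With 200 epochs the checkpoints folder fills up, and a plain alphabetical sort puts yolov3_ckpt_99.pth after yolov3_ckpt_199.pth. A small helper that compares the epoch numbers numerically can pick the newest file. This is a sketch; the helper name is hypothetical, and the file name pattern is taken from the checkpoint name used in this article.

```python
import re

CKPT_PATTERN = re.compile(r"^yolov3_ckpt_(\d+)\.pth$")


def latest_checkpoint(filenames):
    """Return the checkpoint file name with the highest epoch number, or None."""
    best = None
    for name in filenames:
        match = CKPT_PATTERN.match(name)
        if match is None:
            continue  # not a checkpoint file
        epoch = int(match.group(1))
        if best is None or epoch > best[0]:
            best = (epoch, name)
    return None if best is None else best[1]


if __name__ == "__main__":
    import os
    print(latest_checkpoint(os.listdir("checkpoints")))
```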

Detect Objects

First, prepare the image data for object detection. Here I use the video frames from the start to the goal of Super Mario Bros (NES) Level 1-1. To convert the video into sequentially numbered images, use the video_to_images script in the same way as for the training data. This gave a total of 1501 images.

So, let's try detection using detect.py.

python3 detect.py --image_folder ./data/mario_1-1/ \
--weights_path ./checkpoints/yolov3_ckpt_199.pth \
--model_def config/yolov3-custom.cfg \
--class_path data/custom/classes.names

Specify the path of the folder containing the images you want to test with --image_folder, and pass the file generated by training to --weights_path. The results are stored in the output folder.

(239) Image: './data/mario_1-1/00000239.jpg'
	+ Label: mario, Conf: 0.99997
...
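
With 1501 frames the console output gets long, so it can help to save it to a file and tally the detections per label. The sketch below assumes the log lines look exactly like the sample above ("+ Label: <name>, Conf: <value>"); the function name and the min_conf parameter are hypothetical.

```python
import re
from collections import Counter

LABEL_LINE = re.compile(r"\+ Label: ([^,\n]+), Conf: ([0-9.]+)")


def tally_labels(log_text, min_conf=0.0):
    """Count detections per label, keeping only those with confidence >= min_conf."""
    counts = Counter()
    for label, conf in LABEL_LINE.findall(log_text):
        if float(conf) >= min_conf:
            counts[label] += 1
    return counts


sample = "(239) Image: './data/mario_1-1/00000239.jpg'\n\t+ Label: mario, Conf: 0.99997\n"
print(tally_labels(sample))  # Counter({'mario': 1})
```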

00000239.png

After that, convert the sequentially numbered images into a video and you're done. At first, I converted them with ffmpeg using ffmpeg -r 30 -i %08d.png -vcodec libx264 -pix_fmt yuv420p -r 60 out.mp4, but the image quality was extremely degraded. (See "Generate video from serial number image with ffmpeg / Generate serial number image from video ~ Prevent frame dropping ~".)

So I converted it to a video using OpenCV.

import cv2
import os

def main():
	# Collect the PNG frames in the current directory, in file-name order
	is_png = lambda x: os.path.splitext(x)[1] == ".png"
	imgs = sorted(filter(is_png, os.listdir()))

	# Output size and frame rate
	width = 480
	height = 270
	fps = 30

	fmt = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
	writer = cv2.VideoWriter('output.mp4', fmt, fps, (width, height))

	# Resize each frame to the output size and append it to the video
	for img in imgs:
		mat = cv2.imread(img)
		dst = cv2.resize(mat, dsize=(width, height))
		writer.write(dst)

	writer.release()

if __name__ == "__main__":
	main()

Results

↓ Click to watch the YouTube video of detection running from the start to the goal: detect mario

- I did not train on Fire Mario, but it is identified as Mario because it looks similar.
- Chibi (small) Mario was not in the training data, so it is sometimes mistaken for Goomba, probably because of its similar height.
- I think the accuracy would improve further if objects such as blocks, treasure chests, and Koopa Troopa were also added as targets.
