I got a Raspberry Pi 3B+ and a picamera for a university class. Since I had some free time, I decided to have the Raspberry Pi do classification with deep learning. Rather than classifying photos taken in advance, it classifies the objects in the real-time picamera image and displays the results nicely.

It may only be student-level work, but I hope parts of it are helpful.

What I envisioned

I decided to build, on the Raspberry Pi, a **function that classifies multiple personal belongings placed in the fixed field of view of the picamera in real time and displays the results**.

Specifically, objects are extracted by **background subtraction** (a method that extracts the parts that changed from a background image) and then classified with deep learning using **PyTorch** (similar to Keras and TensorFlow).
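
To make the background-subtraction idea concrete, here is a minimal pure-Python sketch on tiny grayscale grids. It only mimics what `cv2.absdiff` and `cv2.threshold` do in the scripts later in this article; the helper names `diff_mask` and `bounding_box` are made up for illustration, and the default threshold of 20 is the same value as the `GRAY_THR` constant used later.

```python
def diff_mask(background, frame, threshold=20):
    """Return a binary mask: 1 where |frame - background| > threshold, else 0."""
    return [
        [1 if abs(f - b) > threshold else 0 for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def bounding_box(mask):
    """Return (x, y, w, h) enclosing all changed pixels, or None if nothing changed."""
    points = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    if not points:
        return None
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


# A flat gray background and a frame in which a bright 2x2 object has appeared.
background = [[100] * 5 for _ in range(4)]
frame = [row[:] for row in background]
for y in (1, 2):
    for x in (2, 3):
        frame[y][x] = 200

mask = diff_mask(background, frame)
print(bounding_box(mask))
```

The real scripts get one bounding box per contour rather than one box for the whole mask, but the principle (difference, threshold, enclosing rectangle) is the same.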

**(Note: YOLO, SSD, etc. are not covered!)**

So I implemented it in the following steps.

*Step1*: Prepare my own training data
*Step2*: Build and train a neural network
*Step3*: Implement a mechanism that extracts objects from picamera images and displays the results classified with the learned parameters

Because the Raspberry Pi is slow, I **trained on my own PC and ran the classification on the Raspberry Pi using the resulting parameter file**. So I installed PyTorch on both my PC and the Raspberry Pi.

The whole process is described below. Points where I struggled are marked with **[⚠Note]**.

Preparation

Prepare the execution environment on the PC and Raspberry Pi.

Execution environment

The package versions differ between the PC and the Raspberry Pi, but that is not a problem. The PC is only used for training.

My PC (Windows 10)

Python 3.6.4
PyTorch 1.4.0+cpu
Torchvision 0.5.0+cpu

Note: **Torchvision** is a library used together with PyTorch for **image preprocessing and dataset creation**.

Raspberry Pi 3 Model B+ (Raspbian Stretch)

Python 3.5.3
PyTorch 1.3.0+cpu
Torchvision 0.5.0+cpu
OpenCV 3.4.7

I used the **Raspberry Pi Camera Module V2** as the camera plugged into the Raspberry Pi. I also installed VNC Viewer on my PC and operated the Raspberry Pi over an **SSH connection**.

How to build

Install the package versions listed above on each machine. I will omit the details, but I referred to the linked sites.

PyTorch / Torchvision

On the **PC**, install it by selecting your environment on the official PyTorch site.

**[⚠Note]** The GPU can only be used if it is made by NVIDIA, so if you have Intel graphics, select **CUDA → None** (the CPU is used as usual).

For the **Raspberry Pi**, I built from source, referring to "PyTorch v1.3.0 in Raspberry Pi 3" and "PyTorch Deep Learning Framework in Raspberry Pi". Thank you to both.

**[⚠Note]** Specify the version when cloning, e.g. `git clone ~~~ -b v1.3.0`.

**[⚠Note]** With PyTorch 1.4.0, the build stopped at about 80% with a fatal error saying that immintrin.h does not exist. A mystery. (2020/3/20)

OpenCV

Install it on the **Raspberry Pi**, referring to "Installing OpenCV 3 on Raspberry Pi + Python 3 as easily as possible".

Both take a few hours to build ...

Implementation

After a lot of trial and error, I wrote the following Python scripts.

*Step1*: Create your own image training data

I created image data of the personal belongings to use for training. This assumes the picamera is plugged into the Raspberry Pi and **fixed in place so that it does not move**.

Program to create

After rotating the preview with the "r" key as needed, press "p" to shoot the background with nothing in view. Then place the item you want to photograph and press "p" again: background subtraction is performed and the area inside each **green frame** is saved.

This time I classify three categories, **"phone", "watch", and "wallet"**, so I only photograph those three.

`take_photo.py`


# coding: utf-8
import cv2
from datetime import datetime
import picamera
import picamera.array

MIN_LEN = 50  # Minimum side length of an object detection frame (pixels)
GRAY_THR = 20  # Threshold for the gray-level difference
CUT_MODE = True  # True: crop and save each detected object, False: save the whole image as is


def imshow_rect(img, contour, minlen=0):
    """
    Draw a rectangle around every detected object in the acquired image
    Arguments:
        img: Camera image
        contour: Contours
        minlen: Detection size threshold (regions with a side shorter than this are skipped)
    """
    for pt in contour:
        x, y, w, h = cv2.boundingRect(pt)
        # Skip if either side is too short, as the docstring says and as the Step3 filter does
        if w < minlen or h < minlen: continue
        cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)
    cv2.imshow('Preview', img)


def save_cutimg(img, contour, minlen=0):
    """
    Crop and save every detected object in the acquired image
    Arguments:
        Same as imshow_rect
    Returns:
        Number of saved crops, or -1 if nothing was detected
    """
    # Use the current date and time as the file name
    dt = datetime.now()
    f_name = '{}.jpg'.format(dt.strftime('%y%m%d%H%M%S'))
    imgs_cut = []
    for pt in contour:
        x, y, w, h = cv2.boundingRect(pt)
        if w < minlen or h < minlen: continue
        imgs_cut.append(img[y:y+h, x:x+w])

    # Save the crops (numbered when there is more than one)
    if not imgs_cut: return -1
    if len(imgs_cut) > 1:
        for i in range(len(imgs_cut)):
            cv2.imwrite(f_name[:-4]+'_'+str(i+1)+f_name[-4:], imgs_cut[i])
    else:
        cv2.imwrite(f_name, imgs_cut[0])
    return len(imgs_cut)


def save_img(img):
    """
    Save the acquired image as is
    Arguments:
        img: Camera image
    """
    dt = datetime.now()
    fname = '{}.jpg'.format(dt.strftime('%y%m%d%H%M%S'))
    cv2.imwrite(fname, img)


def take_photo():
    """
    Shoot the background -> shoot objects and save them
    Key input:
        "p": Take a picture
        "q": Quit
        "r": Rotate the screen (while shooting the background)
        "i": Start over from the beginning (while shooting objects)
    """
    cnt = 0
    # Start picamera
    with picamera.PiCamera() as camera:
        camera.resolution = (480, 480)  # Resolution
        camera.rotation = 0  # Camera rotation angle (degrees)
        # Start streaming
        with picamera.array.PiRGBArray(camera) as stream:
            print('Set background ... ', end='', flush=True)
            # First shoot the background
            while True:
                # Get and display the streaming image
                camera.capture(stream, 'bgr', use_video_port=True)
                cv2.imshow('Preview', stream.array)

                wkey = cv2.waitKey(5) & 0xFF  # Read key input

                stream.seek(0)  # These two calls reset the stream for the next capture
                stream.truncate()

                if wkey == ord('q'):
                    cv2.destroyAllWindows()
                    print()
                    return
                elif wkey == ord('r'):
                    camera.rotation += 90
                elif wkey == ord('p'):
                    camera.exposure_mode = 'off'  # Fix the exposure (disables automatic gain control)
                    save_img(stream.array)
                    # Convert to grayscale and set as the background image
                    back_gray = cv2.cvtColor(stream.array,
                                             cv2.COLOR_BGR2GRAY)
                    print('done')
                    break

            # After setting the background, shoot objects without moving the camera
            print('Take photos!')
            while True:
                camera.capture(stream, 'bgr', use_video_port=True)
                # Convert the current frame to grayscale
                stream_gray = cv2.cvtColor(stream.array,
                                           cv2.COLOR_BGR2GRAY)

                # Take the absolute difference and binarize it to make a mask
                diff = cv2.absdiff(stream_gray, back_gray)
                mask = cv2.threshold(diff, GRAY_THR, 255,
                                     cv2.THRESH_BINARY)[1]
                cv2.imshow('mask', mask)

                # Find the contours of the detected objects
                # (index [1] is for OpenCV 3.4.7; OpenCV 4.x returns the contours at [0])
                contour = cv2.findContours(mask,
                                           cv2.RETR_EXTERNAL,
                                           cv2.CHAIN_APPROX_SIMPLE)[1]

                # Draw a rectangle around every detected object and display the image
                stream_arr = stream.array.copy()
                imshow_rect(stream_arr, contour, MIN_LEN)

                wkey = cv2.waitKey(5) & 0xFF

                stream.seek(0)
                stream.truncate()

                if wkey == ord('q'):
                    cv2.destroyAllWindows()
                    return
                elif wkey == ord('i'):
                    break
                elif wkey == ord('p'):
                    if CUT_MODE:
                        num = save_cutimg(stream.array, contour, MIN_LEN)
                        if num > 0:
                            cnt += num
                            print('  Captured: {} (sum: {})'.format(num, cnt))
                    else:
                        save_img(stream.array)
                        cnt += 1
                        print('  Captured: 1 (sum: {})'.format(cnt))

    print('Initialized')
    take_photo()


if __name__ == '__main__':
    take_photo()
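
The `[1]` index on `cv2.findContours` in the script above is specific to OpenCV 3.x, which returns `(image, contours, hierarchy)`, while OpenCV 4.x returns `(contours, hierarchy)`. One common way to stay version-independent is to take the second-to-last element. The helper `pick_contours` below is a hypothetical addition, not part of the original script, and is demonstrated with stand-in tuples instead of real OpenCV output.

```python
def pick_contours(find_contours_result):
    """Return the contour list from a cv2.findContours result tuple.

    The contours are the second-to-last element in both layouts:
      OpenCV 3.x -> (image, contours, hierarchy)
      OpenCV 4.x -> (contours, hierarchy)
    """
    return find_contours_result[-2]


# Stand-in tuples that imitate the two return layouts.
result_v3 = ('image', ['contour_a', 'contour_b'], 'hierarchy')
result_v4 = (['contour_a', 'contour_b'], 'hierarchy')
print(pick_contours(result_v3) == pick_contours(result_v4))
```

In the script this would be used as `contour = pick_contours(cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE))`.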

Run

All I do is take pictures. The cropped image for each green frame is saved.


**[⚠Note] If there are too few pictures, training will not go well.** I took more than 50 photos per class for the training data, but that may still be too few... For now, random noise (such as rotation) is added during training, which effectively increases the amount of data.

Put the photos in a folder and **move them to your PC** using Slack or similar (semi-analog). Then **store the photos of each item in the folder structure below**.

image_data
├─train
│  ├─phone
│  │       191227013419.jpg
│  │       191227013424.jpg
│  │              :
│  ├─wallet
│  │       191227013300.jpg
│  │       191227013308.jpg
│  │              :
│  └─watch
│          191227013345.jpg
│          191227013351.jpg
│                 :
└─val
    ├─phone
    │       191227013441.jpg
    │       191227013448.jpg
    │              :
    ├─wallet
    │       191227013323.jpg
    │       191227013327.jpg
    │              :
    └─watch
            191227013355.jpg
            191227013400.jpg
                   :

*Step2*: Deep learning with PyTorch on the PC

Build a network and train it with the images above.

Program to create

When executed, it reads the images from the folder above and starts training, then outputs the progress file, plots of the loss and accuracy, and the final parameter file.

In creating it, I referred to "PyTorch Neural Network Implementation Handbook" (Shuwa System).

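Even if you interrupt with "Ctrl + C", the training progress up to that point is saved as **"train_process.ckpt"**, and the next run resumes from there. You can change the hyperparameters between runs. (Note that `optimizer.load_state_dict` also restores the saved learning rate and momentum, so changed values for those may need to be re-applied after loading.)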

By the way, torchvision's **ImageFolder** creates a dataset that uses the name of each folder containing photos as the class name. Easy!! Photos in the train folder are used for training, and photos in the val folder are used for evaluation.
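
As far as I understand, ImageFolder numbers the classes by sorting the folder names. The sketch below imitates that rule in plain Python so that it runs without torchvision; `class_to_idx` here is a stand-in function written for illustration, not the library attribute of the same name.

```python
def class_to_idx(folder_names):
    """Mimic the labelling rule: sort the folder names, then number them from 0."""
    return {name: idx for idx, name in enumerate(sorted(folder_names))}


mapping = class_to_idx(['watch', 'phone', 'wallet'])
print(mapping)
```

This is why the display names in the Step3 script (`OBJ_NAMES`) are listed as Phone, Wallet, Watch: index 0, 1, 2 must line up with the sorted folder names.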

`train_net.py`


# coding: utf-8
import os
import re
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data
from torchvision import datasets, transforms
import matplotlib.pyplot as plt

DATA_DIR = 'image_data'  # Image folder name
CKPT_PROCESS = 'train_process.ckpt'  # File name for saving training progress
CKPT_NET = 'trained_net.ckpt'  # File name for the trained parameters
NUM_CLASSES = 3  # Number of classes
NUM_EPOCHS = 100  # Number of training epochs

# Hyperparameters that are changed often
LEARNING_RATE = 0.001  # Learning rate
MOMENTUM = 0.5  # Momentum

checkpoint = {}  # Variable for saving progress


# Image transform definitions (with data augmentation for training)
# The Resize size determines the input size of the first Linear layer in classifier
train_transforms = transforms.Compose([
    transforms.Resize((112, 112)),  # Resize
    transforms.RandomRotation(30),  # Rotate randomly (up to 30 degrees either way)
    transforms.Grayscale(),  # Convert to grayscale
    transforms.ToTensor(),  # Convert to tensor
    transforms.Normalize(mean=[0.5], std=[0.5])  # Normalize (values chosen arbitrarily)
])

val_transforms = transforms.Compose([
    transforms.Resize((112, 112)),
    transforms.Grayscale(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5])
])

#Data set creation
train_dataset = datasets.ImageFolder(
    root=os.path.join(DATA_DIR, 'train'),
    transform=train_transforms
)

val_dataset = datasets.ImageFolder(
    root=os.path.join(DATA_DIR, 'val'),
    transform=val_transforms
)

#Get mini batch
train_loader = torch.utils.data.DataLoader(
    dataset=train_dataset,
    batch_size=10,  #Batch size during learning
    shuffle=True  #Shuffle training data
)

val_loader = torch.utils.data.DataLoader(
    dataset=val_dataset,
    batch_size=10,
    shuffle=True
)


class NeuralNet(nn.Module):
    """Network definition (inherits from nn.Module)"""
    def __init__(self, num_classes):
        super(NeuralNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(8, 16, kernel_size=5, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(400, 200),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(200, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x


def main():
    """Load in-progress training data -> train (-> save progress) -> plot the results"""
    global checkpoint
    print('[Settings]')
    # Device settings
    device = 'cuda' if torch.cuda.is_available() else 'cpu'

    # Network, loss function, and optimizer settings
    net = NeuralNet(NUM_CLASSES).to(device)
    criterion = nn.CrossEntropyLoss()  # Loss function
    optimizer = optim.SGD(  # Optimization algorithm
        net.parameters(),
        lr=LEARNING_RATE,
        momentum=MOMENTUM,
        weight_decay=5e-4
    )

    #View settings
    # print('  Device               :', device)
    # print('  Dataset Class-Index  :', train_dataset.class_to_idx)
    # print('  Network Model        :', re.findall('(.*)\(', str(net))[0])
    # print('  Criterion            :', re.findall('(.*)\(', str(criterion))[0])
    # print('  Optimizer            :', re.findall('(.*)\(', str(optimizer))[0])
    # print('    -Learning Rate     :', LEARNING_RATE)
    # print('    -Momentum          :', MOMENTUM)

    t_loss_list = []
    t_acc_list = []
    v_loss_list = []
    v_acc_list = []
    epoch_pre = -1

    # Load in-progress training data
    if os.path.isfile(CKPT_PROCESS):
        checkpoint = torch.load(CKPT_PROCESS)
        net.load_state_dict(checkpoint['net'])
        optimizer.load_state_dict(checkpoint['optimizer'])
        t_loss_list = checkpoint['t_loss_list']
        t_acc_list = checkpoint['t_acc_list']
        v_loss_list = checkpoint['v_loss_list']
        v_acc_list = checkpoint['v_acc_list']
        epoch_pre = checkpoint['epoch']
        print("Progress until last time = {}/{} epochs"\
              .format(epoch_pre+1, NUM_EPOCHS))

    print('[Main process]')
    for epoch in range(epoch_pre+1, NUM_EPOCHS):
        t_loss, t_acc, v_loss, v_acc = 0, 0, 0, 0

        # Training ------------------------------------------------------
        net.train()  # Training mode
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = net(images)
            loss = criterion(outputs, labels)
            t_loss += loss.item()
            t_acc += (outputs.max(1)[1] == labels).sum().item()
            loss.backward()
            optimizer.step()
        # criterion returns the mean over a batch, so average the loss over batches
        avg_t_loss = t_loss / len(train_loader)
        avg_t_acc = t_acc / len(train_loader.dataset)

        # Evaluation ----------------------------------------------------
        net.eval()  # Evaluation mode
        with torch.no_grad():  # No gradient computation needed
            for images, labels in val_loader:
                images, labels = images.to(device), labels.to(device)
                outputs = net(images)
                loss = criterion(outputs, labels)
                v_loss += loss.item()
                v_acc += (outputs.max(1)[1] == labels).sum().item()
        avg_v_loss = v_loss / len(val_loader)
        avg_v_acc = v_acc / len(val_loader.dataset)
        # --------------------------------------------------------------
        print('\rEpoch [{}/{}] | Train [loss:{:.3f}, acc:{:.3f}] | Val [loss:{:.3f}, acc:{:.3f}]'\
              .format(epoch+1, NUM_EPOCHS, avg_t_loss, avg_t_acc, avg_v_loss, avg_v_acc), end='')

        # Record loss and accuracy
        t_loss_list.append(avg_t_loss)
        t_acc_list.append(avg_t_acc)
        v_loss_list.append(avg_v_loss)
        v_acc_list.append(avg_v_acc)

        #Process for saving progress
        checkpoint['net'] = net.state_dict()
        checkpoint['optimizer'] = optimizer.state_dict()
        checkpoint['t_loss_list'] = t_loss_list
        checkpoint['t_acc_list'] = t_acc_list
        checkpoint['v_loss_list'] = v_loss_list
        checkpoint['v_acc_list'] = v_acc_list
        checkpoint['epoch'] = epoch

    graph()
    save_process()
    save_net()


def save_process():
    """Save progress"""
    global checkpoint
    if not checkpoint: return
    torch.save(checkpoint, CKPT_PROCESS)


def save_net():
    """Save only network information"""
    global checkpoint
    if not checkpoint: return
    torch.save(checkpoint['net'], CKPT_NET)


def graph():
    """Plot loss and accuracy"""
    global checkpoint
    if not checkpoint: return
    t_loss_list = checkpoint['t_loss_list']
    t_acc_list = checkpoint['t_acc_list']
    v_loss_list = checkpoint['v_loss_list']
    v_acc_list = checkpoint['v_acc_list']

    plt.figure(figsize=(10, 4))
    plt.subplot(1, 2, 1)
    plt.plot(range(len(t_loss_list)), t_loss_list,
             color='blue', linestyle='-', label='t_loss')
    plt.plot(range(len(v_loss_list)), v_loss_list,
             color='green', linestyle='--', label='v_loss')
    plt.legend()
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.title('Training and validation loss')
    plt.grid()

    plt.subplot(1, 2, 2)
    plt.plot(range(len(t_acc_list)), t_acc_list,
             color='blue', linestyle='-', label='t_acc')
    plt.plot(range(len(v_acc_list)), v_acc_list,
             color='green', linestyle='--', label='v_acc')
    plt.legend()
    plt.xlabel('epoch')
    plt.ylabel('acc')
    plt.title('Training and validation accuracy')
    plt.grid()
    plt.show()


if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt:
        print()
        graph()
        save_process()

**[⚠Note] Keep the network moderately sized.** If you increase the number of layers and nodes too much, you get the error `DefaultCPUAllocator: can't allocate memory: you tried to allocate 685198800 bytes`. Also, when classifying on the Raspberry Pi later, a huge number of parameters would presumably eat up memory.
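
To see how the layer sizes fit together, the arithmetic below traces the image side length through the `features` part of the network and arrives at the 400 inputs of the first `nn.Linear`. It is a plain-Python sketch; `conv_out` is a helper written for this example, using the standard output-size formula for convolution and pooling, and the parameter count assumes the three classes used in this article.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


side = 112                                               # after transforms.Resize((112, 112))
side = conv_out(side, kernel=11, stride=4, padding=2)    # Conv2d(1, 8, ...)  -> 27
side = conv_out(side, kernel=2, stride=2)                # MaxPool2d          -> 13
side = conv_out(side, kernel=5, padding=1)               # Conv2d(8, 16, ...) -> 11
side = conv_out(side, kernel=2, stride=2)                # MaxPool2d          -> 5
flat_features = 16 * side * side                         # 16 channels * 5 * 5
print(flat_features)  # 400, the input size of the first nn.Linear

# Parameter count (weights + biases) of each layer in the network above.
params = (
    (1 * 8 * 11 * 11 + 8)            # Conv2d(1, 8, kernel_size=11)
    + (8 * 16 * 5 * 5 + 16)          # Conv2d(8, 16, kernel_size=5)
    + (flat_features * 200 + 200)    # Linear(400, 200)
    + (200 * 3 + 3)                  # Linear(200, 3)
)
print(params)  # 84995
```

Most of the parameters sit in the first fully connected layer, so changing the Resize size or the channel counts is what makes the memory use grow quickly.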

Run

Here is how the training progressed. The left plot is the loss and the right plot is the accuracy. The blue line is the training data and the green dashed line is the validation data. The validation accuracy is about 72%. There is room for improvement...

When training finishes, you get a **"trained_net.ckpt"** file that stores only the trained parameters, so **send it to the Raspberry Pi**, again with Slack or similar.

*Step3*: Real-time classification of camera images with Raspberry Pi and display of results

The goal is to classify the objects in the camera image in real time and display the results nicely.

Program to create

First shoot the background, then subtract the background from each frame, cut out the objects that appear, and turn them into a 4D tensor batch through the defined preprocessing. The whole batch is passed through the network, the outputs are converted to per-class probabilities, and the class (object name) with the highest probability is overlaid in the window.
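
The last part of that pipeline (network output to probability to object name) can be sketched in plain Python. This stands in for `F.softmax` and the `max(...)` call in the script; the logits are made-up numbers, and `softmax` and `top_class` are helper names chosen for this example.

```python
import math


def softmax(logits):
    """Convert raw network outputs into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(v - largest) for v in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def top_class(logits, names):
    """Return (name, probability) of the most likely class."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=lambda i: probs[i])
    return names[idx], probs[idx]


names = ['Phone', 'Wallet', 'Watch']
name, prob = top_class([0.2, 2.5, -1.0], names)  # made-up logits for one detected object
print('{}:{:.1f}%'.format(name, prob * 100))
```

The script does the same for every row of the batch at once, so each detected object gets its own name and confidence.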

Load the "trained_net.ckpt" created earlier.

**[⚠Note] If you do not set an upper limit on the batch size (the number of objects detected at one time), the Raspberry Pi may freeze when it tries to process a large number of detected regions at once.**
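
The capping logic is simple enough to show on its own. The sketch below mirrors the filter-then-slice step of the script with plain tuples instead of OpenCV contours; the box values are made up, and `select_boxes` is a name chosen for this example.

```python
MIN_LEN = 50            # minimum side length in pixels, as in the scripts
CONTOUR_COUNT_MAX = 3   # upper limit on objects classified per frame


def select_boxes(boxes, min_len=MIN_LEN, max_count=CONTOUR_COUNT_MAX):
    """Keep boxes (x, y, w, h) whose sides both exceed min_len, capped at max_count."""
    large_enough = [b for b in boxes if b[2] > min_len and b[3] > min_len]
    return large_enough[:max_count]


# Made-up detections: one is too narrow, and four are large enough.
boxes = [(0, 0, 60, 60), (10, 10, 20, 80), (100, 0, 70, 90),
         (200, 50, 55, 55), (300, 300, 120, 100)]
print(select_boxes(boxes))
```

Note that the cap keeps the first boxes in detection order, not the largest ones; sorting by area before slicing would be one possible variation.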

`raltime_classification.py`


# coding: utf-8
import os
from PIL import Image
from time import sleep
import cv2
import picamera
import picamera.array
import torch
# Running "export OMP_NUM_THREADS=1" (or 2 or 3) in the shell beforehand is a must (the default is 4)
# Check the parallel processing settings with "print(torch.__config__.parallel_info())"
import torch.nn as nn
import torch.nn.functional as F  # needed for F.softmax in judge_what
import torch.utils
from torchvision import transforms

CKPT_NET = 'trained_net.ckpt'  # Trained parameter file
OBJ_NAMES = ['Phone', 'Wallet', 'Watch']  # Display name of each class
MIN_LEN = 50
GRAY_THR = 20
CONTOUR_COUNT_MAX = 3  # Upper limit of the batch size (number of objects detected at one time)
SHOW_COLOR = (255, 191, 0)  # Frame color (B, G, R)

NUM_CLASSES = 3
PIXEL_LEN = 112  # Size after resizing (one side)
CHANNELS = 1  # Number of color channels (BGR: 3, grayscale: 1)


# Image transform definition
# The Resize size determines the input size of the first Linear layer in classifier
data_transforms = transforms.Compose([
    transforms.Resize((PIXEL_LEN, PIXEL_LEN)),
    transforms.Grayscale(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5])
])


class NeuralNet(nn.Module):
    """Network definition. Must be identical to the one used for training"""
    def __init__(self, num_classes):
        super(NeuralNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(8, 16, kernel_size=5, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(400, 200),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(200, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x


def detect_obj(back, target):
    """
    Detect objects with OpenCV background subtraction and return them as a tuple
    Arguments:
        back: Background image (color)
        target: Image to subtract the background from (color)
    Returns:
        Tuple of the cropped objects as PIL images, and the list of their (x, y, w, h)
    """
    print('Detecting objects ...')
    # Convert to grayscale
    b_gray = cv2.cvtColor(back, cv2.COLOR_BGR2GRAY)
    t_gray = cv2.cvtColor(target, cv2.COLOR_BGR2GRAY)
    # Calculate the difference
    diff = cv2.absdiff(t_gray, b_gray)

    # Binarize with the threshold to make a mask, then extract the contours
    # The index for findContours depends on cv2.__version__: 4.2.0 -> [0], 3.4.7 -> [1]
    mask = cv2.threshold(diff, GRAY_THR, 255, cv2.THRESH_BINARY)[1]
    cv2.imshow('mask', mask)
    contour = cv2.findContours(mask,
                               cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)[1]

    # Collect the position and size of changed regions above the minimum width and height,
    # capped at CONTOUR_COUNT_MAX
    pt_list = list(filter(
        lambda x: x[2] > MIN_LEN and x[3] > MIN_LEN,
        [cv2.boundingRect(pt) for pt in contour]
    ))[:CONTOUR_COUNT_MAX]

    # Crop the frame at each position and convert to a tuple of PIL images
    # (OpenCV arrays are BGR, so convert to RGB to match the images loaded during training)
    obj_images = tuple(map(
        lambda x: Image.fromarray(cv2.cvtColor(
            target[x[1]:x[1]+x[3], x[0]:x[0]+x[2]], cv2.COLOR_BGR2RGB)),
        pt_list
    ))
    return (obj_images, pt_list)


def batch_maker(tuple_images, transform):
    """
    Transform a tuple of PIL images into a tensor batch the network can process
    Arguments:
        tuple_images: Tuple of PIL images
        transform: torchvision transform definition
    """
    return torch.cat([transform(img) for img
                      in tuple_images]).view(-1, CHANNELS, PIXEL_LEN, PIXEL_LEN)


def judge_what(img, probs_list, pos_list):
    """
    Decide each object from its class probabilities, draw a frame and name at its position,
    and return the list of (class index, probability) pairs
    Arguments:
        img: Image to draw on
        probs_list: 2D array of network outputs (batch format)
        pos_list: 2D array of positions (batch format)
    """
    print('Judging objects ...')
    # Convert to a list of (index, highest probability) pairs
    ip_list = list(map(lambda x: max(enumerate(x), key = lambda y:y[1]),
                       F.softmax(probs_list, dim=-1)))

    # Convert the index to an object name, then draw the name and confidence at the object's position
    for (idx, prob), pos in zip(ip_list, pos_list):
        cv2.rectangle(img, (pos[0], pos[1]), (pos[0]+pos[2], pos[1]+pos[3]), SHOW_COLOR, 2)
        cv2.putText(img, '%s:%.1f%%'%(OBJ_NAMES[idx], prob*100), (pos[0]+5, pos[1]+20),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, SHOW_COLOR, thickness=2)
    return ip_list


def realtime_classify():
    """Load the trained model -> capture frames -> classify -> overlay the results on the image"""
    # Device settings
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    # Network settings
    net = NeuralNet(NUM_CLASSES).to(device)

    # Load the trained parameters
    if os.path.isfile(CKPT_NET):
        # map_location lets a file trained on another device load here
        checkpoint = torch.load(CKPT_NET, map_location=device)
        net.load_state_dict(checkpoint)
    else:
        raise FileNotFoundError('No trained network file: {}'.format(CKPT_NET))

    # Evaluation mode
    net.eval()
    # Start picamera
    with picamera.PiCamera() as camera:
        camera.resolution = (480, 480)
        # Start streaming
        with picamera.array.PiRGBArray(camera) as stream:
            print('Setting background ...')
            sleep(2)

            camera.exposure_mode = 'off'  # Fix the exposure (disables automatic gain control)
            camera.capture(stream, 'bgr', use_video_port=True)
            # Set as background
            img_back = stream.array

            stream.seek(0)
            stream.truncate()
            
            print('Start!')
            with torch.no_grad():
                while True:
                    camera.capture(stream, 'bgr', use_video_port=True)
                    # Each new frame is the input for background subtraction
                    img_target = stream.array
                    # Detect objects and their positions
                    obj_imgs, positions = detect_obj(img_back, img_target)
                    if obj_imgs:
                        # Convert the detected objects to the network input format
                        obj_batch = batch_maker(obj_imgs, data_transforms).to(device)
                        # Classify
                        outputs = net(obj_batch)
                        # Judge
                        result = judge_what(img_target, outputs, positions)
                        print('  Result:', result)

                    # Display
                    cv2.imshow('detection', img_target)

                    if cv2.waitKey(200) == ord('q'):
                        cv2.destroyAllWindows()
                        return

                    stream.seek(0)
                    stream.truncate()


if __name__ == "__main__":
    try:
        realtime_classify()
    except KeyboardInterrupt:
        cv2.destroyAllWindows()

Run

Copy "trained_net.ckpt" to the Raspberry Pi and run the script in the same directory. The name of each detected object and its confidence are displayed.

The result... I'm satisfied: it classifies with high accuracy from the moment I put an object down!!


**[⚠Note] I recommend changing the number of cores used for execution (default 4).** Running with all 4 cores busy carries a high risk of freezing. In the pytorch directory, run the command `export OMP_NUM_THREADS=2` (to use 2 cores). You can check the setting with `print(torch.__config__.parallel_info())`. However, closing the shell discards the change, so to make it persistent, add `export OMP_NUM_THREADS=2` below the final `... fi` block at the bottom of **".profile"** in /home/pi and reboot.
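
As an alternative to the shell command, the variable can also be set from Python at the very top of the script. This is only a sketch and not what the original script does: it assumes the variable has to be in place before `import torch`, since OpenMP reads it when it starts up. PyTorch also provides `torch.set_num_threads` for changing the thread count after import.

```python
import os

# Set the variable before importing torch; setdefault keeps a value already
# exported in the shell instead of overwriting it.
os.environ.setdefault('OMP_NUM_THREADS', '2')

# import torch                                # import only after the variable is set
# print(torch.__config__.parallel_info())     # the check mentioned in this article
print('OMP_NUM_THREADS =', os.environ['OMP_NUM_THREADS'])
```

The two torch lines are left commented out so that the sketch runs on a machine without PyTorch.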

Summary

I was able to do what I wanted! (Sorry for the poor readability...) Combined with OpenCV face detection, this could probably be applied to very simple face recognition right away.

Originally I planned to implement SSD, but creating a dataset with location information looked difficult, and I gave up because I could not resolve an error I got when trying to train on sample data...

Unlike SSD, the disadvantage of this background subtraction approach is that overlapping objects cannot be separated and are judged to be a single object.
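
The limitation is easy to reproduce on a toy mask. The sketch below counts connected groups of changed pixels with a simple flood fill; it is a simplified 4-connected stand-in for what contour detection sees, and `count_blobs` is a helper written for this example.

```python
def count_blobs(mask):
    """Count 4-connected groups of 1s in a binary mask."""
    height, width = len(mask), len(mask[0])
    seen = set()
    blobs = 0
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or (x, y) in seen:
                continue
            blobs += 1
            stack = [(x, y)]
            seen.add((x, y))
            while stack:
                cx, cy = stack.pop()
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] and (nx, ny) not in seen):
                        seen.add((nx, ny))
                        stack.append((nx, ny))
    return blobs


# Two objects with a gap between them, and the same two objects touching.
separate = [[1, 1, 0, 1, 1],
            [1, 1, 0, 1, 1]]
touching = [[1, 1, 1, 1, 1],
            [1, 1, 0, 1, 1]]
print(count_blobs(separate), count_blobs(touching))
```

As soon as the two regions touch, only one group remains, so only one frame is drawn and one class is predicted for both objects.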

It was a good learning experience.
