Create an image recognition application that recognizes digits drawn on the screen, using PyTorch Mobile and Kotlin. **Everything needed for image recognition, both the model and the Android side, is built from scratch.** The work is split into two parts: **CNN Network Creation (Python)** and **Android Implementation (Kotlin)**.
If you are an Android engineer without a Python environment, or if you are having trouble creating the model, go to [Create an image recognition application that discriminates the numbers written on the screen with android (PyTorch Mobile) [Android implementation]](https://qiita.com/YS-BETA/items/15a4a2c64360f91f8b3a) and download the trained model from its implementation section to proceed.
The Python code for this part is on GitHub: https://github.com/SY-BETA/CNN_PyTorch
This part covers steps 1 to 4 of the overall workflow, up to and including saving the model from Python.
The library used this time is PyTorch, and the execution environment is Jupyter Notebook.
Download the MNIST dataset, then create and train a simple CNN model.
Download MNIST, the handwritten digit dataset that everyone loves, using torchvision.
import torch
import torchvision
import torchvision.transforms as transforms
transform = transforms.Compose([
    transforms.ToTensor()])
train = torchvision.datasets.MNIST(
    root="data/train", train=True, transform=transform, target_transform=None, download=True)
test = torchvision.datasets.MNIST(
    root="data/test", train=False, transform=transform, target_transform=None, download=True)
Let's look at what the dataset contains.
from matplotlib import pyplot as plt
import numpy as np
print(train.data.size())
print(test.data.size())
img = train.data[0].numpy()
plt.imshow(img, cmap='gray')
print('Label:', train.targets[0])
Execution result
Change the number of MNIST color channels from 1 to 3.
**Why deliberately increase the amount of computation like this?** On Android, images are handled as bitmaps, and when PyTorch Mobile converts a bitmap to a tensor, **it can only produce a tensor with 3 channels**. (Maybe grayscale conversion will be added in the future, or maybe this is simply the specification.) So the model is trained on data converted to RGB.
**This is not limited to this project: a model used with PyTorch Mobile needs to accept 3 color channels.**
train_data_resized = train.data.numpy() #from torch tensor to numpy
test_data_resized = test.data.numpy()
train_data_resized = torch.FloatTensor(np.stack((train_data_resized,)*3, axis=1)) #Convert to RGB
test_data_resized = torch.FloatTensor(np.stack((test_data_resized,)*3, axis=1))
print(train_data_resized.size())
The size of the dataset has now changed from torch.Size([60000, 28, 28]) to torch.Size([60000, 3, 28, 28]).
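To make the channel conversion easier to follow, here is a minimal sketch of the same `np.stack` trick on a tiny array. The array `gray` is a hypothetical stand-in for `train.data`, not the real MNIST data.

```python
import numpy as np

# Hypothetical stand-in for train.data: 2 grayscale images of 4x4 pixels
gray = np.arange(2 * 4 * 4, dtype=np.uint8).reshape(2, 4, 4)

# (gray,) * 3 is a tuple holding the same array three times;
# stacking along axis=1 inserts a new channel axis right after the batch axis
rgb = np.stack((gray,) * 3, axis=1)

print(gray.shape)
print(rgb.shape)

# The three channels are identical copies of the grayscale image
same_channels = (
    np.array_equal(rgb[:, 0], rgb[:, 1]) and np.array_equal(rgb[:, 1], rgb[:, 2])
)
print(same_channels)
```

Because the three channels carry the same values, the conversion adds no information; it only matches the input shape that the Android side will provide.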
Because of the channel count, the MNIST dataset cannot be used as-is this time, so create a custom dataset by inheriting from PyTorch's Dataset.
A standardization class, which handles the image preprocessing, is also created here.
import torch.utils.data as data

mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)

# Image preprocessing
class ImgTransform():
    def __init__(self):
        self.transform = transforms.Compose([
            transforms.ToTensor(),            # Convert to a tensor
            transforms.Normalize(mean, std)   # Standardize
        ])

    def __call__(self, img):
        return self.transform(img)

# Inherit from the Dataset class
class _3ChannelMnistDataset(data.Dataset):
    def __init__(self, img_data, target, transform):
        # Reorder to [number of images, height, width, channels] and scale to [0, 1]
        self.data = img_data.numpy().transpose((0, 2, 3, 1)) / 255
        self.target = target
        self.img_transform = transform  # Instance of the image preprocessing class

    def __len__(self):
        # Return the number of images
        return len(self.data)

    def __getitem__(self, index):
        # Return the preprocessed (standardized) image and its label
        img_transformed = self.img_transform(self.data[index])
        return img_transformed, self.target[index]
Note that mean and std are the usual ImageNet values that are often used for standardization, for example with VGG16. They are also the values that are applied when a bitmap is converted to a tensor on Android.
If you don't know the values, you can check `TensorImageUtils` of PyTorch Mobile in Android Studio.
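As a rough illustration of what this standardization does to a single pixel, the sketch below applies `(value - mean) / std` per channel. The helper `normalize_pixel` is a hypothetical name introduced only for this example; the `mean` and `std` values are the ones quoted above.

```python
# Values quoted in the article
mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Standardize one RGB pixel whose channels are already scaled to [0, 1]."""
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))

black = normalize_pixel((0.0, 0.0, 0.0))  # MNIST background
white = normalize_pixel((1.0, 1.0, 1.0))  # MNIST stroke at full intensity

print(black)
print(white)
```

Since each channel has its own mean and std, the three channels end up with slightly different values even though the input image was gray.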
train_dataset = _3ChannelMnistDataset(train_data_resized, train.targets, transform=ImgTransform())
test_dataset = _3ChannelMnistDataset(test_data_resized, test.targets, transform=ImgTransform())
# Try out the dataset
index = 0
img, label = train_dataset[index]
print(img.size())
print(label)
print(img[1])  # You can see that the values are standardized properly
Create data loaders from the datasets just built. The batch size is 100.
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=100, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=100, shuffle=False)
Create a simple network with 1 convolution layer and 3 fully connected layers. (I did not want training to take a long time.)
from torch import nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(3)
        self.conv = nn.Conv2d(3, 10, kernel_size=4)
        self.fc1 = nn.Linear(640, 300)
        self.fc2 = nn.Linear(300, 100)
        self.fc3 = nn.Linear(100, 10)

    def forward(self, x):
        x = self.conv(x)
        x = self.relu(x)
        x = self.pool(x)
        x = x.view(x.size()[0], -1)  # Flatten to (batch size, features) for the linear layers
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        return x

model = Model()
print(model)
The network looks like this.
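The input size of 640 for `fc1` follows from the layer shapes. The sketch below works through that arithmetic, assuming the defaults of `nn.Conv2d` (stride 1, no padding) and `nn.MaxPool2d` (stride equal to the kernel size, result rounded down). The helper names `conv_out` and `pool_out` are hypothetical and exist only for this example.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel):
    """Output side length of max pooling with stride == kernel size (floored)."""
    return (size - kernel) // kernel + 1

side_after_conv = conv_out(28, kernel=4)               # 28x28 input, kernel_size=4
side_after_pool = pool_out(side_after_conv, kernel=3)  # MaxPool2d(3)
flat_features = 10 * side_after_pool * side_after_pool # 10 output channels

print(side_after_conv, side_after_pool, flat_features)
```

If the kernel size, the pooling size, or the input resolution changes, the first argument of `nn.Linear` in `fc1` has to be recomputed in the same way.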
import tqdm
from torch import optim

# Evaluation
def eval_net(net, data_loader, device="cpu"):  # Pass device="cuda" if a GPU is available
    # Switch to inference mode
    net.eval()
    ys = []      # Ground-truth labels
    ypreds = []  # Predicted labels
    for x, y in data_loader:
        # Transfer to the device with the to method
        x = x.to(device)
        y = y.to(device)
        # Forward pass: predict the class with the highest score
        with torch.no_grad():
            _, y_pred = net(x).max(1)
        ys.append(y)
        ypreds.append(y_pred)
    # Concatenate the per-mini-batch results into single tensors
    ys = torch.cat(ys)
    ypreds = torch.cat(ypreds)
    # Accuracy = number of correct predictions / number of samples
    acc = (ys == ypreds).float().sum() / len(ys)
    return acc.item()

# Training
def train_net(net, train_loader, test_loader, optimizer_cls=optim.Adam,
              loss_fn=nn.CrossEntropyLoss(), n_iter=3, device="cpu"):
    train_losses = []
    train_acc = []
    eval_acc = []
    optimizer = optimizer_cls(net.parameters())
    for epoch in range(n_iter):  # Run n_iter epochs (3 by default)
        running_loss = 0.0
        # Switch to training mode
        net.train()
        n = 0
        n_acc = 0
        for i, (xx, yy) in tqdm.tqdm(enumerate(train_loader),
                                     total=len(train_loader)):
            xx = xx.to(device)
            yy = yy.to(device)
            output = net(xx)
            loss = loss_fn(output, yy)
            optimizer.zero_grad()  # Reset the accumulated gradients
            loss.backward()        # Backpropagate from the loss (cross entropy)
            optimizer.step()
            running_loss += loss.item()
            n += len(xx)
            _, y_pred = output.max(1)
            n_acc += (yy == y_pred).float().sum().item()
        # Mean loss over the mini-batches of this epoch
        train_losses.append(running_loss / (i + 1))
        # Prediction accuracy on the training data
        train_acc.append(n_acc / n)
        # Prediction accuracy on the validation data
        eval_acc.append(eval_net(net, test_loader, device))
        # Show the results for this epoch
        print("epoch:", epoch, "train_loss:", train_losses[-1], "train_acc:", train_acc[-1],
              "eval_acc:", eval_acc[-1], flush=True)
First, try inference before any training.
eval_net(model, test_loader)
Because the random seed for the network parameters is not fixed, the result is not reproducible and changes from run to run, but in my environment the score before training was 0.0799999982.
Train the model using the function created earlier.
train_net(model, train_loader, test_loader)
In the end, the prediction accuracy was about 0.98000001907. The accuracy is so high that it makes me a little worried.
Feed a single sample into the trained model and try to predict its label.
data = train_dataset[0][0].reshape(1, 3, 28, 28)  # Add a batch dimension: the model expects [N, 3, 28, 28]
print("label", train_dataset[0][1].data)
model.eval()
output = model(data)
print(output.size())
output
Execution result: the score at index 5 is the highest, so the label is predicted correctly.
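Reading the prediction off the output amounts to taking the index of the largest score. The sketch below shows this with plain Python; the `logits` list is made-up example data, not the actual output of the trained model.

```python
import math

# Hypothetical scores for the ten digit classes (not the article's real output)
logits = [-3.1, -2.0, 0.4, 5.2, -6.0, 9.8, -1.3, -4.4, 1.1, 0.2]

# The predicted label is the index with the highest score
predicted = max(range(len(logits)), key=lambda i: logits[i])

# Optional: softmax turns the raw scores into probabilities that sum to 1
shift = max(logits)  # Subtract the max for numerical stability
exps = [math.exp(v - shift) for v in logits]
total = sum(exps)
probs = [e / total for e in exps]

print(predicted)
print(round(probs[predicted], 4))
```

With a real output tensor, `output.max(1)` as used in `eval_net` plays the same role as the `max(...)` call here.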
At last, model creation and training are complete!
Save the model so that it can be used on Android.
#Save model
model.eval()
#Sample input size
example = torch.rand(1, 3, 28, 28)
traced_script_module = torch.jit.trace(model, example)
traced_script_module.save("./CNNModel.pt")
print(model)
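Before moving to Android, it can be reassuring to check that a saved file loads back and gives the same output. The sketch below does this round trip with `torch.jit.load`. To stay self-contained it uses a tiny hypothetical model (`tiny`) and a temporary file instead of the trained CNN and `CNNModel.pt`.

```python
import os
import tempfile

import torch
from torch import nn

# Hypothetical stand-in for the trained CNN so the sketch runs on its own
tiny = nn.Sequential(nn.Flatten(), nn.Linear(3 * 28 * 28, 10))
tiny.eval()

# Trace with an example input of the same shape the article uses
example = torch.rand(1, 3, 28, 28)
traced = torch.jit.trace(tiny, example)

# Save to a temporary location (the article saves to ./CNNModel.pt)
path = os.path.join(tempfile.mkdtemp(), "TinyModel.pt")
traced.save(path)

# Load the TorchScript file back and compare outputs on the same input
reloaded = torch.jit.load(path)
reloaded.eval()
with torch.no_grad():
    outputs_match = torch.allclose(tiny(example), reloaded(example))

print(outputs_match)
```

The same check should work for the real model by replacing `tiny` with `model` and `path` with `"./CNNModel.pt"`.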
That is the end of [Network Creation] for now. Next, the created model will be implemented on Android.
When a bitmap is converted to a tensor with PyTorch Mobile, the result is an RGB tensor, and I could not make it grayscale, so I had to go to the trouble of converting MNIST to RGB, which added a lot of tedious processing. As a result, I could not use the MNIST dataset as-is and had to write my own dataset and data loader. That said, I suppose grayscale is hardly ever used at a commercial level anyway.
Also, although the network was put together rather casually, I was surprised that the accuracy was so high. As expected of a CNN.
The code for this article is on GitHub: https://github.com/SY-BETA/CNN_PyTorch
Trained model created this time (.pt): https://github.com/SY-BETA/CNN_PyTorch/blob/master/CNNModel.pt
Next, move on to the Android implementation: [Create an image recognition application that discriminates the numbers written on the screen with android (PyTorch Mobile) [Android implementation]](https://qiita.com/YS-BETA/items/15a4a2c64360f91f8b3a)