The moon and the soft-shelled turtle are both round, but the difference between them is so great that they cannot be compared. It's a proverb for two things that look alike yet are utterly different. https://dictionary.goo.ne.jp/word/%E6%9C%88%E3%81%A8%E9%BC%88/
Since there seems to be such a big difference, let's see whether deep learning can tell them apart by image recognition!
I'll also explain a little PyTorch along the way. (If I've gotten something wrong, please point it out. Thank you.)
The code is here. https://github.com/kyasby/Tuki-Suppon.git
"Moon and soft-shelled turtle"
It seems to be similar and different.
pytorch's "torch vision.datasets.ImageFolder」
I made it because there weren't many articles that used torchvision.datasets.ImageFolderof
pytorch, which corresponds to keras from_from_directry
.
If you put an image in the folder, it will be labeled automatically. Convenient.
pytorch "torch.utils.data.random_split」
Thanks to this, there is no need to separate train and test when putting photos in a folder.
From Google Images:
・67 images of soft-shelled turtles. I collected images in which the shell is seen from above. For example, an image like this. (Pii-san's soft-shelled turtle) http://photozou.jp/photo/show/235691/190390795
・70 images of the moon. I collected images of round moons and cropped them by hand so that a large circle fills the frame. For example, an image like this.
.
├── main.ipynb
└── pics
    ├── tuki
    │   ├── tuki1.png
    │   └── tuki2.png
    └── kame
        ├── kame1.png
        └── kame2.png
Since the images are divided into directories, use torchvision.datasets.ImageFolder to label each directory automatically.
import matplotlib.pyplot as plt
import numpy as np
import copy
import time
import os
from tqdm import tqdm
import torchvision.transforms as transforms
import torchvision.models as models
import torchvision
import torch.nn as nn
import torch
transform_dict = {
'train': transforms.Compose(
[transforms.Resize((256,256)),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
]),
'test': transforms.Compose(
[transforms.Resize((256,256)),
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
])}
Create a preprocessing dictionary for train and test.
transforms.Compose builds a preprocessing pipeline; the transforms are applied in the order they are passed in.
This time:
transforms.Resize((256, 256))
→ Resizes the image to 256x256.
transforms.RandomHorizontalFlip()
→ Randomly flips the image horizontally.
transforms.ToTensor()
→ Converts a PIL image or numpy.ndarray ((height x width x channel), values 0 to 255) into a Tensor ((channel x height x width), values 0.0 to 1.0).
Note that while PIL and numpy order images as (height x width x channel), PyTorch uses (channel x height x width), which is apparently easier to process.
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
→ Normalizes each RGB channel with the given mean and standard deviation (the ImageNet statistics).
Documentation: https://pytorch.org/docs/stable/torchvision/transforms.html
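As a quick sanity check (my own sketch, not from the original article), you can apply the train transform to a dummy PIL image and confirm the output shape and channel ordering:

from PIL import Image
# My addition: verify the (channel x height x width) ordering.
dummy = Image.fromarray(np.uint8(np.random.rand(300, 400, 3) * 255))  # H x W x C
out = transform_dict["train"](dummy)
print(out.shape)  # => torch.Size([3, 256, 256])  (channel x height x width)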
# ex.
# data_folder = "./pics"
# phase = "train"
data = torchvision.datasets.ImageFolder(root=data_folder, transform=transform_dict[phase])
Create a dataset from the above directory.
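Incidentally (a quick check I added, not in the original article), ImageFolder assigns integer labels to the subdirectories in alphabetical order, which you can inspect like this:

# My addition: the label mapping ImageFolder inferred from the folder names.
print(data.classes)       # => ['kame', 'tuki']
print(data.class_to_idx)  # => {'kame': 0, 'tuki': 1}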
# ex.
# train_ratio = 0.8
train_size = int(train_ratio * len(data))
# int() truncates to an integer.
val_size = len(data) - train_size
data_size = {"train": train_size, "val": val_size}
# => {"train": 112, "val": 28}
data_train, data_val = torch.utils.data.random_split(data, [train_size, val_size])
torch.utils.data.random_split(dataset, lengths)
splits the dataset randomly and without overlap.
dataset is, of course, the dataset to split, and lengths is a list of the sizes you want for each split.
I also stored the train and val sizes in a dictionary.
# ex.
# data_train => Subset(data, [4, 5, 1, 7])
# data_val   => Subset(data, [3, 8, 2, 6])
There are as many return values as entries in lengths. Each return value is a Subset holding the dataset and a list of index numbers.
(What is a Subset? See the sketch below.)
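A Subset is just a thin wrapper pairing the parent dataset with a list of indices; indexing the Subset looks the index up in the parent. A minimal sketch (my addition, with made-up indices):

from torch.utils.data import Subset
# My addition: sub[i] returns data[indices[i]].
sub = Subset(data, [4, 5, 1, 7])
print(len(sub))      # => 4
img, label = sub[0]  # the same sample as data[4]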
train_loader = torch.utils.data.DataLoader(data_train, batch_size=batch_size, shuffle=True)
val_loader = torch.utils.data.DataLoader(data_val, batch_size=batch_size, shuffle=False)
dataloaders = {"train":train_loader, "val":val_loader}
Create the data loaders. In PyTorch, you load data through DataLoaders like this. I put these in a dictionary as well.
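To see what a DataLoader yields (my own sketch; batch_size is whatever you defined above, e.g. 8):

# My addition: each iteration yields a batch of images and their integer labels.
images, labels = next(iter(train_loader))
print(images.shape)  # => torch.Size([8, 3, 256, 256])  (batch x channel x height x width)
print(labels)        # => e.g. tensor([0, 1, 1, 0, 1, 0, 0, 1])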
def imshow(img):
    img = img / 2 + 0.5  # roughly undo the normalization for display
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))  # (C, H, W) -> (H, W, C) for matplotlib
    plt.show()

# Fetch one batch of training data at random
dataiter = iter(dataloaders["train"])
images, labels = next(dataiter)

# Display the images
imshow(torchvision.utils.make_grid(images))
# Display the labels
print(' '.join('%5s' % data.classes[labels[j]] for j in range(len(labels))))
The code above displays something like this. I adapted it from here: https://qiita.com/kuto/items/0ff3ccb4e089d213871d
model = models.resnet18(pretrained=True)
for param in model.parameters():
print(param)
# => Parameter containing:
#tensor([[[[-1.0419e-02, -6.1356e-03, -1.8098e-03, ..., 5.6615e-02,
# 1.7083e-02, -1.2694e-02],
# ...
# -7.1195e-02, -6.6788e-02]]]], requires_grad=True)
The model is ResNet18. Passing pretrained=True as an argument gives you the trained model.
For transfer learning, the existing parameters are left as they are, without further training.
Weights shown with requires_grad=True get updated during training; to prevent them from being updated, set requires_grad as shown below.
model
# => ResNet(
# (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
# (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
# (relu): ReLU(inplace=True)
# (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
# (layer1): Sequential(
# (0): BasicBlock(
# (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
# (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
# (relu): ReLU(inplace=True)
# (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
# (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
# )
# ...
# (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
# (fc): Linear(in_features=512, out_features=1000, bias=True)
# )
The printout shows that the final layer is (fc), so:

for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(512, 2)

This pulls out all the parameters with model.parameters(), sets requires_grad = False, and then overwrites the final layer. (The freshly created nn.Linear defaults to requires_grad = True, so only it will be trained.)
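To confirm the freeze worked (a quick check I added, not in the original):

# My addition: after freezing, only the new fc layer still requires gradients.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # => ['fc.weight', 'fc.bias']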
model = model.cuda()  # If you don't have a GPU, you don't need this line.
lr = 1e-4
epoch = 40
optim = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss().cuda()  # Without a GPU, the .cuda() is not needed.
If you want to use the GPU, you need to send the model to the GPU.
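As an aside (my own note, not in the original), a device-agnostic idiom avoids sprinkling .cuda() calls around:

# My addition: pick the device once and move the model and loss onto it.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
criterion = nn.CrossEntropyLoss().to(device)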
The model setup is almost straight from the tutorial. https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html
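Note that train_model below accepts a scheduler argument but never uses it. If you wanted the tutorial's learning-rate decay, you could pass something like the following (my own sketch, not used in this article) and also call scheduler.step() once per epoch inside the loop:

# My addition (unused here): the tutorial-style scheduler that decays
# the learning rate by 10x every 7 epochs.
from torch.optim import lr_scheduler
scheduler = lr_scheduler.StepLR(optim, step_size=7, gamma=0.1)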
def train_model(model, criterion, optimizer, scheduler=None, num_epochs=25):
    # Returns a bool: is a GPU available?
    use_gpu = torch.cuda.is_available()
    # Start time
    since = time.time()
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    # Dictionaries of lists for recording the progress.
    loss_dict = {"train": [], "val": []}
    acc_dict = {"train": [], "val": []}

    for epoch in tqdm(range(num_epochs)):
        if (epoch+1) % 5 == 0:  # Print the epoch once every five epochs.
            print('Epoch {}/{}'.format(epoch, num_epochs - 1))
            print('-' * 10)

        # Run train and val in each epoch.
        # The dictionaries pay off here: train and val can be handled in one loop.
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Training mode. Dropout etc. are active.
            else:
                model.eval()   # Inference mode. No dropout.

            running_loss = 0.0
            running_corrects = 0

            # The dataset created by ImageFolder yields (image, label) pairs.
            for inputs, labels in dataloaders[phase]:
                # Not required if you don't use a GPU
                if use_gpu:
                    inputs = inputs.cuda()
                    labels = labels.cuda()

                # ~~~~~~~~~~~~~~ forward ~~~~~~~~~~~~~~~
                outputs = model(inputs)
                _, preds = torch.max(outputs.data, 1)
                # torch.max returns (values, indices).
                # e.g. torch.max(tensor([[0.8, 0.1]]), 1) => (tensor([0.8]), tensor([0]))
                # The second argument is the dimension along which to take
                # the maximum (row direction or column direction).
                loss = criterion(outputs, labels)

                if phase == 'train':
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels)
                # (preds == labels) is something like [True, True, False], but
                # since Python's True and False correspond to 1 and 0,
                # they can be added up with sum.

            # Divide by the number of samples to get the averages.
            # Storing the sample counts in a dictionary pays off here.
            epoch_loss = running_loss / data_size[phase]
            # Without a GPU, item() is unnecessary
            epoch_acc = running_corrects.item() / data_size[phase]
            # tensor.item() retrieves the Python value from a one-element tensor.
            # print(tensorA)        => tensor(112, device='cuda:0')
            # print(tensorA.item()) => 112

            # Record the progress in the lists.
            loss_dict[phase].append(epoch_loss)
            acc_dict[phase].append(epoch_acc)

            # With format, {:.nf} prints n digits after the decimal point,
            # just like in C.
            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

            # deep copy the model
            # Save the weights whenever the validation accuracy improves.
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
                # Without deepcopy, the copied data would change too, because
                # model.state_dict() keeps changing as training continues.
                # The difference between copy and deepcopy is explained well here:
                # https://www.headboost.jp/python-copy-deepcopy/

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val acc: {:.4f}'.format(best_acc))

    # Load and return the best weights.
    model.load_state_dict(best_model_wts)
    return model, loss_dict, acc_dict
model_ft, loss, acc = train_model(model, criterion, optim, num_epochs=epoch)
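If you want to keep the fine-tuned weights for later (my own addition; the file name is arbitrary):

# My addition: save the best weights returned by train_model.
torch.save(model_ft.state_dict(), "tuki_suppon_resnet18.pth")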
# Extract loss and acc.
loss_train = loss["train"]
loss_val = loss["val"]
acc_train = acc["train"]
acc_val = acc["val"]
# Writing it like this creates a grid of rows x cols graphs.
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 5))
# 0th graph
axes[0].plot(range(epoch), loss_train, label="train")
axes[0].plot(range(epoch), loss_val, label="val")
axes[0].set_title("Loss")
axes[0].legend()  # Show the label of each plot
# 1st graph
axes[1].plot(range(epoch), acc_train, label="train")
axes[1].plot(range(epoch), acc_val, label="val")
axes[1].set_title("Accuracy")
axes[1].legend()
# Adjust so that the two graphs do not overlap
fig.tight_layout()
It looks like it starts to overfit around epoch 11 or 12.
Google Colaboratory is an easy way to use a GPU. https://colab.research.google.com/notebooks/welcome.ipynb?hl=ja
When using images in Colab, it is convenient to zip them and upload the archive. (Uploading them one by one is painful.) (Mounting Google Drive works too.) You can then unzip it as follows.
# Change /content/pics.zip to your own path.
!unzip /content/pics.zip -d /content/data > /dev/null 2>&1 &
Also, the "Copy path" item that appears when you right-click a file is convenient.
matplotlib: This time I drew a 1-row x 2-column figure, but for 2 rows x 2 columns, for example, you can create one as follows. You can also draw several plots on the same axes; here each graph overlays two series.
loss_train = loss["train"]
loss_val = loss["val"]
acc_train = acc["train"]
acc_val = acc["val"]
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10,5))
axes[0,0].plot(range(epoch), loss_train, label = "train")
axes[0,0].plot(range(epoch), loss_val, label = "val")
axes[0,0].set_title("Loss")
axes[0,0].legend()
axes[0,1].plot(range(epoch), acc_train, c="red", label = "train")
axes[0,1].plot(range(epoch), acc_val, c="pink", label = "val")
axes[0,1].set_title("Train Loss")
axes[0,1].legend()
x = np.random.rand(100)
xx = np.random.rand(200)
axes[1,0].hist(xx, bins=25, label="xx")
axes[1,0].hist(x, bins=50, label="x")
axes[1,0].set_title("histgram")
y = np.random.randn(100)
z = np.random.randn(100)
axes[1,1].scatter(y, z, alpha=0.8, label="y,z")
axes[1,1].scatter(z, y, alpha=0.8, label="z,y")
axes[1,1].set_title("Scatter")
axes[1,1].legend()
fig.tight_layout()