I wanted to work with a variety of data, so I tried several ways of building my own dataset and summarize them here. Custom datasets are indispensable for tasks such as denoising, colorization, and domain conversion.
This article covers two things. The first is how to use the various classes in torchvision.transforms and how to write your own transform class. The second is how to build your own dataset using them. References already exist for the second half (listed below), but I went through a fair amount of trial and error, so I am posting the results.
【Reference】
① Explanation of transforms, Datasets, Dataloader of pyTorch and creation and use of self-made Dataset
② I implemented reading Dataset with PyTorch
③ TORCHVISION.TRANSFORMS

・ Organize transforms
・ Apply to autoencoder
・ How to make your own dataset
　① In the case of data-label
　② In the case of data1-data2-label
The transform is defined in the constructor of the pytorch-lightning module as shown below, it is attached to the dataset in setup, and the processing is executed each time the DataLoader fetches a sample. In the code below, transforms.Normalize((0.1307,), (0.3081,)) is applied to the MNIST data. These two numbers are the mean and standard deviation of the MNIST training pixels after ToTensor() has scaled them to [0, 1].
class LitAutoEncoder(pl.LightningModule):
    def __init__(self, data_dir='./'):
        super().__init__()
        self.data_dir = data_dir
        # Hardcode some dataset specific attributes
        self.num_classes = 10
        self.classes = ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9')
        self.dims = (1, 28, 28)
        channels, width, height = self.dims
        self.transform = transforms.Compose([transforms.ToTensor(),
                                             transforms.Normalize((0.1307,), (0.3081,))])
        self.encoder = nn.Sequential(nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 32))
        self.decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 28 * 28))

    def forward(self, x):
        # in lightning, forward defines the prediction/inference actions
        embedding = self.encoder(x)
        return embedding
    ...
    def setup(self, stage=None):  # train, val, test data split
        # Assign train/val datasets for use in dataloaders
        mnist_full = MNIST(self.data_dir, train=True, transform=self.transform)
        n_train = int(len(mnist_full)*0.8)
        n_val = len(mnist_full)-n_train
        self.mnist_train, self.mnist_val = torch.utils.data.random_split(mnist_full, [n_train, n_val])
        self.mnist_test = MNIST(self.data_dir, train=False, transform=self.transform)

    def train_dataloader(self):
        self.trainloader = DataLoader(self.mnist_train, shuffle=True, drop_last=True, batch_size=32, num_workers=0)
        # get some random training images
        return self.trainloader
    ...
These transforms are summarized in Reference ③ above. I have not tried all of them, but for now I ran the ones in the table below that I am likely to use.
function | Remarks |
---|---|
rotate(x, angle) | Rotate based on angle |
to_grayscale(x) | Convert to grayscale |
vflip(x) | Flip up and down |
hflip(x) | Flip left and right |
Resize(imageSize) | Resize to the specified size |
Normalize(self.mean, self.std) | Normalize the image with the specified mean and standard deviation |
Compose() | Perform the series of transformations listed in () |
ToTensor() | Convert to torch Tensor |
ToPILImage() | Convert to PILImage |
Compose(transforms)
Transforms on PIL Image and torch.*Tensor;
CenterCrop(size)
ColorJitter(brightness=0, contrast=0, saturation=0, hue=0)
FiveCrop(size)
Grayscale(num_output_channels=1)
Pad(padding, fill=0, padding_mode='constant')
RandomAffine(degrees, translate=None, scale=None, shear=None, resample=0, fillcolor=0)
RandomApply(transforms, p=0.5)
RandomCrop(size, padding=None, pad_if_needed=False, fill=0, padding_mode='constant')
RandomGrayscale(p=0.1)
RandomHorizontalFlip(p=0.5)
RandomPerspective(distortion_scale=0.5, p=0.5, interpolation=2, fill=0)
RandomResizedCrop(size, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=2)
RandomRotation(degrees, resample=False, expand=False, center=None, fill=None)
RandomSizedCrop(*args, **kwargs)
RandomVerticalFlip(p=0.5)
Resize(size, interpolation=2)
TenCrop(size, vertical_flip=False)
GaussianBlur(kernel_size, sigma=(0.1, 2.0))
Transforms on PIL Image only;
RandomChoice(transforms)
RandomOrder(transforms)
Transforms on torch.*Tensor only;
LinearTransformation(transformation_matrix, mean_vector)
Normalize(mean, std, inplace=False) output[channel] = (input[channel] - mean[channel]) / std[channel]
RandomErasing(p=0.5, scale=(0.02, 0.33), ratio=(0.3, 3.3), value=0, inplace=False)
ConvertImageDtype(dtype: torch.dtype)
Conversion Transforms;
ToPILImage(mode=None)
ToTensor
Generic Transforms;
Lambda(lambd)
Functional Transforms;
Example: you can apply a functional transform with the same parameters to multiple images like this:...
Example: you can use a functional transform to build transform classes with custom behavior:...
adjust_brightness(img: torch.Tensor, brightness_factor: float) → torch.Tensor
adjust_contrast(img: torch.Tensor, contrast_factor: float) → torch.Tensor
adjust_gamma(img: torch.Tensor, gamma: float, gain: float = 1) → torch.Tensor
adjust_hue(img: torch.Tensor, hue_factor: float) → torch.Tensor
adjust_saturation(img: torch.Tensor, saturation_factor: float) → torch.Tensor
...Omitted below
The full code is posted as a bonus at the end of this article. For how to write a transform class, see Reference ④ below. The results of running the various transforms are shown in Reference ⑤. For how to add Gaussian noise, see Reference ⑥. The same code also appears in Reference ⑦. Reference ⑤ also explains that you can use your own function as a transform through transforms.Lambda(function name), as in the following snippet, but I do not use it this time.
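from PIL import Image, ImageFilter
from torchvision import transforms

img = Image.open("sample.jpg")

def blur(img):
    """Blur the image with PIL's built-in BLUR filter."""
    return img.filter(ImageFilter.BLUR)

# Wrap the plain function so it can be used like any other transform
transform = transforms.Lambda(blur)
img = transform(img)
img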
【Reference】
④ vision/docs/source/transforms.rst
⑤ Pytorch – Transform summary that can be used with torchvision
⑥ How to add noise to MNIST dataset when using pytorch
Reference ⑦ below can be run easily as a sample of augmentation.
⑦ Pytorch Image Augmentation using Transforms
The pytorch-lightning code is below. The image is not actually resized here (imageSize is the same 32×32 as the input), but a different size can be used by changing the network accordingly.
class LitAutoEncoder(pl.LightningModule):
    def __init__(self, data_dir='./'):
        super().__init__()
        self.data_dir = data_dir
        # Hardcode some dataset specific attributes
        self.num_classes = 10
        self.classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
        #self.classes = ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9')
        self.dims = (3, 32, 32)
        self.mean = [0.5,0.5,0.5] #[0.485, 0.456, 0.406] #[0.5,0.5,0.5]
        self.std = [0.25,0.25,0.25] #[0.229, 0.224, 0.225] #[0.5,0.5,0.5]
        self.imageSize = (32,32)
        # Parameters for RandomErasing
        self.p=0.5
        self.scale=(0.01, 0.05) #(0.02, 0.33)
        self.ratio=(0.3, 0.3) #(0.3, 3.3)
        self.value=0
        self.inplace=False
        #channels, width, height = self.dims
        self.transform = transforms.Compose([
            transforms.Resize(self.imageSize), #Image resizing
            transforms.ToTensor(),
            transforms.Normalize(self.mean, self.std),
            transforms.RandomErasing(p=self.p, scale=self.scale, ratio=self.ratio, value=self.value, inplace=self.inplace),
            MyAddGaussianNoise(0., 0.5)
        ])
        self.encoder = Encoder()
        self.decoder = Decoder()

    def forward(self, x):
        # in lightning, forward defines the prediction/inference actions
        embedding = self.encoder(x)
        return embedding
Result: both outputs are after 1 epoch, but the output image is better with noise.
No processing | After applying transforms with the above compose |
---|---|
ToTensor(), Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) | Resize(self.imageSize), ToTensor(), Normalize(self.mean, self.std), RandomErasing(...), MyAddGaussianNoise(0., 0.5) |
input | input |
output | output |
As above, if you want to download and use a publicly available dataset, you can simply place it in your own directory and apply the transforms when reading it, as in the following code. For your own data, however, you have to start from reading the files and images according to their format.
cifar10_full =CIFAR10(self.data_dir, train=True, transform=self.transform)
    def prepare_data(self):
        # download
        CIFAR10(self.data_dir, train=True, download=True)
        CIFAR10(self.data_dir, train=False, download=True)

    def setup(self, stage=None):  # train, val, test data split
        # Assign train/val datasets for use in dataloaders
        cifar10_full = CIFAR10(self.data_dir, train=True, transform=self.transform)
        n_train = int(len(cifar10_full)*0.8)
        n_val = len(cifar10_full)-n_train
        self.cifar10_train, self.cifar10_val = torch.utils.data.random_split(cifar10_full, [n_train, n_val])
        self.cifar10_test = CIFAR10(self.data_dir, train=False, transform=self.transform)

    def train_dataloader(self):
        self.trainloader = DataLoader(self.cifar10_train, shuffle=True, drop_last=True, batch_size=32, num_workers=0)
        # get some random training images
        return self.trainloader

    def val_dataloader(self):
        return DataLoader(self.cifar10_val, shuffle=False, batch_size=32, num_workers=0)

    def test_dataloader(self):
        self.testloader = DataLoader(self.cifar10_test, shuffle=False, batch_size=32, num_workers=0)
        return self.testloader
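The 80/20 split and the loader settings can be tried without downloading anything. The sketch below uses a TensorDataset of random tensors as a stand-in for CIFAR10, and the seeded generator is my own addition to make the split reproducible.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Stand-in for a downloaded dataset: 100 fake RGB 32x32 images with labels 0-9.
full = TensorDataset(torch.rand(100, 3, 32, 32), torch.randint(0, 10, (100,)))

n_train = int(len(full) * 0.8)
n_val = len(full) - n_train
# A seeded generator makes the split identical between runs.
train_set, val_set = random_split(full, [n_train, n_val],
                                  generator=torch.Generator().manual_seed(42))

train_loader = DataLoader(train_set, batch_size=32, shuffle=True, drop_last=True)
val_loader = DataLoader(val_set, batch_size=32, shuffle=False)

print(len(train_set), len(val_set))        # 80 20
print(len(train_loader), len(val_loader))  # 2 1
```

Note that drop_last=True discards the final incomplete batch, so the 80 training samples give 2 batches of 32 and the remaining 16 samples are skipped in that epoch.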
First of all, the basics described in Reference ② are important. When I previously trained on mediapipe data, I created and used the following dataset. It reads the data from a csv file, converts it to coordinates, and returns out_data together with its class label, out_label.
class HandsDataset(torch.utils.data.Dataset):
    def __init__(self, data_num, transform=None):
        self.transform = transform
        self.data_num = data_num
        self.data = []
        self.label = []
        df = pd.read_csv('./hands/sample_hands7.csv', sep=',')
        print(df.head(3)) #Data confirmation
        df = df.astype(int)
        x = []
        for j in range(self.data_num):
            x_ = []
            # Columns '0'..'41' hold 21 (x, y) coordinate pairs
            for i in range(0,21,1):
                x__ = [df['{}'.format(2*i)][j],df['{}'.format(2*i+1)][j]]
                x_.append(x__)
            x.append(x_)
        # Column '42' holds the class label
        y = df['42'][:self.data_num]
        #Specifying float() and long() below is the key point this time
        self.data = torch.from_numpy(np.array(x)).float()
        print(self.data)
        self.label = torch.from_numpy(np.array(y)).long()
        print(self.label)

    def __len__(self):
        return self.data_num

    def __getitem__(self, idx):
        out_data = self.data[idx]
        out_label = self.label[idx]
        if self.transform:
            out_data = self.transform(out_data)
        return out_data, out_label
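The reason float() and long() matter can be seen in a small sketch: layers such as nn.Linear hold float32 weights by default, and nn.CrossEntropyLoss expects class indices as int64. The `ArrayDataset` class, the random keypoints, and the tiny linear model below are placeholders I made up for illustration.

```python
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset

class ArrayDataset(Dataset):
    """Hypothetical data-label dataset built from in-memory NumPy arrays."""
    def __init__(self, points, labels):
        self.data = torch.from_numpy(points).float()   # inputs as float32
        self.label = torch.from_numpy(labels).long()   # class indices as int64

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.label[idx]

rng = np.random.default_rng(0)
points = rng.integers(0, 640, size=(8, 21, 2))  # 21 integer (x, y) points per sample
labels = rng.integers(0, 3, size=8)             # 3 made-up classes

dataset = ArrayDataset(points, labels)
x, y = dataset[0]
print(x.dtype, y.dtype)

# The dtypes chosen above are what the model and the loss expect.
model = nn.Linear(21 * 2, 3)
xb, yb = next(iter(DataLoader(dataset, batch_size=4)))
loss = nn.CrossEntropyLoss()(model(xb.flatten(1)), yb)
print(loss.item())
```

If the inputs were left as integers, the linear layer would raise a dtype error, which is the kind of failure the float()/long() conversion avoids.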
This time, I show the case of providing your own image files as a dataset. The result is as follows.
import glob
import os

import matplotlib.pyplot as plt
import numpy as np
import torch
from PIL import Image
from torchvision import transforms

class ImageDataset(torch.utils.data.Dataset):
    def __init__(self, data_num, transform=None):
        self.transform = transform
        self.data_num = data_num
        self.data = []
        self.label = []
        x = []
        y = []
        from_dir = './face/mayuyu/'
        sk = 0
        # All images must have the same size so that they stack into one array
        for path in glob.glob(os.path.join(from_dir, '*.jpg')):
            image = Image.open(path)
            x.append(np.array(image)/255.)  # scale pixel values to [0, 1]
            y.append(sk)                    # the running file index is used as the label
            sk += 1
        self.data = torch.from_numpy(np.array(x)).float()  # shape (N, H, W, C)
        self.label = torch.from_numpy(np.array(y)).long()

    def __len__(self):
        return self.data_num

    def __getitem__(self, idx):
        out_data = self.data[idx]
        out_label = self.label[idx]
        if self.transform:
            # Normalize expects (C, H, W), so move the channel axis to the front
            out_data = self.transform(out_data.permute(2, 0, 1))
        return out_data, out_label

mean, std = [0.5,0.5,0.5], [0.25,0.25,0.25]
model = ImageDataset(10, transform=transforms.Normalize(mean, std))
for i in range(10):
    image = model.data[i]  # raw (H, W, C) data, before the transform is applied
    print(model.label[i], image)
    plt.title('label_{}'.format(model.label[i]))
    plt.imshow(image)
    plt.pause(1)
plt.close()
dataset = ImageDataset(32,transform1 = trans1, transform2 = trans2)
testloader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)
import numpy as np
import torch
import torchvision
from torch.utils.data import DataLoader, random_split
from torchvision import transforms
import cv2
import matplotlib.pyplot as plt
from torchvision.datasets import CIFAR10
from PIL import Image

class ImageDataset(torch.utils.data.Dataset):
    def __init__(self, data_num, transform1=None, transform2=None, train=True):
        self.transform1 = transform1
        self.transform2 = transform2
        self.ts = torchvision.transforms.ToPILImage()
        self.ts2 = transforms.ToTensor()
        self.data_dir = './'
        self.data_num = data_num
        self.data = []
        self.label = []
        # download
        CIFAR10(self.data_dir, train=True, download=True)
        CIFAR10(self.data_dir, train=False, download=True)
        self.data = CIFAR10(self.data_dir, train=True, transform=self.ts2)

    def __len__(self):
        return self.data_num

    def __getitem__(self, idx):
        tensor_img, label = self.data[idx]
        out_data = self.ts(tensor_img)  # back to a PIL image so PIL-based transforms work
        out_label = np.array(label)
        # Fall back to the untransformed tensor when a transform is not given
        out_data1 = self.transform1(out_data) if self.transform1 else tensor_img
        out_data2 = self.transform2(out_data) if self.transform2 else tensor_img
        return out_data1, out_data2, out_label

trans1 = torchvision.transforms.ToTensor()
trans2 = torchvision.transforms.Compose([torchvision.transforms.Grayscale(), torchvision.transforms.ToTensor()])
dataset = ImageDataset(32, transform1=trans1, transform2=trans2)
testloader = DataLoader(dataset, batch_size=4,
                        shuffle=True, num_workers=0)
ts = torchvision.transforms.ToPILImage()
for out_data1, out_data2, out_label in testloader:
    print(len(out_label), out_label)
    for i in range(len(out_label)):
        image = out_data1[i]
        image_gray = out_data2[i]
        im = ts(image)
        im_gray = ts(image_gray)
        print(out_label[i])
        # Show the grayscale version, then the color version
        plt.imshow(np.array(im_gray), cmap='gray')
        plt.title('{}'.format(out_label[i]))
        plt.pause(1)
        plt.clf()
        plt.imshow(np.array(im))
        plt.title('{}'.format(out_label[i]))
        plt.pause(1)
        plt.clf()
plt.close()
>python dataset_cifar10_original.py
Files already downloaded and verified
Files already downloaded and verified
4 tensor([0, 3, 2, 6], dtype=torch.int32)
tensor(0, dtype=torch.int32)
tensor(3, dtype=torch.int32)
tensor(2, dtype=torch.int32)
tensor(6, dtype=torch.int32)
4 tensor([2, 2, 9, 5], dtype=torch.int32)
tensor(2, dtype=torch.int32)
tensor(2, dtype=torch.int32)
tensor(9, dtype=torch.int32)
tensor(5, dtype=torch.int32)
4 tensor([3, 6, 1, 7], dtype=torch.int32)
tensor(3, dtype=torch.int32)
tensor(6, dtype=torch.int32)
tensor(1, dtype=torch.int32)
tensor(7, dtype=torch.int32)
4 tensor([3, 9, 4, 9], dtype=torch.int32)
tensor(3, dtype=torch.int32)
tensor(9, dtype=torch.int32)
tensor(4, dtype=torch.int32)
tensor(9, dtype=torch.int32)
4 tensor([7, 8, 4, 4], dtype=torch.int32)
tensor(7, dtype=torch.int32)
tensor(8, dtype=torch.int32)
tensor(4, dtype=torch.int32)
tensor(4, dtype=torch.int32)
4 tensor([6, 7, 9, 0], dtype=torch.int32)
tensor(6, dtype=torch.int32)
tensor(7, dtype=torch.int32)
tensor(9, dtype=torch.int32)
tensor(0, dtype=torch.int32)
4 tensor([4, 1, 9, 2], dtype=torch.int32)
tensor(4, dtype=torch.int32)
tensor(1, dtype=torch.int32)
tensor(9, dtype=torch.int32)
tensor(2, dtype=torch.int32)
4 tensor([6, 9, 6, 3], dtype=torch.int32)
tensor(6, dtype=torch.int32)
tensor(9, dtype=torch.int32)
tensor(6, dtype=torch.int32)
tensor(3, dtype=torch.int32)
・ I want to use this to build new training setups and applications for denoising, colorization, image enlargement, image composition, and so on.
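As one possible starting point for the denoising case, the data1-data2-label pattern can return a noisy input together with its clean target. This is only a sketch under my own assumptions: the `NoisyPairDataset` name, the noise level, and the random tensors standing in for real images are all hypothetical.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class NoisyPairDataset(Dataset):
    """Hypothetical data1-data2-label dataset for denoising: (noisy, clean, label)."""
    def __init__(self, images, labels, std=0.1):
        self.images = images  # float tensor (N, C, H, W) in [0, 1]
        self.labels = labels  # long tensor (N,)
        self.std = std

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        clean = self.images[idx]
        # Fresh noise is drawn on every access, so each epoch sees a new noisy copy.
        noisy = clean + torch.randn_like(clean) * self.std
        return noisy, clean, self.labels[idx]

torch.manual_seed(0)
images = torch.rand(16, 3, 32, 32)    # stand-in for real images
labels = torch.randint(0, 10, (16,))

loader = DataLoader(NoisyPairDataset(images, labels, std=0.1), batch_size=4)
noisy, clean, label = next(iter(loader))
print(noisy.shape, clean.shape, label.shape)
```

An autoencoder would then take `noisy` as input and be trained to reproduce `clean`. For colorization, the same pattern applies with a grayscale input and a color target.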
import torchvision.transforms.functional as TF
import random
import matplotlib.pyplot as plt
import cv2
from PIL import Image
import numpy as np
import torch
import torchvision

class MyRotationTransform:
    """Rotate by one of the given angles."""
    def __init__(self, angles):
        self.angles = angles

    def __call__(self, x):
        angle = random.choice(self.angles)
        return TF.rotate(x, angle)

class MyGrayscaleTransform:
    """GrayScale by this class."""
    def __init__(self):
        pass

    def __call__(self, x):
        #return TF.rgb_to_grayscale(x)
        return TF.to_grayscale(x)

class MyVflipTransform:
    """Vertical flip by this class."""
    def __init__(self):
        pass

    def __call__(self, x):
        return TF.vflip(x)

class MyHflipTransform:
    """Horizontal flip by this class."""
    def __init__(self):
        pass

    def __call__(self, x):
        return TF.hflip(x)
from torchvision import transforms

class MyNormalizeTransform:
    """Resize, convert to a tensor, and normalize with the ImageNet mean/std."""
    def __init__(self):
        self.imageSize = (512,512)
        self.mean = [0.485, 0.456, 0.406]
        self.std = [0.229, 0.224, 0.225]
        # Build the pipeline once instead of on every call
        self.transform = transforms.Compose([
            transforms.Resize(self.imageSize), #Image resizing
            transforms.ToTensor(), #Tensorization
            transforms.Normalize(self.mean, self.std), #Standardization
        ])

    def __call__(self, x):
        return self.transform(x)

class MyErasingTransform:
    """Resize, convert to a tensor, and randomly erase a rectangle."""
    def __init__(self):
        self.imageSize = (512,512)
        self.p=0.5
        self.scale=(0.02, 0.33)
        self.ratio=(0.3, 3.3)
        self.value=0
        self.inplace=False
        # Build the pipeline once instead of on every call
        self.transform = transforms.Compose([
            transforms.Resize(self.imageSize), #Image resizing
            transforms.ToTensor(), #Tensorization
            transforms.RandomErasing(p=self.p, scale=self.scale, ratio=self.ratio, value=self.value, inplace=self.inplace)
        ])

    def __call__(self, x):
        return self.transform(x)
class MyAddGaussianNoise(object):
    """Add Gaussian noise with the given mean and std to a tensor."""
    def __init__(self, mean=0., std=0.1):
        self.std = std
        self.mean = mean

    def __call__(self, tensor):
        return tensor + torch.randn(tensor.size()) * self.std + self.mean

    def __repr__(self):
        return self.__class__.__name__ + '(mean={0}, std={1})'.format(self.mean, self.std)

trans2 = torchvision.transforms.Compose([torchvision.transforms.Grayscale(), torchvision.transforms.ToTensor()])
ts = torchvision.transforms.ToPILImage()
trans3 = MyGrayscaleTransform()
trans4 = MyHflipTransform()
trans5 = MyNormalizeTransform()
trans6 = MyErasingTransform()
trans7 = transforms.Compose([
    transforms.ToTensor(),
    #transforms.Normalize((0.1307,), (0.3081,)),
    MyAddGaussianNoise(0., 0.1)
])
angle_list = [i for i in range(-10,10,1)] #[-30, -15, 0, 15, 30]
rotation_transform = MyRotationTransform(angles=angle_list)
x = Image.open('./face/mayuyu/2.jpg')
while 1:  # runs until interrupted (Ctrl+C)
    y = rotation_transform(x)
    #z = trans5(x)
    z = trans7(y)
    plt.imshow(ts(z))
    plt.pause(0.1)
    #z = trans3(x)
    #plt.imshow(z, cmap='gray')
    #plt.pause(0.1)
    #plt.imshow(np.array(ts(trans2(y))), cmap='gray')
    #plt.pause(0.1)
    plt.clf()