[Introduction to pytorch-lightning] How to use torchvision.transforms and how to freely create your own dataset ♬

Since I want to work with a variety of data, I experimented with several ways of building my own datasets, and this article summarizes the results. Custom datasets are an indispensable technique for tasks such as denoising, colorization, and domain conversion.

This time I cover two topics: how to use the various classes in torchvision.transforms (and how to write your own transform class), and how to build your own dataset with them. The second half draws on the references below, but since I went through a fair amount of trial and error, I post my own results as well.

【reference】
① Explanation of transforms, Datasets, and Dataloader in PyTorch, and creation and use of a self-made Dataset
② I implemented reading a Dataset with PyTorch
③ TORCHVISION.TRANSFORMS

What I did

・ Organize transforms
・ Apply to autoencoder
・ How to make your own dataset
  ① In the case of data-label
  ② In the case of data1-data2-label

・ Organize transforms

The transform is defined in the constructor of the pytorch-lightning module as shown below, attached to the data in setup, and actually executed when batches are fetched through the DataLoader. In the example below, transforms.Normalize((0.1307,), (0.3081,)) is applied to the MNIST data. First, let me note where these numbers come from: they are the mean and standard deviation of the MNIST training pixel values, and a small sketch that verifies them follows the code below.

class LitAutoEncoder(pl.LightningModule):

    def __init__(self, data_dir='./'):
        super().__init__()
        self.data_dir = data_dir
        
        # Hardcode some dataset specific attributes
        self.num_classes = 10
        self.classes = ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9')
        self.dims = (1, 28, 28)
        channels, width, height = self.dims
        self.transform=transforms.Compose([transforms.ToTensor(),
                              transforms.Normalize((0.1307,), (0.3081,))])
        
        self.encoder = nn.Sequential(nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 32))
        self.decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 28 * 28))

    def forward(self, x):
        # in lightning, forward defines the prediction/inference actions
        embedding = self.encoder(x)
        return embedding
...
    def setup(self, stage=None): #train, val,test data split
        # Assign train/val datasets for use in dataloaders
        mnist_full =MNIST(self.data_dir, train=True, transform=self.transform)
        n_train = int(len(mnist_full)*0.8)
        n_val = len(mnist_full)-n_train
        self.mnist_train, self.mnist_val = torch.utils.data.random_split(mnist_full, [n_train, n_val])
        self.mnist_test = MNIST(self.data_dir, train=False, transform=self.transform)

    def train_dataloader(self):
        self.trainloader = DataLoader(self.mnist_train, shuffle=True, drop_last = True, batch_size=32, num_workers=0)
        # get some random training images
        return self.trainloader
...
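As noted above, the constants in transforms.Normalize((0.1307,), (0.3081,)) are simply the mean and standard deviation of the MNIST training images. Here is a minimal sketch to verify them (assuming MNIST is downloaded to ./):

from torchvision.datasets import MNIST

mnist = MNIST('./', train=True, download=True)
images = mnist.data.float() / 255.                 # shape (60000, 28, 28), pixel values scaled to [0, 1]
print(images.mean().item(), images.std().item())   # prints roughly 0.1307 and 0.3081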

These transforms are summarized in Reference ③ above. I have not tried everything, but I did run the functions in the table below, which are the ones I am likely to use for the time being.

| Function | Remarks |
| --- | --- |
| rotate(x, angle) | Rotate by the given angle |
| to_grayscale(x) | Convert to grayscale |
| vflip(x) | Flip vertically (top-bottom) |
| hflip(x) | Flip horizontally (left-right) |
| Resize(imageSize) | Resize to the specified size |
| Normalize(self.mean, self.std) | Normalize the image with the specified mean and standard deviation |
| Compose() | Perform the series of transformations listed inside () |
| ToTensor() | Convert to a torch Tensor |
| ToPILImage() | Convert to a PIL Image |
TORCHVISION.TRANSFORMS classes, etc.
Compose(transforms)
CenterCrop(size)
ColorJitter(brightness=0, contrast=0, saturation=0, hue=0)
FiveCrop(size)
Grayscale(num_output_channels=1)
Pad(padding, fill=0, padding_mode='constant')
RandomAffine(degrees, translate=None, scale=None, shear=None, resample=0, fillcolor=0)
RandomApply(transforms, p=0.5)
RandomCrop(size, padding=None, pad_if_needed=False, fill=0, padding_mode='constant')
RandomGrayscale(p=0.1)
RandomHorizontalFlip(p=0.5)
RandomPerspective(distortion_scale=0.5, p=0.5, interpolation=2, fill=0)
RandomResizedCrop(size, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=2)
RandomRotation(degrees, resample=False, expand=False, center=None, fill=None)
RandomSizedCrop(*args, **kwargs)
RandomVerticalFlip(p=0.5)
Resize(size, interpolation=2)
TenCrop(size, vertical_flip=False)
GaussianBlur(kernel_size, sigma=(0.1, 2.0))

Transforms on PIL Image only;
RandomChoice(transforms)
RandomOrder(transforms)

Transforms on torch.*Tensor only;
LinearTransformation(transformation_matrix, mean_vector)
Normalize(mean, std, inplace=False) output[channel] = (input[channel] - mean[channel]) / std[channel]
RandomErasing(p=0.5, scale=(0.02, 0.33), ratio=(0.3, 3.3), value=0, inplace=False)
ConvertImageDtype(dtype: torch.dtype)

Conversion Transforms;
ToPILImage(mode=None)
ToTensor

Generic Transforms;
Lambda(lambd)

Functional Transforms;
Example: you can apply a functional transform with the same parameters to multiple images like this:...
Example: you can use a functional transform to build transform classes with custom behavior:...
adjust_brightness(img: torch.Tensor, brightness_factor: float) → torch.Tensor
adjust_contrast(img: torch.Tensor, contrast_factor: float) → torch.Tensor
adjust_gamma(img: torch.Tensor, gamma: float, gain: float = 1) → torch.Tensor
adjust_hue(img: torch.Tensor, hue_factor: float) → torch.Tensor
adjust_saturation(img: torch.Tensor, saturation_factor: float) → torch.Tensor
...Omitted below
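As a quick illustration of the entries above, here is a small sketch that applies a few functional transforms to a PIL image and then runs a short Compose pipeline built from the listed classes. The image path is just an example, reused from the bonus code at the end of this article.

from PIL import Image
import torchvision.transforms.functional as TF
from torchvision import transforms

img = Image.open('./face/mayuyu/2.jpg')   # example path, same image as in the bonus code

# functional transforms: explicit, parameterized calls on a PIL image
rotated = TF.rotate(img, 15)              # rotate by 15 degrees
flipped = TF.hflip(rotated)               # flip left-right
gray = TF.to_grayscale(flipped)           # convert to grayscale

# class-based transforms combined with Compose
# (RandomErasing works on a Tensor, so it is placed after ToTensor)
pipeline = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.5, scale=(0.02, 0.33), ratio=(0.3, 3.3)),
])
tensor = pipeline(img)                    # torch.Tensor of shape (3, 32, 32)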

The code is posted as a bonus at the end of this article. For how to write your own transform class, see Reference ④ below; execution results for the various transforms are shown in Reference ⑤. For how to add Gaussian noise, see Reference ⑥; the same code also appears in Reference ⑦. Reference ⑤ also describes how to wrap your own function with transforms.Lambda(function_name); I did not use it in the main code this time, but the example looks like this:

from PIL import Image, ImageFilter
from torchvision import transforms

img = Image.open("sample.jpg")

def blur(img):
    """Apply a simple blur filter."""
    return img.filter(ImageFilter.BLUR)

transform = transforms.Lambda(blur)
img = transform(img)
img

【reference】
④ vision/docs/source/transforms.rst
⑤ Pytorch – Transform summary that can be used with torchvision
⑥ How to add noise to MNIST dataset when using pytorch
Therefore, the augmentation samples in the following Reference ⑦ can be run easily.
⑦ Pytorch Image Augmentation using Transforms

・ Apply to autoencoder

The pytorch-lightning code is below. The image is not resized here, but resizing is possible if you change the network accordingly.

Application to autoencoder
class LitAutoEncoder(pl.LightningModule):

    def __init__(self, data_dir='./'):
        super().__init__()
        self.data_dir = data_dir
        
        # Hardcode some dataset specific attributes
        self.num_classes = 10
        self.classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
        #self.classes = ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9')
        self.dims = (3, 32, 32)
        self.mean = [0.5,0.5,0.5] #[0.485, 0.456, 0.406] #[0.5,0.5,0.5]
        self.std  = [0.25,0.25,0.25] #[0.229, 0.224, 0.225] #[0.5,0.5,0.5]
        self.imageSize = (32,32)
        self.p=0.5
        self.scale=(0.01, 0.05) #(0.02, 0.33)
        self.ratio=(0.3, 0.3) #(0.3, 3.3)
        self.value=0
        self.inplace=False
        #channels, width, height = self.dims
        self.transform = transforms.Compose([
            transforms.Resize(self.imageSize), #Image resizing
            transforms.ToTensor(),
            transforms.Normalize(self.mean, self.std),
            transforms.RandomErasing(p=self.p, scale=self.scale, ratio=self.ratio, value=self.value, inplace=self.inplace),
            MyAddGaussianNoise(0., 0.5)
        ])
        self.encoder = Encoder()
        self.decoder = Decoder()

    def forward(self, x):
        # in lightning, forward defines the prediction/inference actions
        embedding = self.encoder(x)
        return embedding
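Encoder() and Decoder() are not shown in this excerpt. As a rough sketch only (assumed here for illustration, not necessarily the networks actually used), a small convolutional pair matching the (3, 32, 32) CIFAR-10 input could look like this:

import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1),   # 32x32 -> 16x16
            nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),  # 16x16 -> 8x8
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, latent_dim),
        )

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    def __init__(self, latent_dim=64):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 32 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1),  # 8x8 -> 16x16
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1),   # 16x16 -> 32x32
            nn.Tanh(),
        )

    def forward(self, z):
        x = self.fc(z).view(-1, 32, 8, 8)
        return self.net(x)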

Results: both outputs below are from 1 epoch, but the output image is better with the noise transforms applied.

| | No processing | After applying transforms with the above Compose |
| --- | --- | --- |
| transforms | ToTensor(), Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) | Resize(self.imageSize), ToTensor(), Normalize(self.mean, self.std), RandomErasing(...), MyAddGaussianNoise(0., 0.5) |
| input | original_images_cifar10_32_1.png | original_images_cifar10_32_3.png |
| output | original_autoencode_preds_cifar10_32_1_original.png | original_autoencode_preds_cifar10_32_1.png |

・ How to make your own dataset

As shown above, if you want to use a publicly available dataset, you can simply download it into your own directory and apply a transform while reading it, e.g. cifar10_full = CIFAR10(self.data_dir, train=True, transform=self.transform). For your own data, however, you have to start from reading the files and images yourself, according to their format.

Normal dataset, Dataloader usage code
    def prepare_data(self):
        # download
        CIFAR10(self.data_dir, train=True, download=True)
        CIFAR10(self.data_dir, train=False, download=True)

    def setup(self, stage=None): #train, val,test data split
        # Assign train/val datasets for use in dataloaders
        cifar10_full =CIFAR10(self.data_dir, train=True, transform=self.transform)
        n_train = int(len(cifar10_full)*0.8)
        n_val = len(cifar10_full)-n_train
        self.cifar10_train, self.cifar10_val = torch.utils.data.random_split(cifar10_full, [n_train, n_val])
        self.cifar10_test = CIFAR10(self.data_dir, train=False, transform=self.transform)
    
    def train_dataloader(self):
        self.trainloader = DataLoader(self.cifar10_train, shuffle=True, drop_last = True, batch_size=32, num_workers=0)
        # get some random training images
        return self.trainloader
    
    def val_dataloader(self):
        return DataLoader(self.cifar10_val, shuffle=False, batch_size=32, num_workers=0)
    
    def test_dataloader(self):
        self.testloader = DataLoader(self.cifar10_test, shuffle=False, batch_size=32, num_workers=0)
        return self.testloader
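For reference, these hooks are invoked by the Lightning Trainer. A minimal sketch of training the LitAutoEncoder defined earlier (one epoch, default settings) might look like this:

import pytorch_lightning as pl

model = LitAutoEncoder()
trainer = pl.Trainer(max_epochs=1)   # prepare_data/setup/*_dataloader are called by the Trainer
trainer.fit(model)
trainer.test(model)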

① In the case of data-label

First of all, the basics described in Reference ② are important. For an earlier mediapipe experiment I created and used the following dataset: the data are read from a csv file, converted to coordinates, and provided as out_data together with their classification, out_label.

dataset code for previous mediapipe_hands data
class HandsDataset(torch.utils.data.Dataset):
    def __init__(self, data_num, transform=None):
        self.transform = transform
        self.data_num = data_num
        self.data = []
        self.label = []
        df = pd.read_csv('./hands/sample_hands7.csv', sep=',')
        print(df.head(3)) #Data confirmation
        df = df.astype(int)
        x = []
        for j in range(self.data_num):
            x_ = []
            for i in range(0,21,1):
                x__ = [df['{}'.format(2*i)][j],df['{}'.format(2*i+1)][j]]
                x_.append(x__)
            x.append(x_)
        y = df['42'][:self.data_num]

        #Specifying float() for the data and long() for the labels below is the key point here
        self.data = torch.from_numpy(np.array(x)).float()
        print(self.data)
        self.label = torch.from_numpy(np.array(y)).long()
        print(self.label)

    def __len__(self):
        return self.data_num

    def __getitem__(self, idx):
        out_data = self.data[idx]
        out_label =  self.label[idx]
        if self.transform:
            out_data = self.transform(out_data)
        return out_data, out_label
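A minimal usage sketch (assuming the csv file above exists and contains at least 100 rows):

from torch.utils.data import DataLoader

dataset = HandsDataset(100)                       # read the first 100 samples
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)
for out_data, out_label in loader:
    print(out_data.shape, out_label)              # torch.Size([4, 21, 2]) and 4 labels
    break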

This time, I show the case of serving my own image data as a custom dataset. The result is as follows.

dataset code for your own image data
class ImageDataset(torch.utils.data.Dataset):

    def __init__(self, data_num, transform=None):
        self.transform = transform
        self.data_num = data_num
        self.data = []
        self.label = []
        x = []
        y = []
        from_dir = './face/mayuyu/'
        sk = 0
        for path in glob.glob(os.path.join(from_dir, '*.jpg')):    
            image = Image.open(path)
            x.append(np.array(image)/255.)
            y.append(sk)
            sk += 1
        
        self.data = torch.from_numpy(np.array(x)).float()
        self.label = torch.from_numpy(np.array(y)).long()

    def __len__(self):
        return self.data_num

    def __getitem__(self, idx):
        out_data = self.data[idx]
        out_label =  self.label[idx]
        if self.transform:
            out_data = self.transform(out_data)
        return out_data, out_label

mean, std = [0.5,0.5,0.5], [0.25,0.25,0.25]
model = ImageDataset(10, transform = transforms.Normalize(mean, std))
for i in range(10):
    image =  model.data[i]
    print(model.label[i], image)
    plt.title('label_{}'.format(model.label[i]))
    plt.imshow(image)
    plt.pause(1)
    plt.close()
② In the case of data1-data2-label

This code downloads the CIFAR-10 dataset, converts the images to grayscale, and serves them together with the original color images in a single dataset. Since there are of course situations where the original label is also needed, it is returned at the same time. The basic idea of a dataset that outputs the gray image as out_data and the color image as out_label is shown in Reference ① above; here the original label is output as well, and I tried to make the code as easy to follow as possible. You specify the number of samples when constructing the dataset, and the DataLoader then takes them out batch by batch after applying trans1 and trans2. In other words, as the execution result shows, the code below generates 32 samples and takes them out 4 at a time:
dataset = ImageDataset(32,transform1 = trans1, transform2 = trans2)
testloader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)
Dataset code providing CIFAR-10 processed data, unprocessed data, and labels
import numpy as np
import torch
import torchvision
from torch.utils.data import DataLoader, random_split
from torchvision import transforms
import cv2
import matplotlib.pyplot as plt
from torchvision.datasets import CIFAR10
from PIL import Image

class ImageDataset(torch.utils.data.Dataset):

    def __init__(self, data_num, transform1 = None, transform2 = None,train = True):

        self.transform1 = transform1
        self.transform2 = transform2
        self.ts = torchvision.transforms.ToPILImage()
        self.ts2 = transforms.ToTensor()
        
        self.data_dir = './'
        self.data_num = data_num
        self.data = []
        self.label = []

        # download
        CIFAR10(self.data_dir, train=True, download=True)
        CIFAR10(self.data_dir, train=False, download=True)
        self.data =CIFAR10(self.data_dir, train=True, transform=self.ts2)

    def __len__(self):
        return self.data_num

    def __getitem__(self, idx):
        out_data = self.ts(self.data[idx][0])
        out_label =  np.array(self.data[idx][1])
        if self.transform1:
            out_data1 = self.transform1(out_data)
        if self.transform2:
            out_data2 = self.transform2(out_data)
        return out_data1, out_data2, out_label

trans1 = torchvision.transforms.ToTensor()
trans2 = torchvision.transforms.Compose([torchvision.transforms.Grayscale(), torchvision.transforms.ToTensor()])

dataset = ImageDataset(32,transform1 = trans1, transform2 = trans2)
testloader = DataLoader(dataset, batch_size=4,
                            shuffle=True, num_workers=0)

ts = torchvision.transforms.ToPILImage()

for out_data1, out_data2, out_label in testloader:
    print(len(out_label),out_label)
    for i in range(len(out_label)):
        image =  out_data1[i]
        image_gray = out_data2[i]
        im = ts(image)
        im_gray = ts(image_gray)
        #print(out_label[i])
        plt.imshow(np.array(im_gray),  cmap='gray')
        plt.title('{}'.format(out_label[i]))
        plt.pause(1)
        plt.clf()
        plt.imshow(np.array(im))
        plt.title('{}'.format(out_label[i]))
        plt.pause(1)
        plt.clf()
plt.close() 
The execution result is as follows:
>python dataset_cifar10_original.py
Files already downloaded and verified
Files already downloaded and verified
4 tensor([0, 3, 2, 6], dtype=torch.int32)
tensor(0, dtype=torch.int32)
tensor(3, dtype=torch.int32)
tensor(2, dtype=torch.int32)
tensor(6, dtype=torch.int32)
4 tensor([2, 2, 9, 5], dtype=torch.int32)
tensor(2, dtype=torch.int32)
tensor(2, dtype=torch.int32)
tensor(9, dtype=torch.int32)
tensor(5, dtype=torch.int32)
4 tensor([3, 6, 1, 7], dtype=torch.int32)
tensor(3, dtype=torch.int32)
tensor(6, dtype=torch.int32)
tensor(1, dtype=torch.int32)
tensor(7, dtype=torch.int32)
4 tensor([3, 9, 4, 9], dtype=torch.int32)
tensor(3, dtype=torch.int32)
tensor(9, dtype=torch.int32)
tensor(4, dtype=torch.int32)
tensor(9, dtype=torch.int32)
4 tensor([7, 8, 4, 4], dtype=torch.int32)
tensor(7, dtype=torch.int32)
tensor(8, dtype=torch.int32)
tensor(4, dtype=torch.int32)
tensor(4, dtype=torch.int32)
4 tensor([6, 7, 9, 0], dtype=torch.int32)
tensor(6, dtype=torch.int32)
tensor(7, dtype=torch.int32)
tensor(9, dtype=torch.int32)
tensor(0, dtype=torch.int32)
4 tensor([4, 1, 9, 2], dtype=torch.int32)
tensor(4, dtype=torch.int32)
tensor(1, dtype=torch.int32)
tensor(9, dtype=torch.int32)
tensor(2, dtype=torch.int32)
4 tensor([6, 9, 6, 3], dtype=torch.int32)
tensor(6, dtype=torch.int32)
tensor(9, dtype=torch.int32)
tensor(6, dtype=torch.int32)
tensor(3, dtype=torch.int32)
Summary

・ I played with transforms.
・ I made my own dataset and played with it.
・ I can now create my own dataset using my own data.
・ I learned how to use a Dataset and its DataLoader that can perform various kinds of processing and simultaneously provide the various datasets obtained as a result.

・ I want to use this to build new training and inference apps for denoising, colorization, image enlargement, image composition, and so on.

Bonus

import torchvision.transforms.functional as TF
import random
import matplotlib.pyplot as plt
import cv2
from PIL import Image
import numpy as np
import torch
import torchvision

class MyRotationTransform:
    """Rotate by one of the given angles."""

    def __init__(self, angles):
        self.angles = angles

    def __call__(self, x):
        angle = random.choice(self.angles)
        return TF.rotate(x, angle)
    
class MyGrayscaleTransform:
    """GrayScale by this class."""

    def __init__(self):
        pass

    def __call__(self, x):
        #return TF.rgb_to_grayscale(x)
        return TF.to_grayscale(x)
    
class MyVflipTransform:
    """Vertical flip by this class."""

    def __init__(self):
        pass

    def __call__(self, x):
        return TF.vflip(x)    

class MyHflipTransform:
    """Vertical flip by this class."""

    def __init__(self):
        pass

    def __call__(self, x):
        return TF.hflip(x)   

from torchvision import transforms    
class MyNormalizeTransform:
    """normalization by the image."""

    def __init__(self):
        self.imageSize = (512,512)
        self.mean = [0.485, 0.456, 0.406]
        self.std  = [0.229, 0.224, 0.225]
        
    def __call__(self, x):
        self.transform = transforms.Compose([
            transforms.Resize(self.imageSize), #Image resizing
            transforms.ToTensor(), #Tensorization
            transforms.Normalize(self.mean, self.std), #Standardization
        ])
        return self.transform(x)
    
class MyErasingTransform:
    """normalization by the image."""

    def __init__(self):
        self.imageSize = (512,512)
        self.p=0.5
        self.scale=(0.02, 0.33)
        self.ratio=(0.3, 3.3)
        self.value=0
        self.inplace=False
        
    def __call__(self, x):
        self.transform = transforms.Compose([
            transforms.Resize(self.imageSize), #Image resizing
            transforms.ToTensor(), #Tensorization
            transforms.RandomErasing(p=self.p, scale=self.scale, ratio=self.ratio, value=self.value, inplace=self.inplace)
        ])
        return self.transform(x)     

class MyAddGaussianNoise(object):
    def __init__(self, mean=0., std=0.1):
        self.std = std
        self.mean = mean
        
    def __call__(self, tensor):
        return tensor + torch.randn(tensor.size()) * self.std + self.mean
    
    def __repr__(self):
        return self.__class__.__name__ + '(mean={0}, std={1})'.format(self.mean, self.std)  
    
trans2 = torchvision.transforms.Compose([torchvision.transforms.Grayscale(), torchvision.transforms.ToTensor()])
ts = torchvision.transforms.ToPILImage()

trans3 = MyGrayscaleTransform()
trans4 = MyHflipTransform()
trans5 = MyNormalizeTransform()
trans6 = MyErasingTransform()
trans7 = transforms.Compose([
        transforms.ToTensor(),
        #transforms.Normalize((0.1307,), (0.3081,)),
        MyAddGaussianNoise(0., 0.1)
        ])

angle_list =[i for i in range(-10,10,1)] #[-30, -15, 0, 15, 30]
rotation_transform = MyRotationTransform(angles=angle_list)

x = Image.open('./face/mayuyu/2.jpg')
while 1:
    y = rotation_transform(x)
    #z = trans5(x)
    z = trans7(y)
    plt.imshow(ts(z))
    plt.pause(0.1)
    #z = trans3(x)
    #plt.imshow(z,  cmap='gray')
    #plt.pause(0.1)
    #plt.imshow(np.array(ts(trans2(y))),  cmap='gray')
    #plt.pause(0.1)
    plt.clf()
