I tried generating kuzushiji (cursive Japanese characters) from the KMNIST dataset using a cGAN (conditional GAN), which is a type of GAN. For the theoretical details, please refer to the reference links given along the way.
--I want to implement a GAN with PyTorch
--I want to create a model that can generate the desired image
I hope this is helpful for anyone with those goals.
First, about the dataset used this time. KMNIST is a dataset created for machine learning as a derivative of the "Japanese Classics Kuzushiji Dataset" created by the Center for Open Data in the Humanities (CODH). It can be downloaded from the GitHub repository.
"KMNIST Dataset" (created by CODH), adapted from "Japanese Classics Kuzushiji Dataset" (Kokubunken et al.), doi:10.20676/00000341
As with MNIST (handwritten digits), which anyone who has done machine learning knows, each image is 1 x 28 px x 28 px in size.
The following three datasets can be downloaded from the repository in compressed NumPy array format.
--kuzushiji-MNIST (10 hiragana characters)
--kuzushiji-49 (49 hiragana characters)
--kuzushiji-kanji (3832 kanji characters)
Of these, "kuzushiji-49" is used this time. There is no deep reason. The light motivation was simply: if I can generate 49 hiragana characters, can I generate handwritten text?
Let's briefly touch on GAN before cGAN. GAN is an abbreviation of "Generative Adversarial Network", which is a kind of deep learning generative model. It is especially effective in the field of image generation, and I think that the result of generating facial images of people who do not exist in the world is famous.
The following is a rough model diagram of GAN. "G" stands for Generator and "D" stands for Discriminator.
Generator will generate a fake image that is as close to real as possible from the noise. Discriminator discriminates between the real image (real_img) taken from the dataset and the fake image (fake_img) created by the Generator (True or False).
By repeating this training, the Generator tries to create images so close to the real thing that the Discriminator cannot detect them, while the Discriminator tries to tell the Generator's fakes apart from the real images taken from the dataset. As a result, the accuracy of both improves.
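For reference, this two-player game is usually written as the minimax objective from the original GAN paper. The notation here is the standard one and is my addition, not something defined elsewhere in this article: $p_{\mathrm{data}}$ is the data distribution, $p_z$ is the noise distribution, $G$ is the Generator and $D$ is the Discriminator.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

The Discriminator pushes $D(x)$ toward 1 and $D(G(z))$ toward 0, while the Generator pushes $D(G(z))$ toward 1.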
Reference article
GAN (1) Understanding the basic structure that I can't hear anymore
GAN-related papers are organized in This GitHub Repository.
Next, I would like to talk about the conditional GAN used this time. Simply put, it is a **"GAN that can generate the desired image"**. The idea is simple: label information is added to the inputs of both the Discriminator and the Generator, which decides what image is generated.
The original paper is here
It is the same as a normal GAN except that **"the label is also given as input during training"**. Note that although label information is used, the Discriminator only judges "whether the image is genuine".
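In formula form, the cGAN paper conditions both networks on the label $y$. The symbols below follow the usual convention ($x$ for a real image, $z$ for noise) and are my notation, not names from the code in this article.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x \mid y)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z \mid y))\right)\right]
```

The only change from the ordinary GAN objective is that $D$ and $G$ both receive $y$ as an extra input.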
Reference article
Implementation of GAN (6) Conditional GAN that I can't hear anymore
Now let's get into the implementation.
I have jupyterlab installed and running on Ubuntu 18.04.
Import required module
python
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import numpy as np
import matplotlib.pyplot as plt
import time
import random
Download the NumPy-format data linked from the KMNIST GitHub repository.
In JupyterLab, open a Terminal, move to the working directory, and run
wget http://codh.rois.ac.jp/kmnist/dataset/k49/k49-train-imgs.npz
wget http://codh.rois.ac.jp/kmnist/dataset/k49/k49-train-labels.npz
to download the images and labels.
By the way, the 10-character hiragana KMNIST is included in torchvision by default. If that is enough for you, you can load it just like ordinary MNIST:
python
transform = transforms.Compose([
    transforms.ToTensor(),
])
train_data_10 = torchvision.datasets.KMNIST(root='./data', train=True, download=True, transform=transform)
If you want to create your own customized dataset in PyTorch, you need to define the preprocessing yourself. Most image preprocessing is available in torchvision.transforms, so I often use that, but you can also write your own.
python
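class Transform(object):
    def __init__(self):
        pass

    def __call__(self, sample):
        # Cast to float32 (PyTorch's default float type) and convert to a tensor
        sample = np.array(sample, dtype=np.float32)
        sample = torch.tensor(sample)
        # Scale brightness values from [0, 255] to [-1, 1]
        return (sample / 127.5) - 1

transform = Transform()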
Most floating-point numbers handled by NumPy are np.float64 (64-bit), but PyTorch uses 32-bit floats by default, so an error occurs if the types do not match.
The brightness values of the image are also normalized to the range [-1, 1] here. This is because Tanh is used in the final layer of the Generator, which appears later, so the brightness of the real images is scaled to match its output range.
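As a small check of that scaling, here is a minimal sketch using a made-up 2 x 2 array of 8-bit values (the array is hypothetical, not taken from KMNIST). It applies the same arithmetic as the Transform class and then the inverse mapping back to [0, 1], which is useful when saving generated images.

```python
import numpy as np

# Hypothetical 2x2 "image" with 8-bit brightness values.
pixels = np.array([[0, 255], [127, 128]], dtype=np.uint8)

# Same arithmetic as Transform.__call__: cast to float32, then scale to [-1, 1].
scaled = pixels.astype(np.float32) / 127.5 - 1.0

# Inverse mapping back to [0, 1].
restored = scaled / 2.0 + 0.5

print(scaled.dtype, scaled.min(), scaled.max())
```

A brightness of 0 maps to the lower end of the Tanh range and 255 maps to the upper end.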
Next, we will define the Dataset class.
It returns pairs of data and labels, and applies the transform defined earlier to the data when an item is retrieved.
python
from tqdm import tqdm

class dataset_full(torch.utils.data.Dataset):
    def __init__(self, img, label, transform=None):
        self.transform = transform
        self.data_num = len(img)
        self.data = []
        self.label = []
        for i in tqdm(range(self.data_num)):
            self.data.append([img[i]])  # wrap in a list to add the channel dimension: (1, 28, 28)
            self.label.append(label[i])
        self.data_num = len(self.data)

    def __len__(self):
        return self.data_num

    def __getitem__(self, idx):
        out_data = self.data[idx]
        # One-hot vector of length 49 for the class label
        out_label = np.identity(49)[self.label[idx]]
        out_label = np.array(out_label, dtype=np.float32)
        if self.transform:
            out_data = self.transform(out_data)
        return out_data, out_label
Wrapping the loop in tqdm shows a progress bar while the for statement runs, but it has nothing to do with the cGAN itself.
np.identity is used to create a one-hot vector of length 49.
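To make the np.identity trick concrete, here is a minimal sketch. The class indices used (3, and the batch 0, 3, 48) are arbitrary values picked for illustration.

```python
import numpy as np

num_class = 49
label = 3  # arbitrary class index for illustration

# Row `label` of the identity matrix is the one-hot vector for that class.
one_hot = np.identity(num_class)[label].astype(np.float32)

# Indexing with an array of labels converts a whole batch at once.
batch = np.identity(num_class, dtype=np.float32)[np.array([0, 3, 48])]

print(one_hot.shape, batch.shape)
```

The second form is handy if you ever want to one-hot encode all labels up front instead of inside `__getitem__`.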
Create a Dataset from the data downloaded earlier, using the Transform and Dataset classes implemented above.
python
path = %pwd
train_img = np.load('{}/k49-train-imgs.npz'.format(path))
train_img = train_img['arr_0']
train_label = np.load('{}/k49-train-labels.npz'.format(path))
train_label = train_label['arr_0']
train_data = dataset_full(train_img, train_label, transform=transform)
If you added tqdm earlier, a progress bar is displayed when you run this. The dataset is fairly large at 232,365 images, but I don't think it will take long.
The Dataset is ready, but during training the model does not read data directly from it. Since training proceeds batch by batch, we define a DataLoader that returns the data one batch at a time.
python
batch_size = 256
train_loader = torch.utils.data.DataLoader(train_data, batch_size = batch_size, shuffle = True, num_workers=2)
If you set shuffle = True, the data is fetched from the DataLoader in random order. num_workers specifies the number of worker subprocesses the DataLoader uses to load data, and is not particularly relevant to the cGAN itself.
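One detail worth seeing in action: when the dataset size is not a multiple of the batch size, the last batch is smaller. The sketch below uses random stand-in tensors with the same shapes as the KMNIST-49 samples (600 items is an arbitrary number, and TensorDataset is used only to keep the example short). It also shows `drop_last=True`, a standard DataLoader option that discards the incomplete batch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data with the same shapes as the real samples (random, not real images).
images = torch.rand(600, 1, 28, 28) * 2 - 1
labels = torch.eye(49)[torch.randint(0, 49, (600,))]
dataset = TensorDataset(images, labels)

# 600 items with batch_size=256 gives two full batches and one partial batch.
loader = DataLoader(dataset, batch_size=256, shuffle=True)
batch_sizes = [x.shape[0] for x, y in loader]

# drop_last=True discards the final incomplete batch.
full_only = DataLoader(dataset, batch_size=256, shuffle=True, drop_last=True)

print(batch_sizes, len(full_only))
```

The training function later in this article handles the partial batch by stopping the loop when it sees one, which has a similar effect.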
The Transform-Dataset-DataLoader so far are summarized in the following articles.
Reference article
Check the basic operation of PyTorch transforms / Dataset / DataLoader
Now for the models themselves. The Generator creates a fake image (fake_img) from noise and labels.
Implementations vary quite a bit from person to person, but the structure of the Generator created this time is as follows.
(Sorry that the diagram is handwritten ...)
For the input, z_dim (the noise dimension) is 30, and num_class (the number of classes) is 49 because there are 49 hiragana characters. The output fake image has the shape 1 (channel) x 28 (px) x 28 (px).
python
class Generator(nn.Module):
    def __init__(self, z_dim, num_class):
        super(Generator, self).__init__()
        # Noise branch: z_dim -> 300
        self.fc1 = nn.Linear(z_dim, 300)
        self.bn1 = nn.BatchNorm1d(300)
        self.LReLU1 = nn.LeakyReLU(0.2)
        # Label branch: num_class -> 1500
        self.fc2 = nn.Linear(num_class, 1500)
        self.bn2 = nn.BatchNorm1d(1500)
        self.LReLU2 = nn.LeakyReLU(0.2)
        # Joint branch: 300 + 1500 = 1800 -> 128 x 7 x 7
        self.fc3 = nn.Linear(1800, 128 * 7 * 7)
        self.bn3 = nn.BatchNorm1d(128 * 7 * 7)  # defined but not applied in forward()
        self.bo1 = nn.Dropout(p=0.5)
        self.LReLU3 = nn.LeakyReLU(0.2)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),  # 128 -> 64 channels, 7x7 -> 14x14
            nn.BatchNorm2d(64),
            nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(64, 1, kernel_size=4, stride=2, padding=1),  # 64 -> 1 channel, 14x14 -> 28x28
            nn.Tanh(),
        )
        self.init_weights()

    def init_weights(self):
        for module in self.modules():
            if isinstance(module, nn.ConvTranspose2d):
                module.weight.data.normal_(0, 0.02)
                module.bias.data.zero_()
            elif isinstance(module, nn.Linear):
                module.weight.data.normal_(0, 0.02)
                module.bias.data.zero_()
            elif isinstance(module, nn.BatchNorm1d):
                module.weight.data.normal_(1.0, 0.02)
                module.bias.data.zero_()
            elif isinstance(module, nn.BatchNorm2d):
                module.weight.data.normal_(1.0, 0.02)
                module.bias.data.zero_()

    def forward(self, noise, labels):
        # Noise branch
        y_1 = self.fc1(noise)
        y_1 = self.bn1(y_1)
        y_1 = self.LReLU1(y_1)
        # Label branch
        y_2 = self.fc2(labels)
        y_2 = self.bn2(y_2)
        y_2 = self.LReLU2(y_2)
        # Concatenate both branches along the feature dimension
        x = torch.cat([y_1, y_2], 1)
        x = self.fc3(x)
        x = self.bo1(x)
        x = self.LReLU3(x)
        x = x.view(-1, 128, 7, 7)
        x = self.deconv(x)
        return x
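To see why the Generator ends up with a 28 x 28 image, the spatial size can be traced through the two ConvTranspose2d layers. The helper below is a hypothetical function of my own that applies the transposed-convolution output size formula, assuming dilation=1 and output_padding=0 (the defaults, which the Generator above does not override).

```python
def deconv_out(size, kernel_size, stride, padding):
    # Output size of ConvTranspose2d with dilation=1 and output_padding=0.
    return (size - 1) * stride - 2 * padding + kernel_size

# The Generator reshapes the fc3 output to 128 x 7 x 7, then applies two
# transposed convolutions with kernel_size=4, stride=2, padding=1.
sizes = [7]
for _ in range(2):
    sizes.append(deconv_out(sizes[-1], kernel_size=4, stride=2, padding=1))

print(sizes)
```

With these settings each layer exactly doubles the width and height, which is why this kernel, stride and padding combination is so common in GAN generators.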
Next is the Discriminator. It takes a real or fake image together with its label information and judges whether the image is real or fake.
The structure of the Discriminator created this time is as follows.
`img` (input image) is 1 (channel) x 28 (px) x 28 (px) for both real and fake images, and `labels` (input label) is a 49-dimensional one-hot vector. The output is a value between 0 and 1 that indicates whether the image is judged to be real.
The image and the label information are concatenated along the channel dimension with torch.cat. I think the cGAN article mentioned earlier explains this part clearly.
python
class Discriminator(nn.Module):
    def __init__(self, num_class):
        super(Discriminator, self).__init__()
        self.num_class = num_class
        self.conv = nn.Sequential(
            # Input channels: 1 image channel + num_class label channels
            nn.Conv2d(num_class + 1, 64, kernel_size=4, stride=2, padding=1),  # 28x28 -> 14x14
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),  # 14x14 -> 7x7
            nn.LeakyReLU(0.2),
            nn.BatchNorm2d(128),
        )
        self.fc = nn.Sequential(
            nn.Linear(128 * 7 * 7, 1024),
            nn.BatchNorm1d(1024),
            nn.LeakyReLU(0.2),
            nn.Linear(1024, 1),
            nn.Sigmoid(),
        )
        self.init_weights()

    def init_weights(self):
        for module in self.modules():
            if isinstance(module, nn.Conv2d):
                module.weight.data.normal_(0, 0.02)
                module.bias.data.zero_()
            elif isinstance(module, nn.Linear):
                module.weight.data.normal_(0, 0.02)
                module.bias.data.zero_()
            elif isinstance(module, nn.BatchNorm1d):
                module.weight.data.normal_(1.0, 0.02)
                module.bias.data.zero_()
            elif isinstance(module, nn.BatchNorm2d):
                module.weight.data.normal_(1.0, 0.02)
                module.bias.data.zero_()

    def forward(self, img, labels):
        # Turn each one-hot label into num_class constant 28x28 maps
        y_2 = labels.view(-1, self.num_class, 1, 1)
        y_2 = y_2.expand(-1, -1, 28, 28)
        # Concatenate the image and the label maps along the channel dimension
        x = torch.cat([img, y_2], 1)
        x = self.conv(x)
        x = x.view(-1, 128 * 7 * 7)
        x = self.fc(x)
        return x
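The label handling in `forward` is the part that tends to be hard to picture, so here is a minimal sketch of just that step. The batch of four zero images and the class indices 0, 5, 10, 48 are made up for illustration.

```python
import torch

num_class = 49
img = torch.zeros(4, 1, 28, 28)                        # dummy batch of 4 images
labels = torch.eye(num_class)[torch.tensor([0, 5, 10, 48])]

# Each one-hot vector becomes num_class constant maps of size 28 x 28.
label_maps = labels.view(-1, num_class, 1, 1).expand(-1, -1, 28, 28)

# Concatenating along dim=1 gives 1 + 49 = 50 input channels.
x = torch.cat([img, label_maps], dim=1)

print(x.shape)
```

For the second sample (class 5), the channel for class 5 is filled with ones and every other label channel is zero, so the first convolution sees the class at every pixel position.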
Create a function that runs one epoch of training.
python
def train_func(D_model, G_model, batch_size, z_dim, num_class, criterion,
               D_optimizer, G_optimizer, data_loader, device):
    # Training mode
    D_model.train()
    G_model.train()

    # The real label is 1
    y_real = torch.ones((batch_size, 1)).to(device)
    D_y_real = (torch.rand((batch_size, 1)) / 2 + 0.7).to(device)  # noisy label fed to D
    # The fake label is 0
    y_fake = torch.zeros((batch_size, 1)).to(device)
    D_y_fake = (torch.rand((batch_size, 1)) * 0.3).to(device)  # noisy label fed to D

    # Initialize the running losses
    D_running_loss = 0
    G_running_loss = 0

    # Process one batch at a time
    for batch_idx, (data, labels) in enumerate(data_loader):
        # Stop when a batch is smaller than batch_size (the final batch)
        if data.size()[0] != batch_size:
            break

        # Create noise from a normal distribution with mean 0.5
        z = torch.normal(mean=0.5, std=0.2, size=(batch_size, z_dim))
        real_img, label, z = data.to(device), labels.to(device), z.to(device)

        # Discriminator update
        D_optimizer.zero_grad()
        # Forward real images through the Discriminator and compute the loss
        D_real = D_model(real_img, label)
        D_real_loss = criterion(D_real, D_y_real)
        # Forward Generator output through the Discriminator and compute the loss
        fake_img = G_model(z, label)
        D_fake = D_model(fake_img.detach(), label)  # detach so this loss does not backpropagate into the Generator
        D_fake_loss = criterion(D_fake, D_y_fake)
        # Minimize the sum of the two losses
        D_loss = D_real_loss + D_fake_loss
        D_loss.backward()
        D_optimizer.step()
        D_running_loss += D_loss.item()

        # Generator update
        G_optimizer.zero_grad()
        # Forward Generator output through the Discriminator; being detected becomes the loss
        fake_img_2 = G_model(z, label)
        D_fake_2 = D_model(fake_img_2, label)
        # G loss (labelled max(log D) in the original comment; this expression equals log(1 - D(G(z))))
        G_loss = -criterion(D_fake_2, y_fake)
        G_loss.backward()
        G_optimizer.step()
        G_running_loss += G_loss.item()

    D_running_loss /= len(data_loader)
    G_running_loss /= len(data_loader)
    return D_running_loss, G_running_loss
The criterion passed as an argument is the loss class (in this case, binary cross entropy).
In order, this function does the following:
--Puts real images from the Dataset into the Discriminator and backpropagates the error
--Puts fake images created by the Generator into the Discriminator and backpropagates the error (the Generator is not updated at this point)
--Puts fake images created by the Generator into the Discriminator and backpropagates the error to update the Generator
This implementation incorporates several of the tricks from "How to Train a GAN", presented at NIPS 2016, to make GAN training succeed. The material is a little old by now. GitHub link
Reference article
14 Techniques for Learning GAN (Generative Adversarial Networks)
The inputs are normalized. This is the following line from the Transform class created earlier:
python
return (sample/127.5)-1
It scales the images to [-1, 1], and the final layer of the Generator is nn.Tanh() to match.
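The Generator loss is computed as follows.
python
# G loss (labelled max(log D) in the original comment; this expression equals log(1 - D(G(z))))
G_loss = -criterion(D_fake_2, y_fake)
D_fake_2 is the Discriminator's judgment on the fake images, and y_fake is a batch_size × 1 vector of zeros.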
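Since the sign and the target label both matter here, it may help to compare the two common forms of the Generator loss numerically. The sketch below is my own illustration in plain Python, not part of the training code: `bce` is a hypothetical helper that computes binary cross entropy for a single prediction. The first form matches the line above; the second is what "max log D" is usually taken to mean, which in this article's code would correspond to `criterion(D_fake_2, y_real)`.

```python
import math

def bce(p, target):
    # Binary cross entropy for a single prediction p in (0, 1).
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

rows = []
for d_out in (0.01, 0.5, 0.99):
    saturating = -bce(d_out, 0.0)       # -BCE(D, 0) = log(1 - D): the form used above
    non_saturating = bce(d_out, 1.0)    # BCE(D, 1) = -log D: the usual "max log D" form
    rows.append((d_out, saturating, non_saturating))

for row in rows:
    print(row)
```

Both quantities decrease as the Discriminator output for a fake image approaches 1, so both push the Generator in the same direction. They differ in how strongly they react when the Discriminator confidently rejects the fakes, which is the situation the "max log D" trick is meant to help with.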
Sample the noise to put in the Generator from a normal distribution instead of a uniform distribution.
python
# Create noise
z = torch.normal(mean = 0.5, std = 0.2, size = (batch_size, z_dim)) # sample from a normal distribution with mean 0.5
The mean and standard deviation were chosen arbitrarily. Sampling from a uniform distribution over [0, 1] never produces negative values, so I chose them so that the sampled noise is almost always positive as well.
4. BatchNorm: every batch that comes out of the DataLoader created above contains only real images. Conversely,
python
fake_img = G_model(z, label)
creates a batch consisting only of fake images, from the noise and the label information obtained from the DataLoader.
LeakyReLU is said to be effective for both the Generator and the Discriminator, so all activation functions are LeakyReLU. The argument 0.2 was chosen because many implementations use this value.
The Discriminator label is usually 0 or 1, but we add noise here. Randomly sample real labels from 0.7 to 1.2 and fake labels from 0.0 to 0.3.
python
#The real label is 1
y_real = torch.ones((batch_size, 1)).to(device)
D_y_real = (torch.rand((batch_size, 1))/2 + 0.7).to(device) #Noise label to put in D
#Fake label is 0
y_fake = torch.zeros((batch_size, 1)).to(device)
D_y_fake = (torch.rand((batch_size, 1)) * 0.3).to(device) #Noise label to put in D
This is the relevant part. Normally y_real / y_fake would be used, but this time D_y_real / D_y_fake are what the Discriminator is actually trained with.
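The ranges quoted above follow directly from the arithmetic on a uniform sample. Here is a minimal NumPy sketch of the same expressions, where `rng.random` stands in for `torch.rand` and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.random((256, 1))      # uniform in [0, 1), like torch.rand

noisy_real = u / 2 + 0.7      # same arithmetic as D_y_real
noisy_fake = u * 0.3          # same arithmetic as D_y_fake

print(noisy_real.min(), noisy_real.max())
print(noisy_fake.min(), noisy_fake.max())
```

Note that some of the real targets exceed 1.0, so they fall outside the usual probability range for binary cross entropy targets. That matches the 0.7 to 1.2 range described in the text.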
The optimizer is Adam. The tips are old, so another optimizer such as RAdam may work better now.
This time, I put Dropout in only one place, after a Linear layer of the Generator. However, BatchNorm and Dropout are sometimes said not to work well together, so I don't think adding everything at once is necessarily better.
Before training the model, define a function that displays the images created by the Generator. It is used to check the Generator's progress at each epoch.
python
import os
from IPython.display import Image, display
from torchvision.utils import save_image
%matplotlib inline

def Generate_img(epoch, G_model, device, z_dim, noise, var_mode, labels, log_dir='logs_cGAN'):
    G_model.eval()
    os.makedirs(log_dir, exist_ok=True)  # save_image fails if the folder does not exist
    with torch.no_grad():
        if var_mode:
            # Draw fresh noise for every call
            noise = torch.normal(mean=0.5, std=0.2, size=(49, z_dim)).to(device)
        # Generate samples with the Generator
        samples = G_model(noise, labels).cpu()
        samples = (samples / 2) + 0.5  # map [-1, 1] back to [0, 1]
        img_path = os.path.join(log_dir, 'epoch_%05d.png' % epoch)
        save_image(samples, img_path, nrow=7)
    img = Image(img_path)
    display(img)
All it does is feed noise into the Generator, save the resulting images in a folder called logs_cGAN, and display them.
When var_mode is False, the same noise is reused every time, so the images can be compared across epochs.
Train the model.
python
# Fix the seeds to ensure reproducibility
SEED = 1111
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

# device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def model_run(num_epochs, batch_size=batch_size, dataloader=train_loader, device=device):
    # Dimension of the noise fed to the Generator
    z_dim = 30
    var_mode = False  # whether to use different noise each time the results are displayed
    # Fixed noise used for the displayed samples
    noise = torch.normal(mean=0.5, std=0.2, size=(49, z_dim)).to(device)
    # Number of classes
    num_class = 49

    # One label per class, used when trying out the Generator
    labels = []
    for i in range(num_class):
        tmp = np.identity(num_class)[i]
        tmp = np.array(tmp, dtype=np.float32)
        labels.append(tmp)
    label = torch.tensor(np.array(labels)).to(device)

    # Model definition
    D_model = Discriminator(num_class).to(device)
    G_model = Generator(z_dim, num_class).to(device)

    # Loss definition (passed to train_func as an argument)
    criterion = nn.BCELoss().to(device)

    # Optimizer definition
    D_optimizer = torch.optim.Adam(D_model.parameters(), lr=0.0002, betas=(0.5, 0.999), eps=1e-08, weight_decay=1e-5, amsgrad=False)
    G_optimizer = torch.optim.Adam(G_model.parameters(), lr=0.0002, betas=(0.5, 0.999), eps=1e-08, weight_decay=1e-5, amsgrad=False)

    D_loss_list = []
    G_loss_list = []

    # torch.save fails if the checkpoint folder does not exist
    # (os is imported in the Generate_img cell above)
    os.makedirs('./checkpoint_cGAN', exist_ok=True)

    all_time = time.time()
    for epoch in range(num_epochs):
        start_time = time.time()

        D_loss, G_loss = train_func(D_model, G_model, batch_size, z_dim, num_class, criterion,
                                    D_optimizer, G_optimizer, dataloader, device)
        D_loss_list.append(D_loss)
        G_loss_list.append(G_loss)

        secs = int(time.time() - start_time)
        mins = secs // 60
        secs = secs % 60

        # Show the results for each epoch
        print('Epoch: %d' % (epoch + 1), " | Time: %d min %d sec" % (mins, secs))
        print(f'\tLoss: {D_loss:.4f}(Discriminator)')
        print(f'\tLoss: {G_loss:.4f}(Generator)')

        if (epoch + 1) % 1 == 0:
            Generate_img(epoch, G_model, device, z_dim, noise, var_mode, label)

        # Create a checkpoint file to save the model
        if (epoch + 1) % 5 == 0:
            torch.save({
                'epoch': epoch,
                'model_state_dict': G_model.state_dict(),
                'optimizer_state_dict': G_optimizer.state_dict(),
                'loss': G_loss,
            }, './checkpoint_cGAN/G_model_{}'.format(epoch + 1))

    return D_loss_list, G_loss_list

# Run the training
D_loss_list, G_loss_list = model_run(num_epochs=100)
It is rather long, but it displays the time taken and the losses for each epoch, and saves the model.
Let's look at how the Generator and Discriminator losses change over time.
python
import matplotlib.pyplot as plt
%matplotlib inline
fig = plt.figure(figsize=(10,7))
loss = fig.add_subplot(1,1,1)
loss.plot(range(len(D_loss_list)),D_loss_list,label='Discriminator_loss')
loss.plot(range(len(G_loss_list)),G_loss_list,label='Generator_loss')
loss.set_xlabel('epoch')
loss.set_ylabel('loss')
loss.legend()
loss.grid()
fig.show()
From around epoch 20, neither loss changes much. Neither the Discriminator loss nor the Generator loss has collapsed to 0, so training seems to be working reasonably well. By the way, if you turn the characters generated from epoch 1 to 100 into a GIF in order, it looks like this.
The upper left is "a" and the lower right is "ゝ". Quality varies quite a bit by character: "u", "ku", "sa", "so", and "hi" are generated stably and well, while "na" and "yu" fluctuate wildly from epoch to epoch.
Below are the results of generating 5 images for each type. Epoch:5
Epoch:50
Epoch:100
Judging from these alone, more epochs are not necessarily better. "Mu" looks best at epoch 5, while "ゑ" looks best at epoch 100.
For comparison, this is what it looks like when 5 samples per character are taken from the training data in the same way.
Some of them are unreadable even for people today. "Su" and "mi" are quite different from their modern shapes. Seeing this, I think the model performs quite well.
I tried generating kuzushiji characters with a cGAN. I think there is still a lot of room for improvement in the implementation, but the results themselves are reasonable. This article got long, but I hope at least part of it is helpful.
Others have also implemented a cGAN for the standard MNIST (handwritten digits) in PyTorch. Many parts differ, such as the model structure, so I think that is a useful reference too.
Reference article
I tried to generate handwritten characters by deep learning [Pytorch x MNIST x CGAN]
My original, light motivation was to ask "Is it possible to generate handwritten sentences?", so I will try that at the end.
Load the model weights from a saved checkpoint file and pickle the model.
python
import cloudpickle
%matplotlib inline

# Epoch of the checkpoint to load
point = 50
# Define the structure of the model
z_dim = 30
num_class = 49
G = Generator(z_dim=z_dim, num_class=num_class)
# Load the checkpoint (map_location lets a GPU-trained checkpoint load on a CPU-only machine)
checkpoint = torch.load('./checkpoint_cGAN/G_model_{}'.format(point), map_location='cpu')
# Put the parameters into the Generator
G.load_state_dict(checkpoint['model_state_dict'])
# Switch to evaluation mode
G.eval()
# Save with pickle
with open('KMNIST_cGAN.pkl', 'wb') as f:
    cloudpickle.dump(G, f)
It seems the model can be pickled by using a module called cloudpickle instead of the usual pickle.
Let's open this pkl file and generate a sentence.
python
# Characters in Kuzushiji-49 class order (index 0 = あ, index 48 = ゝ)
letter = 'あいうえおかきくけこさしすせそたちつてとなにぬねのはひふへほまみむめもやゆよらりるれろわゐゑをんゝ'
strs = input()
with open('KMNIST_cGAN.pkl', 'rb') as f:
    Generator = cloudpickle.load(f)
for i in range(len(str(strs))):
    noise = torch.normal(mean=0.5, std=0.2, size=(1, 30))
    # Look up the class index of the i-th character and build its one-hot label
    str_index = letter.index(strs[i])
    tmp = np.identity(49)[str_index]
    tmp = np.array(tmp, dtype=np.float32)
    label = [tmp]
    img = Generator(noise, torch.tensor(np.array(label)))
    img = img.reshape((28, 28))
    img = img.detach().numpy().tolist()
    # extend() appends rows, so the characters are stacked vertically
    if i == 0:
        comp_img = img
    else:
        comp_img.extend(img)
save_image(torch.tensor(comp_img), './sentence.png', nrow=len(str(strs)))
img = Image('./sentence.png')
display(img)
The result looks like this.
"I don't know anything anymore" ...