In this post, I will implement a Variational Autoencoder (VAE) with Keras.
First, consider a primitive autoencoder. A weight W1 and a bias b1 are applied to the input x, which is mapped to the intermediate layer through an activation function f1; then a weight W2 and a bias b2 are applied and the result is output through an activation function f2.
If f2 is the **identity function** and the loss function is the **sum of squared errors**, learning proceeds so that the output y reproduces the input x. The learned W1 and b1 can then be regarded as features that represent the data.
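As a rough sketch of this two-layer mapping (the dimensions and the hidden activation below are arbitrary assumptions, not part of the MNIST code that follows):

from keras.layers import Input, Dense
from keras.models import Model

# x -> f1(W1 x + b1) -> f2(W2 h + b2), with f2 chosen as the identity ('linear')
x_in = Input(shape=(784,))
h = Dense(128, activation='relu')(x_in)      # intermediate layer via f1
y_out = Dense(784, activation='linear')(h)   # output y via the identity f2
model = Model(x_in, y_out)
model.compile(optimizer='adam', loss='mse')  # sum-of-squared-errors loss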
First, let's start with a simple autoencoder. The dataset is MNIST.
Since the input images are MNIST, they have 28 * 28 = 784 dimensions. We narrow this down to 256, 64, and then 32 dimensions, and then restore it back to 64, 256, and finally 784 dimensions. The loss function is the cross entropy between the output image and the input image.
Because the number of dimensions is narrowed down in the middle, the network learns weights that try to keep only the important features. After training, if you feed an image into the input, almost the same image comes out of the output.
Now, let's run the code below.
from keras.layers import Input, Dense
from keras.models import Model
from keras.datasets import mnist
import numpy as np
import matplotlib.pyplot as plt
# Load the MNIST dataset
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))
#Model building
encoding_dim = 32
input_img = Input(shape=(784,))
x1 = Dense(256, activation='relu')(input_img)
x2 = Dense(64, activation='relu')(x1)
encoded = Dense(encoding_dim, activation='relu')(x2)
x3 = Dense(64, activation='relu')(encoded)
x4 = Dense(256, activation='relu')(x3)
decoded = Dense(784, activation='sigmoid')(x4)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
autoencoder.summary()
#Learning
autoencoder.fit(x_train, x_train,
                epochs=50,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))
# Convert the test images with the trained model
decoded_imgs = autoencoder.predict(x_test)
n = 10
plt.figure(figsize=(10, 2))
for i in range(n):
    # Display the original test image
    ax = plt.subplot(2, n, i+1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    # Display the reconstructed image
    ax = plt.subplot(2, n, i+1+n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
The upper row shows the original test images, and the lower row shows the images reconstructed from them by the autoencoder. The test images are reproduced quite faithfully.
Note that the output is determined entirely by the signal in the most narrowed-down, 32-dimensional layer. In other words, digits 0 to 9 of various shapes are distributed in this 32-dimensional latent space z.
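Since the layers are built with the functional API, you can read these latent codes out directly with a second model that shares the trained encoder layers. A minimal sketch, assuming `input_img` and `encoded` from the code above are still in scope:

# Encoder-only model that shares the already-trained layers
encoder_model = Model(input_img, encoded)
latent_codes = encoder_model.predict(x_test)
print(latent_codes.shape)  # (10000, 32) when encoding_dim = 32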
We cannot actually visualize 32 dimensions, but what if we reduce the dimensionality and make the latent space z two-dimensional? In two dimensions, we can show how the digits 0 to 9 are distributed on a plane.
Now, let's see what happens when the most narrowed-down part is made two-dimensional. Change `encoding_dim = 32` to `encoding_dim = 2` in the previous code and run it.
As expected, reproduction is difficult when the latent space z is two-dimensional. "0", "1", and "7" are reproduced, but the rest get mixed up with other digits and cannot be reproduced well.
In other words, in a narrow two-dimensional latent space, the digits 0 to 9 cannot be divided up neatly; many of them end up distributed on top of one another.
How can we distribute the digits 0 to 9 in the narrow latent space z without overlap? The most common distribution in nature is the normal (Gaussian) distribution, so here we assume that the distribution of the digits 0 to 9 follows a normal distribution and build the model around that assumption.
When a digit enters the input and is narrowed down to 64 dimensions, we estimate the mean $\mu$ and variance $\sigma^2$ of the normal distribution that the digit belongs to. A value sampled at random from that distribution is put into z, and the decoder weights are learned so that there is no difference between input and output. Doing this, the digits 0 to 9 should end up distributed without overlapping each other.
However, there is a problem with this idea: once a random sampling step is included, error backpropagation is no longer possible. The model below therefore keeps the idea while still allowing error backpropagation: instead of sampling z directly, we write $z = \mu + \sigma \epsilon$, where $\epsilon$ is random noise drawn from a standard normal distribution, so the randomness is isolated in $\epsilon$ and gradients can flow through $\mu$ and $\sigma$. This technique is called the **Reparametrization Trick**. Coding this part with $\mu$ = z_mean, $\log \sigma^2$ = z_logvar, and $\epsilon$ = epsilon gives:
# Reparametrization Trick
def sampling(args):
    z_mean, z_logvar = args
    batch = K.shape(z_mean)[0]
    dim = K.int_shape(z_mean)[1]
    epsilon = K.random_normal(shape=(batch, dim))  # ε ~ N(0, 1)
    return z_mean + K.exp(0.5 * z_logvar) * epsilon  # z = μ + σε
The VAE loss function $L$ (the negative of the variational lower bound) is expressed as follows:

$L = -E_{z \sim q(z|x)}[\log p_{model}(x|z)] + D_{KL}(q(z|x)\,\|\,p_{model}(z))$

The first term is the (negative) expected log-likelihood of the data $x$ under $q(z|x)$; it measures how close the output is to the original data, so we replace it with a squared reconstruction error. The second term is the Kullback-Leibler divergence ($D_{KL}$), measured between $q(z|x)$ and the prior $p(z)$, which is taken to be a standard normal distribution. This gives

$L = \beta\,\|y - x\|^2 + D_{KL}(N(\mu, \sigma^2)\,\|\,N(0, 1))$
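For reference, the KL divergence between $N(\mu, \sigma^2)$ and a standard normal $N(0, 1)$ has a well-known closed form, and this is exactly what the kl_loss code below computes:

$D_{KL}(N(\mu, \sigma^2)\,\|\,N(0, 1)) = -\frac{1}{2}\sum_{i}\left(1 + \log \sigma_i^2 - \mu_i^2 - \sigma_i^2\right)$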
Writing the second term as kl_loss, with $\mu$ = z_mean and $\log \sigma^2$ = z_logvar, and the first term as a mean squared reconstruction error, the loss function $L$ (vae_loss) is coded as follows.
#Loss function
# Kullback-Leibler Loss
kl_loss = 1 + z_logvar - K.square(z_mean) - K.exp(z_logvar)
kl_loss = K.sum(kl_loss, axis=-1)
kl_loss *= -0.5
# Reconstruction Loss
reconstruction_loss = mse(inputs, outputs)
reconstruction_loss *= original_dim
vae_loss = K.mean(reconstruction_loss + kl_loss)
Now let's write out the entire VAE code, including the parts shown above, and run it.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from keras.layers import Lambda, Input, Dense
from keras.models import Model
from keras.datasets import mnist
from keras.losses import mse
from keras import backend as K
import numpy as np
import matplotlib.pyplot as plt
# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
image_size = x_train.shape[1]  # = 28
original_dim = image_size * image_size
x_train = np.reshape(x_train, [-1, original_dim])
x_test = np.reshape(x_test, [-1, original_dim])
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
input_shape = (original_dim, )
latent_dim = 2
# Reparametrization Trick
def sampling(args):
    z_mean, z_logvar = args
    batch = K.shape(z_mean)[0]
    dim = K.int_shape(z_mean)[1]
    epsilon = K.random_normal(shape=(batch, dim), seed=5)  # ε ~ N(0, 1)
    return z_mean + K.exp(0.5 * z_logvar) * epsilon  # z = μ + σε
#VAE model construction
inputs = Input(shape=input_shape)
x1 = Dense(256, activation='relu')(inputs)
x2 = Dense(64, activation='relu')(x1)
z_mean = Dense(latent_dim)(x2)
z_logvar = Dense(latent_dim)(x2)
z = Lambda(sampling, output_shape=(latent_dim,))([z_mean, z_logvar])
encoder = Model(inputs, [z_mean, z_logvar, z], name='encoder')
encoder.summary()
latent_inputs = Input(shape=(latent_dim,))
x3 = Dense(64, activation='relu')(latent_inputs)
x4 = Dense(256, activation='relu')(x3)
outputs = Dense(original_dim, activation='sigmoid')(x4)
decoder = Model(latent_inputs, outputs, name='decoder')
decoder.summary()
z_output = encoder(inputs)[2]
outputs = decoder(z_output)
vae = Model(inputs, outputs, name='variational_autoencoder')
#Loss function
# Kullback-Leibler Loss
kl_loss = 1 + z_logvar - K.square(z_mean) - K.exp(z_logvar)
kl_loss = K.sum(kl_loss, axis=-1)
kl_loss *= -0.5
# Reconstruction Loss
reconstruction_loss = mse(inputs, outputs)
reconstruction_loss *= original_dim
vae_loss = K.mean(reconstruction_loss + kl_loss)
vae.add_loss(vae_loss)
vae.compile(optimizer='adam')
vae.fit(x_train,
        epochs=50,
        batch_size=256,
        validation_data=(x_test, None))
#Convert test image
decoded_imgs = vae.predict(x_test)
#Display of test image and converted image
n = 10
plt.figure(figsize=(10, 2))
for i in range(n):
    # Display the original test image
    ax = plt.subplot(2, n, i+1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    # Display the reconstructed image
    ax = plt.subplot(2, n, i+1+n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
Even with a latent space z of only two dimensions, reproduction is now possible. Many digits such as "0", "1", "2", "7", and "9" can be reproduced, although "4" and "5" still cannot be reproduced well.
Since the latent space z is two-dimensional, it can be drawn on a plane. Let's see how the digits 0 to 9 are distributed there, and what images the decoder produces across that plane. Add the following to the full VAE code and run it.
import matplotlib.cm as cm
def plot_results(encoder,
                 decoder,
                 x_test,
                 y_test,
                 batch_size=128,
                 model_name="vae_mnist"):
    # Scatter plot of the test set in the 2D latent space, colored by digit label
    z_mean, _, _ = encoder.predict(x_test,
                                   batch_size=128)
    plt.figure(figsize=(12, 10))
    cmap = cm.tab10
    plt.scatter(z_mean[:, 0], z_mean[:, 1], c=cmap(y_test))
    m = cm.ScalarMappable(cmap=cmap)
    m.set_array(y_test)
    plt.colorbar(m)
    plt.xlabel("z[0]")
    plt.ylabel("z[1]")
    plt.show()

    # Decode a 30x30 grid of latent points from (-4, -4) to (4, 4) and tile the results
    n = 30
    digit_size = 28
    figure = np.zeros((digit_size * n, digit_size * n))
    grid_x = np.linspace(-4, 4, n)
    grid_y = np.linspace(-4, 4, n)[::-1]
    for i, yi in enumerate(grid_y):
        for j, xi in enumerate(grid_x):
            z_sample = np.array([[xi, yi]])
            x_decoded = decoder.predict(z_sample)
            digit = x_decoded[0].reshape(digit_size, digit_size)
            figure[i * digit_size: (i + 1) * digit_size,
                   j * digit_size: (j + 1) * digit_size] = digit

    plt.figure(figsize=(10, 10))
    start_range = digit_size // 2
    end_range = n * digit_size + start_range + 1
    pixel_range = np.arange(start_range, end_range, digit_size)
    sample_range_x = np.round(grid_x, 1)
    sample_range_y = np.round(grid_y, 1)
    plt.xticks(pixel_range, sample_range_x)
    plt.yticks(pixel_range, sample_range_y)
    plt.xlabel("z[0]")
    plt.ylabel("z[1]")
    plt.axis('off')
    plt.imshow(figure, cmap='Greys_r')
    #plt.savefig(filename)
    plt.show()

plot_results(encoder,
             decoder,
             x_test,
             y_test,
             batch_size=128,
             model_name="vae_mlp")
"0", "1", "2", "6", "7" seem to be distributed without overlapping with other numbers.
Since it is distributed along the normal distribution, you can see that it changes continuously from one number to another.
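To check this continuity yourself, you can decode points along a straight line between two positions in the latent space. Below is a minimal sketch that assumes the trained `decoder` from the code above; the endpoint coordinates are arbitrary examples:

# Decode points interpolated between two hand-picked latent positions
z_start, z_end = np.array([-2.0, 0.0]), np.array([2.0, 0.0])  # arbitrary example points
steps = 10
plt.figure(figsize=(10, 1))
for i, t in enumerate(np.linspace(0, 1, steps)):
    z_sample = ((1 - t) * z_start + t * z_end).reshape(1, 2)
    digit = decoder.predict(z_sample)[0].reshape(28, 28)
    ax = plt.subplot(1, steps, i + 1)
    plt.imshow(digit, cmap='Greys_r')
    ax.axis('off')
plt.show()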