This article implements, tunes, and examines Auto Encoders for anomaly detection, and proposes a recommended model for readers in the situation described above.
First, let me explain unsupervised anomaly detection with an example. When you manufacture a product, some units inevitably come out defective. If a defect such as a wrinkle is visible in a photograph of the product, sorting the defective units out by hand is laborious, so we would like a neural network to detect them automatically. However, when there are not enough abnormal examples to train a binary classifier that separates wrinkled (abnormal) images from normal ones, we can instead train on normal images only and compute an anomaly score for each image.
Function that takes a normal image and returns the same image: train a neural network $f$ such that $f(x) = x$ ($x$ is a normal image).
Anomaly score: at test time, an image that the network fails to reconstruct well is judged abnormal, so the reconstruction error is used as the anomaly score.
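Concretely, the score used later in this article is the per-image mean squared reconstruction error (this formula is my reading of the evaluation code further below):

$$\mathrm{score}(x) = \frac{1}{28 \times 28} \sum_{i=1}^{28 \times 28} \left( x_i - f(x)_i \right)^2$$

An image is flagged as abnormal when its score exceeds a threshold computed from the normal images.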
With the model above, we need a network that is, in a sense, deliberately limited: it should reconstruct normal images well but fail on abnormal ones. A common trick is therefore to keep the model as expressive as possible while making its task harder, by asking it to remove noise added to the input (denoising), i.e. to learn $x = f(x + z)$ ($z$ is noise). This reportedly works well.
I suspected that using super-resolution instead of denoising, i.e. making reconstruction difficult by throwing away resolution, would also work, and in a problem of detecting wrinkles and scratches it performed quite well. Below, the three approaches introduced so far are compared to show how much they differ.
Results on images of the actual product cannot be shown for rights reasons, so the problem here is much simpler: among the MNIST handwritten digits, 1 is treated as the normal image and every other digit as abnormal. The three approaches above are compared on these images.
All of the code is presented so that it runs if you copy it as is. The implementation uses Keras. First, import the libraries and download the data.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist
from tensorflow.keras import layers
from tensorflow.keras import models
from tensorflow.keras import optimizers
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
Convert data into a manageable format
train_true = train_images[train_labels==1].reshape(6742,28,28,1)/255   # 6742 training images of the digit 1 (normal)
test_true = test_images[test_labels==1].reshape(1135,28,28,1)/255      # 1135 normal test images (the digit 1)
test_false = test_images[test_labels!=1].reshape(8865,28,28,1)/255     # 8865 abnormal test images (digits other than 1)
I had assumed the MNIST classes were evenly distributed over 0-9, but the counts actually vary from digit to digit. The images are normalized to the range [0, 1].
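As a quick sanity check (a minimal sketch; the variable names follow the loading code above), you can print how many training examples each digit has:

digits, counts = np.unique(train_labels, return_counts=True)   # per-digit counts
for d, c in zip(digits, counts):
    print("digit {}: {} training images".format(d, c))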
Model 1 is defined as a plain encoder-decoder (an ordinary convolutional Auto Encoder).
ffc = 8  # first filter count
model1 = models.Sequential()
model1.add(layers.Conv2D(ffc,(3,3),padding="same",activation="relu",input_shape = (28,28,1)))
model1.add(layers.BatchNormalization())
model1.add(layers.MaxPooling2D((2,2)))
model1.add(layers.Conv2D(ffc,(3,3),padding="same",activation="relu"))
model1.add(layers.BatchNormalization())
model1.add(layers.MaxPooling2D((2,2)))
model1.add(layers.Conv2D(ffc,(3,3),padding="same",activation="relu"))
model1.add(layers.BatchNormalization())
model1.add(layers.UpSampling2D())
model1.add(layers.Conv2D(ffc,(3,3),padding="same",activation="relu"))
model1.add(layers.BatchNormalization())
model1.add(layers.UpSampling2D())
model1.add(layers.Conv2D(ffc,(3,3),padding="same",activation="relu"))
model1.add(layers.BatchNormalization())
model1.add(layers.Conv2D(1,(3,3),padding="same",activation="sigmoid"))
model1.compile(loss = "mae",optimizer="adam")
model1.summary()
Batch normalization is applied between the convolutions. The output layer uses a sigmoid activation (to map values into [0, 1]). For the loss, MAE (mean absolute error) is used instead of squared error because an absolute-error loss tends to produce less blurry reconstructions. Model 2 is defined in the same style.
ffc = 8  # first filter count
model2 = models.Sequential()
model2.add(layers.MaxPooling2D((4,4),input_shape = (28,28,1)))
model2.add(layers.UpSampling2D())
model2.add(layers.Conv2D(ffc,(3,3),padding="same",activation="relu"))
model2.add(layers.BatchNormalization())
model2.add(layers.Conv2D(ffc,(3,3),padding="same",activation="relu"))
model2.add(layers.BatchNormalization())
model2.add(layers.UpSampling2D())
model2.add(layers.Conv2D(ffc,(3,3),padding="same",activation="relu"))
model2.add(layers.BatchNormalization())
model2.add(layers.Conv2D(ffc,(3,3),padding="same",activation="relu"))
model2.add(layers.BatchNormalization())
model2.add(layers.Conv2D(1,(3,3),padding="same",activation="sigmoid"))
model2.compile(loss = "mae",optimizer="adam")
model2.summary()
In model 2, encoding consists of nothing but max pooling, so all of the model's capacity is spent on decoding, i.e. restoring the resolution that was thrown away (a super-resolution style model).
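To get a feel for how much information survives this encoder, here is a minimal sketch (my own addition, not part of the original comparison) that shows a test image next to its 7x7 max-pooled version, which is all that model2's decoder gets to work with:

pool_only = models.Sequential([layers.MaxPooling2D((4,4), input_shape=(28,28,1))])  # same encoder as model2
low_res = pool_only.predict(test_true[:1])                                          # shape (1, 7, 7, 1)
plt.subplot(1,2,1); plt.imshow(test_true[0,:,:,0], cmap="gray"); plt.title("input")
plt.subplot(1,2,2); plt.imshow(low_res[0,:,:,0], cmap="gray"); plt.title("after 4x4 max pooling")
plt.show()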
Let's actually train.
t_acc = np.zeros(50)
f_acc = np.zeros(50)
for i in range(50):
    hist = model1.fit(train_true, train_true, steps_per_epoch=10, epochs=1, verbose=0)
    true_d = ((test_true - model1.predict(test_true))**2).reshape(1135,28*28).mean(axis=-1)
    false_d = ((test_false - model1.predict(test_false))**2).reshape(8865,28*28).mean(axis=-1)
    t_acc[i] = (true_d <= true_d.mean() + 3*true_d.std()).sum()/len(true_d)
    f_acc[i] = (false_d > true_d.mean() + 3*true_d.std()).sum()/len(false_d)
    print("Epoch {}: accuracy on normal images {:.2f}%, accuracy on abnormal images {:.2f}%".format(i+1, t_acc[i]*100, f_acc[i]*100))
plt.plot(t_acc)
plt.plot(f_acc)
plt.show()
Note that, unlike the MAE loss used for training, the anomaly score is computed with the squared error. Running this loop for model 1, the plain encoder-decoder, produces a graph of how the accuracy on normal and abnormal images evolves during training. The threshold was set to $\mu + 3\sigma$ (mean plus three standard deviations) of the scores of the normal images.
An example of the output is shown below. Blue is the fraction of normal images judged normal, and orange is the fraction of abnormal images judged abnormal. Since the threshold always sits three standard deviations above the mean of the normal-image scores, the fraction of normal images judged normal stays around 98% by construction. About 50 epochs appears to have been enough. (The final values were 98.68% correct on normal images and 97.13% correct on abnormal images.)
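To see why the $\mu + 3\sigma$ rule works, here is a minimal sketch (my own addition; it reuses true_d and false_d from the loop above) that plots the two score distributions against the threshold:

threshold = true_d.mean() + 3*true_d.std()                       # same rule as in the loop above
plt.hist(true_d, bins=50, alpha=0.5, density=True, label="normal (1)")
plt.hist(false_d, bins=50, alpha=0.5, density=True, label="abnormal (not 1)")
plt.axvline(threshold, color="k", linestyle="--", label="mean + 3 std of normal scores")
plt.xlabel("reconstruction error (anomaly score)")
plt.legend()
plt.show()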
Next, model 1 is trained as a denoising Auto Encoder: Gaussian noise with mean 0 and scale 0.1, of the same shape as the training images, is added to the input, and the model is trained to return the clean image, i.e. $x = f(x + z)$. The specific code is as follows. (If you want this experiment to be independent of the previous one, re-run the model1 definition first; otherwise training continues from the already trained weights.)
t_acc = np.zeros(50)
f_acc = np.zeros(50)
for i in range(50):
    noisy = train_true + 0.1*np.random.normal(size=train_true.shape)   # noisy input, clean target
    hist = model1.fit(noisy, train_true, steps_per_epoch=10, epochs=1, verbose=0)
    true_d = ((test_true - model1.predict(test_true))**2).reshape(1135,28*28).mean(axis=-1)
    false_d = ((test_false - model1.predict(test_false))**2).reshape(8865,28*28).mean(axis=-1)
    t_acc[i] = (true_d <= true_d.mean() + 3*true_d.std()).sum()/len(true_d)
    f_acc[i] = (false_d > true_d.mean() + 3*true_d.std()).sum()/len(false_d)
    print("Epoch {}: accuracy on normal images {:.2f}%, accuracy on abnormal images {:.2f}%".format(i+1, t_acc[i]*100, f_acc[i]*100))
plt.plot(t_acc)
plt.plot(f_acc)
plt.show()
Running this gives the following graph. Convergence appears to be somewhat faster. The final values were 98.68% correct on normal images and 97.59% correct on abnormal images.
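To inspect what the denoising model actually learned, here is a minimal sketch (my own addition) that shows a normal and an abnormal test image next to their reconstructions; for an abnormal input the reconstruction should look wrong, which is exactly what drives the score above the threshold:

idx = 0                                                       # pick any test image
recon_true = model1.predict(test_true[idx:idx+1])             # reconstruction of a normal image
recon_false = model1.predict(test_false[idx:idx+1])           # reconstruction of an abnormal image
plt.subplot(2,2,1); plt.imshow(test_true[idx,:,:,0], cmap="gray"); plt.title("normal input")
plt.subplot(2,2,2); plt.imshow(recon_true[0,:,:,0], cmap="gray"); plt.title("reconstruction")
plt.subplot(2,2,3); plt.imshow(test_false[idx,:,:,0], cmap="gray"); plt.title("abnormal input")
plt.subplot(2,2,4); plt.imshow(recon_false[0,:,:,0], cmap="gray"); plt.title("reconstruction")
plt.show()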
Let's train model2 and see the result.
t_acc = np.zeros(50)
f_acc = np.zeros(50)
for i in range(50):
    hist = model2.fit(train_true, train_true, steps_per_epoch=10, epochs=1, verbose=0)
    true_d = ((test_true - model2.predict(test_true))**2).reshape(1135,28*28).mean(axis=-1)
    false_d = ((test_false - model2.predict(test_false))**2).reshape(8865,28*28).mean(axis=-1)
    t_acc[i] = (true_d <= true_d.mean() + 3*true_d.std()).sum()/len(true_d)
    f_acc[i] = (false_d > true_d.mean() + 3*true_d.std()).sum()/len(false_d)
    print("Epoch {}: accuracy on normal images {:.2f}%, accuracy on abnormal images {:.2f}%".format(i+1, t_acc[i]*100, f_acc[i]*100))
plt.plot(t_acc)
plt.plot(f_acc)
plt.show()
The resulting graph shows that this model performed worse than I had expected. Although it does not appear to have fully converged, the final values were 98.59% correct on normal images and 84.85% correct on abnormal images.
For this task of distinguishing the handwritten digit 1 from all the others, adding noise turned out to be effective. One intuition is that the added noise sometimes makes a 1 look like a 7 or a 9, so a model that has learned to map such images back to a 1 also becomes better at rejecting real anomalies. In other words, if the type of anomaly is known in advance, an Auto Encoder trained to artificially add that type of anomaly to normal images and undo it can perform extremely well.
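As a concrete illustration of that idea, here is a minimal sketch (my own addition; the scratch-drawing helper is hypothetical and not part of the original experiments) of training model1 to remove synthetic "scratch" lines drawn over normal images:

def add_scratch(images):
    # Draw a random horizontal bright line on each image (a toy stand-in for a known defect type).
    damaged = images.copy()
    rows = np.random.randint(5, 23, size=len(images))
    for img, r in zip(damaged, rows):
        img[r, 4:24, 0] = 1.0
    return damaged

scratched = add_scratch(train_true)                 # input with the known anomaly type added
model1.fit(scratched, train_true,                   # learn to remove the scratch and restore the normal image
           steps_per_epoch=10, epochs=10, verbose=0)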
The super-resolution approach proposed in this article was not very effective on this dataset, but when detecting wrinkles and scratches on actual industrial products, reducing the resolution until the defects become invisible and then restoring the image can work well. It is therefore worth choosing the model while looking at the problem at hand.
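As a rough sketch of what that might look like on larger product images (the 256x256 input size and the pooling factor are assumptions, not values from any real experiment), the same pattern as model2 scales up simply by pooling more aggressively:

model3 = models.Sequential()
model3.add(layers.MaxPooling2D((8,8), input_shape=(256,256,1)))   # pool until wrinkles/scratches disappear
model3.add(layers.UpSampling2D())
model3.add(layers.Conv2D(8,(3,3),padding="same",activation="relu"))
model3.add(layers.BatchNormalization())
model3.add(layers.UpSampling2D())
model3.add(layers.Conv2D(8,(3,3),padding="same",activation="relu"))
model3.add(layers.BatchNormalization())
model3.add(layers.UpSampling2D())
model3.add(layers.Conv2D(8,(3,3),padding="same",activation="relu"))
model3.add(layers.BatchNormalization())
model3.add(layers.Conv2D(1,(3,3),padding="same",activation="sigmoid"))
model3.compile(loss="mae", optimizer="adam")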