This article implements, tunes, and examines Auto Encoders for anomaly detection, and proposes a recommended model for readers in the situation described above.
First, let me explain unsupervised anomaly detection with an example. When you manufacture a product, some units inevitably come out defective. If a defect such as a wrinkle is visible in a photograph of the product, sorting the defective units out by hand is laborious, so we would like a neural network to detect them automatically. However, when there are not enough abnormal examples to train a binary classifier that separates wrinkled (abnormal) images from normal ones, we can instead train on normal images only and compute an anomaly score for each image.
Function that takes a normal image and returns the same image: train a neural network $f$ such that $f(x) = x$ ($x$ is a normal image).
Anomaly score: at test time, an image that the network fails to reconstruct well is judged abnormal, so the reconstruction error is used as the anomaly score.
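Concretely, the score used later in this article is the per-image mean squared reconstruction error (this formula is my reading of the evaluation code further below):

$$\mathrm{score}(x) = \frac{1}{28 \times 28} \sum_{i=1}^{28 \times 28} \left( x_i - f(x)_i \right)^2$$

An image is flagged as abnormal when its score exceeds a threshold computed from the normal images.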
With the model above, we need a network that is, in a sense, deliberately limited: it should reconstruct normal images well but fail on abnormal ones. A common trick is therefore to keep the model as expressive as possible while making its task harder, by asking it to remove noise added to the input (denoising), i.e. to learn $x = f(x + z)$ ($z$ is noise). This reportedly works well.
I suspected that using super-resolution instead of denoising, i.e. making reconstruction difficult by throwing away resolution, would also work, and in a problem of detecting wrinkles and scratches it performed quite well. Below, the three approaches introduced so far are compared to show how much they differ.
Results on images of the actual product cannot be shown for rights reasons, so the problem here is much simpler: among the MNIST handwritten digits, 1 is treated as the normal image and every other digit as abnormal. The three approaches above are compared on these images.
All of the code is presented so that it runs if you copy it as is. The implementation uses Keras. First, import the libraries and download the data.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist
from tensorflow.keras import layers
from tensorflow.keras import models
from tensorflow.keras import optimizers
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
Convert data into a manageable format
train_true = train_images[train_labels==1].reshape(6742,28,28,1)/255   # 6742 training images of the digit 1 (normal)
test_true = test_images[test_labels==1].reshape(1135,28,28,1)/255      # 1135 normal test images (the digit 1)
test_false = test_images[test_labels!=1].reshape(8865,28,28,1)/255     # 8865 abnormal test images (digits other than 1)
I had assumed the MNIST classes were evenly distributed over 0-9, but the counts actually vary from digit to digit. The images are normalized to the range [0, 1].
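As a quick sanity check (a minimal sketch; the variable names follow the loading code above), you can print how many training examples each digit has:

digits, counts = np.unique(train_labels, return_counts=True)   # per-digit counts
for d, c in zip(digits, counts):
    print("digit {}: {} training images".format(d, c))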
Model 1 is defined as a plain encoder-decoder (an ordinary convolutional Auto Encoder).
ffc = 8  # first filter count
model1 = models.Sequential()
model1.add(layers.Conv2D(ffc,(3,3),padding="same",activation="relu",input_shape = (28,28,1)))
model1.add(layers.BatchNormalization())
model1.add(layers.MaxPooling2D((2,2)))
model1.add(layers.Conv2D(ffc,(3,3),padding="same",activation="relu"))
model1.add(layers.BatchNormalization())
model1.add(layers.MaxPooling2D((2,2)))
model1.add(layers.Conv2D(ffc,(3,3),padding="same",activation="relu"))
model1.add(layers.BatchNormalization())
model1.add(layers.UpSampling2D())
model1.add(layers.Conv2D(ffc,(3,3),padding="same",activation="relu"))
model1.add(layers.BatchNormalization())
model1.add(layers.UpSampling2D())
model1.add(layers.Conv2D(ffc,(3,3),padding="same",activation="relu"))
model1.add(layers.BatchNormalization())
model1.add(layers.Conv2D(1,(3,3),padding="same",activation="sigmoid"))
model1.compile(loss = "mae",optimizer="adam")
model1.summary()
Batch normalization is applied between the convolutions. The output layer uses a sigmoid activation (to map values into [0, 1]). For the loss, MAE (mean absolute error) is used instead of squared error because an absolute-error loss tends to produce less blurry reconstructions. Model 2 is defined in the same style.
ffc = 8  # first filter count
model2 = models.Sequential()
model2.add(layers.MaxPooling2D((4,4),input_shape = (28,28,1)))
model2.add(layers.UpSampling2D())
model2.add(layers.Conv2D(ffc,(3,3),padding="same",activation="relu"))
model2.add(layers.BatchNormalization())
model2.add(layers.Conv2D(ffc,(3,3),padding="same",activation="relu"))
model2.add(layers.BatchNormalization())
model2.add(layers.UpSampling2D())
model2.add(layers.Conv2D(ffc,(3,3),padding="same",activation="relu"))
model2.add(layers.BatchNormalization())
model2.add(layers.Conv2D(ffc,(3,3),padding="same",activation="relu"))
model2.add(layers.BatchNormalization())
model2.add(layers.Conv2D(1,(3,3),padding="same",activation="sigmoid"))
model2.compile(loss = "mae",optimizer="adam")
model2.summary()
In model 2, encoding consists of nothing but max pooling, so all of the model's capacity is spent on decoding, i.e. restoring the resolution that was thrown away (a super-resolution style model).
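To get a feel for how much information survives this encoder, here is a minimal sketch (my own addition, not part of the original comparison) that shows a test image next to its 7x7 max-pooled version, which is all that model2's decoder gets to work with:

pool_only = models.Sequential([layers.MaxPooling2D((4,4), input_shape=(28,28,1))])  # same encoder as model2
low_res = pool_only.predict(test_true[:1])                                          # shape (1, 7, 7, 1)
plt.subplot(1,2,1); plt.imshow(test_true[0,:,:,0], cmap="gray"); plt.title("input")
plt.subplot(1,2,2); plt.imshow(low_res[0,:,:,0], cmap="gray"); plt.title("after 4x4 max pooling")
plt.show()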
Let's actually train.
t_acc = np.zeros(50)
f_acc = np.zeros(50)
for i in range(50):
    hist = model1.fit(train_true, train_true, steps_per_epoch=10, epochs=1, verbose=0)
    true_d = ((test_true - model1.predict(test_true))**2).reshape(1135,28*28).mean(axis=-1)
    false_d = ((test_false - model1.predict(test_false))**2).reshape(8865,28*28).mean(axis=-1)
    t_acc[i] = (true_d <= true_d.mean() + 3*true_d.std()).sum()/len(true_d)
    f_acc[i] = (false_d > true_d.mean() + 3*true_d.std()).sum()/len(false_d)
    print("Epoch {}: accuracy on normal images {:.2f}%, accuracy on abnormal images {:.2f}%".format(i+1, t_acc[i]*100, f_acc[i]*100))
plt.plot(t_acc)
plt.plot(f_acc)
plt.show()
Note that, unlike the MAE loss used for training, the anomaly score is computed with the squared error. Running this loop for model 1, the plain encoder-decoder, produces a graph of how the accuracy on normal and abnormal images evolves during training. The threshold was set to $\mu + 3\sigma$ (mean plus three standard deviations) of the scores of the normal images.
An example of the output is shown below. Blue is the fraction of normal images judged normal, and orange is the fraction of abnormal images judged abnormal. Since the threshold always sits three standard deviations above the mean of the normal-image scores, the fraction of normal images judged normal stays around 98% by construction. About 50 epochs appears to have been enough. (The final values were 98.68% correct on normal images and 97.13% correct on abnormal images.)
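To see why the $\mu + 3\sigma$ rule works, here is a minimal sketch (my own addition; it reuses true_d and false_d from the loop above) that plots the two score distributions against the threshold:

threshold = true_d.mean() + 3*true_d.std()                       # same rule as in the loop above
plt.hist(true_d, bins=50, alpha=0.5, density=True, label="normal (1)")
plt.hist(false_d, bins=50, alpha=0.5, density=True, label="abnormal (not 1)")
plt.axvline(threshold, color="k", linestyle="--", label="mean + 3 std of normal scores")
plt.xlabel("reconstruction error (anomaly score)")
plt.legend()
plt.show()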
Next, model 1 is trained as a denoising Auto Encoder: Gaussian noise with mean 0 and scale 0.1, of the same shape as the training images, is added to the input, and the model is trained to return the clean image, i.e. $x = f(x + z)$. The specific code is as follows. (If you want this experiment to be independent of the previous one, re-run the model1 definition first; otherwise training continues from the already trained weights.)
t_acc = np.zeros(50)
f_acc = np.zeros(50)
for i in range(50):
    noisy = train_true + 0.1*np.random.normal(size=train_true.shape)   # noisy input, clean target
    hist = model1.fit(noisy, train_true, steps_per_epoch=10, epochs=1, verbose=0)
    true_d = ((test_true - model1.predict(test_true))**2).reshape(1135,28*28).mean(axis=-1)
    false_d = ((test_false - model1.predict(test_false))**2).reshape(8865,28*28).mean(axis=-1)
    t_acc[i] = (true_d <= true_d.mean() + 3*true_d.std()).sum()/len(true_d)
    f_acc[i] = (false_d > true_d.mean() + 3*true_d.std()).sum()/len(false_d)
    print("Epoch {}: accuracy on normal images {:.2f}%, accuracy on abnormal images {:.2f}%".format(i+1, t_acc[i]*100, f_acc[i]*100))
plt.plot(t_acc)
plt.plot(f_acc)
plt.show()
Running this gives the following graph. Convergence appears to be somewhat faster. The final values were 98.68% correct on normal images and 97.59% correct on abnormal images.
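To inspect what the denoising model actually learned, here is a minimal sketch (my own addition) that shows a normal and an abnormal test image next to their reconstructions; for an abnormal input the reconstruction should look wrong, which is exactly what drives the score above the threshold:

idx = 0                                                       # pick any test image
recon_true = model1.predict(test_true[idx:idx+1])             # reconstruction of a normal image
recon_false = model1.predict(test_false[idx:idx+1])           # reconstruction of an abnormal image
plt.subplot(2,2,1); plt.imshow(test_true[idx,:,:,0], cmap="gray"); plt.title("normal input")
plt.subplot(2,2,2); plt.imshow(recon_true[0,:,:,0], cmap="gray"); plt.title("reconstruction")
plt.subplot(2,2,3); plt.imshow(test_false[idx,:,:,0], cmap="gray"); plt.title("abnormal input")
plt.subplot(2,2,4); plt.imshow(recon_false[0,:,:,0], cmap="gray"); plt.title("reconstruction")
plt.show()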
Let's train model2 and see the result.
t_acc = np.zeros(50)
f_acc = np.zeros(50)
for i in range(50):
    hist = model2.fit(train_true, train_true, steps_per_epoch=10, epochs=1, verbose=0)
    true_d = ((test_true - model2.predict(test_true))**2).reshape(1135,28*28).mean(axis=-1)
    false_d = ((test_false - model2.predict(test_false))**2).reshape(8865,28*28).mean(axis=-1)
    t_acc[i] = (true_d <= true_d.mean() + 3*true_d.std()).sum()/len(true_d)
    f_acc[i] = (false_d > true_d.mean() + 3*true_d.std()).sum()/len(false_d)
    print("Epoch {}: accuracy on normal images {:.2f}%, accuracy on abnormal images {:.2f}%".format(i+1, t_acc[i]*100, f_acc[i]*100))
plt.plot(t_acc)
plt.plot(f_acc)
plt.show()
The resulting graph shows that this model performed worse than I had expected. Although it does not appear to have fully converged, the final values were 98.59% correct on normal images and 84.85% correct on abnormal images.
For this task of distinguishing the handwritten digit 1 from all the others, adding noise turned out to be effective. One intuition is that the added noise sometimes makes a 1 look like a 7 or a 9, so a model that has learned to map such images back to a 1 also becomes better at rejecting real anomalies. In other words, if the type of anomaly is known in advance, an Auto Encoder trained to artificially add that type of anomaly to normal images and undo it can perform extremely well.
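As a concrete illustration of that idea, here is a minimal sketch (my own addition; the scratch-drawing helper is hypothetical and not part of the original experiments) of training model1 to remove synthetic "scratch" lines drawn over normal images:

def add_scratch(images):
    # Draw a random horizontal bright line on each image (a toy stand-in for a known defect type).
    damaged = images.copy()
    rows = np.random.randint(5, 23, size=len(images))
    for img, r in zip(damaged, rows):
        img[r, 4:24, 0] = 1.0
    return damaged

scratched = add_scratch(train_true)                 # input with the known anomaly type added
model1.fit(scratched, train_true,                   # learn to remove the scratch and restore the normal image
           steps_per_epoch=10, epochs=10, verbose=0)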
The super-resolution approach proposed in this article was not very effective on this dataset, but when detecting wrinkles and scratches on actual industrial products, reducing the resolution until the defects become invisible and then restoring the image can work well. It is therefore worth choosing the model while looking at the problem at hand.
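As a rough sketch of what that might look like on larger product images (the 256x256 input size and the pooling factor are assumptions, not values from any real experiment), the same pattern as model2 scales up simply by pooling more aggressively:

model3 = models.Sequential()
model3.add(layers.MaxPooling2D((8,8), input_shape=(256,256,1)))   # pool until wrinkles/scratches disappear
model3.add(layers.UpSampling2D())
model3.add(layers.Conv2D(8,(3,3),padding="same",activation="relu"))
model3.add(layers.BatchNormalization())
model3.add(layers.UpSampling2D())
model3.add(layers.Conv2D(8,(3,3),padding="same",activation="relu"))
model3.add(layers.BatchNormalization())
model3.add(layers.UpSampling2D())
model3.add(layers.Conv2D(8,(3,3),padding="same",activation="relu"))
model3.add(layers.BatchNormalization())
model3.add(layers.Conv2D(1,(3,3),padding="same",activation="sigmoid"))
model3.compile(loss="mae", optimizer="adam")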