A look at improving the accuracy of VAE anomaly detection

Various anomaly detection methods using deep metric learning and generative models have been proposed. Among them, we experimented with the anomaly detection method based on the non-regularized anomaly score, which was presented at the 2018 Annual Conference of the Japanese Society for Artificial Intelligence and is useful for detecting anomalies in complex industrial products with a VAE.

This method has a problem: normal parts are erroneously judged as anomalous. Specifically, the standard deviation output layer σ is underestimated, which causes excessive anomaly judgments, so we considered how to solve it. This time, we verify how much data augmentation improves accuracy. (What is data augmentation? By applying transformations (flipping, enlarging, shrinking, and so on) to images, the training data is "inflated". Because the model is then less likely to learn from the exact same image repeatedly, generalization performance improves.)

Self-introduction

Hello. I'm maharuda, a research intern at ProsCons.

As part of my work experience on the company's benchmarking tasks, I am writing an article about VAE, which is used as one of the anomaly detection methods. Nice to meet you!

Purpose

To reduce the erroneous anomaly judgments in normal parts, which is one of the problems that came up when experimenting with the anomaly detection method based on the non-regularized anomaly score.

Anomaly detection using VAE

A neural network that converts data X into a latent variable z (with fewer dimensions than the original data) is called an encoder, and a neural network that reconstructs the original data from the latent variable z is called a decoder. The two are trained so that the input data and the reconstructed data are as similar as possible. This architecture is called an autoencoder (AE). A VAE is an AE whose latent variable is constrained to follow a probability distribution. Please refer to the following article for details.

・ Variational Autoencoder thorough explanation (https://qiita.com/kenmatsu4/items/b029d697e9995d93aa24)
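
As a rough illustration of the encoder, the probabilistic latent variable, and the decoder, here is a minimal NumPy sketch. Random linear maps stand in for trained networks, and the names `encode`, `sample_latent`, `decode` and the layer sizes are hypothetical choices for illustration only, not something taken from the article or the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

INPUT_DIM, LATENT_DIM = 16, 2  # hypothetical sizes for illustration

# Random weights stand in for trained encoder/decoder networks (assumption).
W_mu = rng.normal(size=(INPUT_DIM, LATENT_DIM))
W_logvar = rng.normal(size=(INPUT_DIM, LATENT_DIM)) * 0.1
W_dec = rng.normal(size=(LATENT_DIM, INPUT_DIM))


def encode(x):
    """Map data x to the parameters of a Gaussian over the latent variable z."""
    mu_z = x @ W_mu
    sigma_z = np.exp(0.5 * (x @ W_logvar))  # exp keeps the std positive
    return mu_z, sigma_z


def sample_latent(mu_z, sigma_z, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu_z.shape)
    return mu_z + sigma_z * eps


def decode(z):
    """Reconstruct data from the latent variable z."""
    return z @ W_dec


x = rng.random(INPUT_DIM)
mu_z, sigma_z = encode(x)
z = sample_latent(mu_z, sigma_z, rng)
x_rec = decode(z)
```

A plain AE would use a single vector in place of `mu_z` and `sigma_z`; the sampling step is what makes the latent variable a distribution.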

In general, anomaly detection using VAE works by treating the difference between the data fed into the encoder and the data reconstructed by the VAE as the anomaly.
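
The idea can be sketched as follows. This is only a toy example: the "reconstruction" is written by hand rather than produced by a trained VAE, and the function name `reconstruction_anomaly_map` and the threshold value are hypothetical.

```python
import numpy as np


def reconstruction_anomaly_map(x, x_rec):
    """Per-pixel squared difference between the input and its reconstruction."""
    x = np.asarray(x, dtype=np.float64)
    x_rec = np.asarray(x_rec, dtype=np.float64)
    return (x - x_rec) ** 2


# Toy 4x4 grayscale "image" in [0, 1]; the reconstruction is hand-made here
# (assumption: a real one would come from the trained VAE).
x = np.full((4, 4), 0.5)
x[1, 2] = 1.0                 # a defect that the model fails to reproduce
x_rec = np.full((4, 4), 0.5)  # the model reconstructs only the normal pattern

anomaly_map = reconstruction_anomaly_map(x, x_rec)
threshold = 0.1               # hypothetical threshold
is_anomalous = anomaly_map > threshold
```

Only the pixel that differs from its reconstruction exceeds the threshold, which is what a heat map of `anomaly_map` would highlight.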

A useful method for detecting anomalies in complex industrial products

Industrial products are made up of various elements. A gear, for example, consists of the flat surface of the gear, the teeth, and the hole in the center. Image elements that appear frequently have a higher likelihood than image elements that appear only occasionally.

Therefore, when the loss function is used as the score for anomaly detection, the threshold for judging an anomaly in image elements that appear frequently differs from the one for image elements that appear only occasionally (with a shared threshold, anomalies are flagged more often in elements that appear only occasionally than in those that appear frequently).

Figure 1. Intuitive illustration of likelihood in industrial product images (from the paper) (Screenshot from 2020-03-25 17-23-20.png)

The following paper proposes a method that removes the effect of the complexity and frequency of the group to which an image element belongs. Using it makes it possible to detect anomalies in images of complex industrial products (objects in which anomalies can occur even in simple parts).

・ Anomaly detection of industrial products using a non-regularized anomaly score with a deep generative model (https://confit.atlas.jp/guide/event-img/jsai2018/2A1-03/public/pdf?type=in)

The loss function of VAE can be expressed as follows (from the paper).

(Equation image: Screenshot from 2020-03-25 17-24-58.png)

This loss function $L_{VAE}$ is generally used as the score for VAE anomaly detection. The method instead subtracts $D_{VAE}$ and $A_{VAE}$ from $L_{VAE}$ and uses the remaining term $M_{VAE}$, an improvement that makes it possible to judge anomalies with a single common threshold.

$M_{VAE}$ has the squared difference between the data $x$ and the mean $\mu_x$ in the numerator, and the standard deviation $\sigma_x$, which implicitly represents the uncertainty and complexity of the data $x$, in the denominator. As we will see later, problems arise when $\sigma_x$ in the denominator of $M_{VAE}$ becomes too small. The method from this paper is explained in detail in the following article, so please refer to it.

・ Image anomaly detection using Variational Autoencoder Part 1 (https://qiita.com/shinmura0/items/811d01384e20bfd1e035)
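
For reference, the relationship between the terms can be written out as below. The three-term decomposition follows directly from the description in this article. The explicit forms of $D_{VAE}$, $A_{VAE}$ and $M_{VAE}$ are the usual ones for a VAE whose decoder outputs a Gaussian with mean $\mu_x$ and standard deviation $\sigma_x$; they are my assumption here, so please check them against the paper.

$$
L_{VAE} = D_{VAE} + A_{VAE} + M_{VAE}
\quad\Longrightarrow\quad
M_{VAE} = L_{VAE} - D_{VAE} - A_{VAE}
$$

$$
D_{VAE} = D_{KL}\left[q(z \mid x)\,\|\,p(z)\right], \qquad
A_{VAE} = \frac{1}{2}\log\left(2\pi\sigma_x^2\right), \qquad
M_{VAE} = \frac{(x - \mu_x)^2}{2\sigma_x^2}
$$

Under this assumed form, $\sigma_x$ appears squared in the denominator of $M_{VAE}$, so a small $\sigma_x$ strongly amplifies any difference between $x$ and $\mu_x$.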

Let's try detecting anomalies with this method right away.

Verification results

A small white gear was used as the anomaly detection target. Each result image shows, from the left, the image with anomalous parts highlighted as a heat map and the original image.

![Screenshot from 2020-03-25 19-47-42.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/608276/73095215-80f4-b153-87f4-61af7223accf.png)

<Image 1 with anomaly> Anomaly: missing tooth on the right (Screenshot from 2020-03-25 19-48-15.png)

<Image 2 with anomaly> Anomaly: all teeth are worn (Screenshot from 2020-03-25 19-49-13.png)

The anomalous parts (where gear teeth are missing) are detected, but anomaly judgments also appear on the normal part (the surface of the white gear).

Hypothesis

In a VAE, the standard deviation $\sigma_x$ is adjusted according to the uncertainty of the reconstruction so that $A_{VAE}$ and $M_{VAE}$ are balanced. (From the paper)

Because training proceeds in the direction that reduces the loss function, $M_{VAE}$ is unlikely to be large during training.

During training, the mean vector $\mu_x$ gets as close as possible to $x$, so $M_{VAE}$ can stay small even when $\sigma_x$ is very small. At anomaly detection time, however, if $(\mu_x - x)^2$ grows even slightly, $M_{VAE}$ is thought to jump up. We look for a solution below.
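
To make this hypothesis concrete, here is a small numerical sketch. It assumes the Gaussian form $(x - \mu_x)^2 / (2\sigma_x^2)$ for the per-pixel $M_{VAE}$, and the residuals and $\sigma_x$ values are made-up numbers chosen only to show the effect, not measurements from the experiment.

```python
def m_vae(x, mu_x, sigma_x):
    """Per-pixel M term, assuming the Gaussian form (x - mu)^2 / (2 sigma^2)."""
    return (x - mu_x) ** 2 / (2.0 * sigma_x ** 2)


residual_train = 0.001  # hypothetical |x - mu_x| on training data
residual_test = 0.02    # hypothetical, slightly larger residual on a normal test pixel

for sigma_x in (0.1, 0.001):  # moderate vs. underestimated sigma_x
    train = m_vae(residual_train, 0.0, sigma_x)
    test = m_vae(residual_test, 0.0, sigma_x)
    print(f"sigma_x={sigma_x}: M_train={train:.4g}, M_test={test:.4g}")
```

With the underestimated $\sigma_x$, the training value stays below 1 while the test value grows to about 200, even though the test residual is still tiny in absolute terms.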

Solution (data augmentation)

Training proceeds in the direction that lowers the loss function. With augmented data, however, it becomes harder for $\mu_x$ and $x$ to get close to each other during training, which should prevent $\sigma_x$ from becoming too small.

Pattern 1

Add to the data images in which the RGB values of every pixel of the original image are reduced by 2 (and likewise by 4, 6, 8, and 10). The amount of data becomes 6 times larger.

Image before processing / Image with 10 subtracted from each of R, G, and B (Screenshot from 2020-03-25 17-32-49.png)
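
A minimal sketch of this kind of augmentation with NumPy is shown below. The function name `darken_variants` is hypothetical, and clipping at 0 for already dark pixels is my assumption, since the article does not say how such pixels were handled.

```python
import numpy as np


def darken_variants(image, offsets=(2, 4, 6, 8, 10)):
    """Return the original image plus one darkened copy per offset.

    `image` is assumed to be a uint8 RGB array of shape (H, W, 3).
    Values are clipped at 0 so that dark pixels do not wrap around.
    """
    image = np.asarray(image, dtype=np.uint8)
    variants = [image]
    for offset in offsets:
        darker = np.clip(image.astype(np.int16) - offset, 0, 255).astype(np.uint8)
        variants.append(darker)
    return variants


# Toy 2x2 RGB image standing in for a gear photo (assumption).
original = np.array(
    [[[200, 200, 200], [5, 5, 5]],
     [[255, 255, 255], [0, 0, 0]]],
    dtype=np.uint8,
)
augmented = darken_variants(original)
```

Each input image yields six images (the original and five darkened copies), which matches the sixfold increase in data.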

![Screenshot from 2020-03-25 19-49-49.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/608276/ca1d1d4b-12c9-dbdf-db88-bf51ec8ffb0e.png)

<Image 1 with anomaly> Anomaly: missing tooth on the right (Screenshot from 2020-03-25 19-50-50.png)

<Image 2 with anomaly> Anomaly: all teeth are worn (Screenshot from 2020-03-25 19-51-10.png)

Hmm. Normal parts are still erroneously judged as anomalous.

Pattern 2

I added what is commonly called salt-and-pepper noise. The ratio of white dots to black dots is 1:1.

Image before processing / Noise ratio 0.4% (Screenshot from 2020-03-25 17-38-00.png)
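
Salt-and-pepper noise of this kind could be generated roughly as follows. The function name `add_salt_and_pepper` is hypothetical, and replacing whole pixels (all three channels at once) is my assumption; the article only states the 1:1 ratio and the 0.4% noise ratio.

```python
import numpy as np


def add_salt_and_pepper(image, noise_ratio=0.004, rng=None):
    """Return a copy of `image` with salt-and-pepper noise.

    `noise_ratio` is the fraction of pixels replaced (0.004 = 0.4%).
    Half of the noisy pixels become white (255) and half black (0),
    giving the 1:1 ratio described in the text.
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy = np.array(image, dtype=np.uint8, copy=True)
    height, width = noisy.shape[:2]
    n_noisy = int(round(height * width * noise_ratio))
    # Pick distinct pixel positions, then split them evenly into salt and pepper.
    flat_idx = rng.choice(height * width, size=n_noisy, replace=False)
    rows, cols = np.unravel_index(flat_idx, (height, width))
    half = n_noisy // 2
    noisy[rows[:half], cols[:half]] = 255  # salt (white)
    noisy[rows[half:], cols[half:]] = 0    # pepper (black)
    return noisy


# Toy mid-gray 100x100 RGB image standing in for a gear photo (assumption).
image = np.full((100, 100, 3), 128, dtype=np.uint8)
noisy = add_salt_and_pepper(image, noise_ratio=0.004, rng=np.random.default_rng(0))
```

Because the positions are drawn without replacement, the number of white and black pixels is exact rather than approximate, and the input image itself is left unchanged.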

![Screenshot from 2020-03-25 19-53-20.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/608276/fad126f5-bdb5-fe2c-11f3-c00ebd0b0454.png)

<Image 1 with anomaly> Anomaly: missing tooth on the right (Screenshot from 2020-03-25 20-14-15.png)

<Image 2 with anomaly> Anomaly: all teeth are worn (Screenshot from 2020-03-25 19-55-08.png)

It feels pretty good. It's not completely removed, but the heat map on the normal part has decreased.

Discussion

The results for pattern 2 are positive. They suggest that augmenting the data creates diversity in the data, which produces a gap between $\mu$ and $x$ and keeps $\sigma$ from becoming too small. This shows that data augmentation is useful against the problem of normal parts being falsely judged as anomalous.

Also, since pattern 1 did not improve accuracy, it turned out that data augmentation is useful in some cases and not in others. In this verification experiment, adding white (255,255,255) or black (0,0,0) pixels to create different images seems to have prevented overfitting better than reducing RGB uniformly. Augmenting with brightness-changed images does not alter the shape information and so does not affect the intrinsic complexity, whereas augmenting with salt-and-pepper noise does alter the shape information and increases the number of intrinsically complex images, which is thought to be why it worked effectively. Therefore, although this is still a hypothesis, images with modified shape information may be more useful for data augmentation.

In conclusion, data augmentation can prevent underestimation of the standard deviation output layer $\sigma$ in the anomaly detection method based on the non-regularized anomaly score. It can also be said that whether the augmentation uses images with changed shape information may be related to its effectiveness.

After the internship

Despite the short period of one month, I was able to learn deeply about machine learning. ProsCons is developing a visual inspection AI for industrial products called Gemini eye. Being involved in that work through the benchmarking tasks was a very meaningful experience for me. Thank you for letting me work at such a welcoming company.