I implemented ChangeFinder (change point detection)

An article brought to you by a data scientist from the manufacturing industry.
This time, I tried out ChangeFinder (using its sample code), an anomaly detection method that detects change points in time-series data.

Introduction

I have summarized anomaly detection methods and time-series analysis methods in earlier articles, so please take a look at those as well if you are interested.

What is ChangeFinder

ChangeFinder is a method built on the SDAR (Sequentially Discounting AR) algorithm, which adds online learning and a forgetting (discounting) mechanism to the training of an autoregressive (AR) model. The method appears to have been devised by Professor Yamanishi while he was at NEC. For details, refer to the book "Anomaly detection by data mining".

The algorithm is summarized briefly below.

Learn the probability density of the data with SDAR (first stage)
Calculate the outlier score at each point in time
Smooth the scores with a moving average
Learn the probability density of the smoothed scores with SDAR (second stage)
Calculate the change point score at each point in time
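
To make the two-stage flow above more concrete, here is a minimal sketch using only NumPy. Please note the assumptions: instead of a real SDAR model it uses a simplified discounting Gaussian with no AR terms (effectively order 0), and the function names (discounting_gaussian_scores, moving_average, two_stage_scores) are hypothetical names chosen for this illustration. It is not the implementation used by the changefinder library.

```python
import numpy as np

def discounting_gaussian_scores(x, r):
    """Log loss of each point under an online Gaussian with forgetting rate r.

    Simplification (assumption): no AR terms, only a discounted mean and variance.
    """
    x = np.asarray(x, dtype=float)
    mu, var = float(x[0]), 1.0
    scores = np.empty(len(x))
    for t, xt in enumerate(x):
        # Score the new point with parameters learned from the past only
        scores[t] = 0.5 * np.log(2 * np.pi * var) + (xt - mu) ** 2 / (2 * var)
        # Discounted (forgetting) update of mean and variance
        mu = (1 - r) * mu + r * xt
        var = max((1 - r) * var + r * (xt - mu) ** 2, 1e-8)
    return scores

def moving_average(x, window):
    """Trailing moving average; shorter windows are used at the start."""
    x = np.asarray(x, dtype=float)
    return np.array([x[max(0, t - window + 1):t + 1].mean() for t in range(len(x))])

def two_stage_scores(x, r, window):
    """First stage -> smoothing -> second stage, following the five steps above."""
    first = discounting_gaussian_scores(x, r)      # steps 1 and 2
    smoothed = moving_average(first, window)        # step 3
    return discounting_gaussian_scores(smoothed, r)  # steps 4 and 5

# Toy data with one mean shift at index 200
rng = np.random.default_rng(0)
toy = np.concatenate([rng.normal(0.0, 0.1, 200), rng.normal(2.0, 0.1, 200)])
toy_scores = two_stage_scores(toy, r=0.05, window=5)
print("index of the largest score:", int(np.argmax(toy_scores)))
```

In this toy data the largest score should appear close to the point where the mean shifts. The real SDAR model additionally learns AR coefficients, so its scores will differ from this sketch.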

ChangeFinder implementation

The changefinder library comes with sample code, so I used that this time.

The Python code is shown below.

# Import the required libraries
import numpy as np
import changefinder
import matplotlib.pyplot as plt
# Jupyter Notebook magic; remove the next line when running as a plain script
%matplotlib inline

First, import the required libraries. Next, generate random numbers from four normal distributions with different means (300 points each), which serve as the data this time.

# Four segments of 300 points each; the mean shifts at every segment boundary
data = np.concatenate([np.random.normal(0.7, 0.05, 300),
                       np.random.normal(1.5, 0.05, 300),
                       np.random.normal(0.6, 0.05, 300),
                       np.random.normal(1.3, 0.05, 300)])

Next, use ChangeFinder to calculate the change point score at each point in time.

cf = changefinder.ChangeFinder(r=0.01, order=1, smooth=7)

# Feed the data one point at a time (online learning) and collect the scores
ret = []
for x in data:
    score = cf.update(x)
    ret.append(score)

Finally, visualize the result. The red line is the original data and the blue line is the change point score.

# Change point score in blue on the left axis, original data in red on the right axis
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(ret)
ax2 = ax.twinx()
ax2.plot(data, 'r')
plt.show()
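
Besides reading the plot by eye, the scores can be turned into candidate change points with a simple threshold. The following is only a sketch: the helper name candidate_change_points, the threshold, and the min_gap parameter are all hypothetical and are not part of the changefinder library. A suitable threshold depends on the data.

```python
import numpy as np

def candidate_change_points(scores, threshold, min_gap):
    """Return the first index of each run of scores above `threshold`.

    Indices closer than `min_gap` to the previous above-threshold index
    are treated as part of the same run (hypothetical rule for illustration).
    """
    above = np.flatnonzero(np.asarray(scores, dtype=float) > threshold)
    picked, last = [], None
    for i in above:
        if last is None or i - last > min_gap:
            picked.append(int(i))
        last = i
    return picked

# Small synthetic score sequence with two raised regions
demo_scores = np.zeros(100)
demo_scores[20:25] = 5.0
demo_scores[60:70] = 8.0
print(candidate_change_points(demo_scores, threshold=1.0, min_gap=5))
```

With the scores computed above, the same function could be called as candidate_change_points(ret, threshold, min_gap) once a threshold has been chosen.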

The following three parameters were set in ChangeFinder this time.

r: Forgetting parameter (the smaller it is, the larger the influence of the past, and the variation of the change point becomes larger)
order: Order of the AR model
smooth: Width of the smoothing window (the longer it is, the more it captures "changes" rather than outliers, but if it is too long, the changes themselves become difficult to capture in the first place)
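
As a rough illustration of what r and smooth do, here is a small NumPy-only sketch. It does not call the changefinder library. It assumes a plain discounted mean for the forgetting update and a trailing moving average for the smoothing, and the helper names (steps_to_adapt, moving_average) are hypothetical.

```python
import numpy as np

def steps_to_adapt(r, target=0.9):
    """Count discounted-mean updates needed to reach `target` after a 0 -> 1 step."""
    mu, steps = 0.0, 0
    while mu < target:
        mu = (1 - r) * mu + r * 1.0  # forgetting update with rate r
        steps += 1
    return steps

def moving_average(x, window):
    """Trailing moving average; shorter windows are used at the start."""
    x = np.asarray(x, dtype=float)
    return np.array([x[max(0, t - window + 1):t + 1].mean() for t in range(len(x))])

# r: a smaller value keeps more of the past, so the estimate adapts more slowly
slow, fast = steps_to_adapt(0.01), steps_to_adapt(0.1)
print("updates needed with r=0.01:", slow, "/ with r=0.1:", fast)

# smooth: a one-point spike is damped, while a sustained change survives
raw = np.zeros(150)
raw[50] = 10.0    # single outlier
raw[100:] = 10.0  # sustained change
smoothed = moving_average(raw, window=7)
print("spike after smoothing:", smoothed[40:70].max())
print("sustained level after smoothing:", smoothed[110])
```

The second half shows why a longer window favors "changes" over outliers: the isolated spike is divided by the window width, whereas the sustained level is kept.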

In this data set the change points are obvious, so they seem to be detected well. When I actually applied the method to real data from the field, however, I had a hard time tuning the parameters.

Conclusion

Thank you for reading to the end. This time, I went through the sample code for ChangeFinder, an anomaly detection method that detects change points in time-series data.

If you find anything that needs correcting, I would appreciate it if you could let me know.