While browsing Kaggle kernels (now called notebooks), I came across a noise removal method that I had not known existed, so here is a summary of what I found out about it.
The **wavelet transform** is one method for analyzing a signal or waveform. The Fourier transform is probably the analysis method that comes to mind first, but whereas the Fourier transform expands a waveform into a sum of trigonometric functions (sine and cosine), the wavelet transform can decompose it into copies of a short wave whose shape can be chosen freely. (An inverse transform exists as well.)
Therefore, if a locally characteristic wave can be identified in the given waveform, decomposing the waveform with that wave as the basis (the **mother wavelet**) shows at which times and at which scales the characteristic appears. (The coefficients obtained by the decomposition are called **wavelet coefficients**.)
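As a minimal sketch of the decomposition and its inverse, the snippet below transforms a sine wave with PyWavelets and reconstructs it. The wavelet name `db4`, the signal, and the variable names are assumptions chosen for illustration; they do not come from the original notebook.

```python
import numpy as np
import pywt

# Hypothetical test signal: 5 cycles of a sine wave sampled at 256 points
t = np.linspace(0.0, 1.0, 256, endpoint=False)
signal = np.sin(2 * np.pi * 5 * t)

# Forward transform: a list of coefficient arrays, coarsest first
coeffs = pywt.wavedec(signal, "db4", mode="per")

# Inverse transform: rebuild the waveform from the untouched coefficients
restored = pywt.waverec(coeffs, "db4", mode="per")

# The round trip should reproduce the input up to floating-point error
max_roundtrip_error = np.max(np.abs(restored - signal))
print(max_roundtrip_error)
```

Because nothing is changed between the two calls, the reconstruction matches the input. Denoising works by modifying the coefficients before the inverse transform.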
So if the shape of the noise can be grasped, you can perform the wavelet transform using it as the mother wavelet, replace the coefficients that correspond to noise with 0, and then apply the inverse transform to obtain a waveform with the noise removed.
The functions that implement this mechanism are as follows.
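import numpy as np
import pywt

def maddest(d, axis=None):
    # Mean absolute deviation: average distance of each value from the mean
    return np.mean(np.absolute(d - np.mean(d, axis)), axis)

def denoise(x, wavelet='db4', level=1):
    # 'db' alone names the Daubechies family, not a wavelet; 'db4' is assumed here
    coeff = pywt.wavedec(x, wavelet, mode="per")
    # Estimate the noise level from the detail coefficients selected by level
    sigma = (1/0.6745) * maddest(coeff[-level])
    uthresh = sigma * np.sqrt(2*np.log(len(x)))
    # Zero every detail coefficient whose magnitude is below the threshold
    coeff[1:] = (pywt.threshold(i, value=uthresh, mode='hard') for i in coeff[1:])
    return pywt.waverec(coeff, wavelet, mode='per')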
The first def computes the mean absolute deviation. A deviation is the difference between each data point and the mean, the absolute deviation is the absolute value of that deviation, and the mean absolute deviation is the sum of the absolute deviations divided by the number of data points, that is, their average.
Example: for the four scores 50, 70, 90, and 70 (mean 70), the deviations are -20, 0, 20, 0, the absolute deviations are 20, 0, 20, 0, and the mean absolute deviation is (20 + 0 + 20 + 0) / 4 = 10.
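The worked example can be checked directly with NumPy. This is a small sketch that reuses the article's `maddest` function; the names `scores`, `deviations`, and `abs_deviations` are introduced here only for illustration.

```python
import numpy as np

def maddest(d, axis=None):
    # Mean absolute deviation, as defined in the article
    return np.mean(np.absolute(d - np.mean(d, axis)), axis)

scores = np.array([50, 70, 90, 70])

deviations = scores - scores.mean()   # difference from the mean (70)
abs_deviations = np.abs(deviations)   # drop the sign
mean_abs_dev = maddest(scores)        # average of the absolute deviations

print(deviations, abs_deviations, mean_abs_dev)
```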
The second def performs the waveform denoising. First,
coeff = pywt.wavedec(x, wavelet, mode='per')
This wavelet-transforms the given waveform. The parameter x is the array data, wavelet is the name of the wavelet (one from the Daubechies, or db, family this time), and mode is the boundary condition (per, short for periodization, treats the waveform as if its right and left ends were connected). coeff is returned as a list of arrays of wavelet coefficients. (Strictly speaking, the first element of the list does not hold wavelet coefficients in the narrow sense: it stores the approximation coefficients, the remainder left at the coarsest resolution.)
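To make the layout of the returned list concrete, here is a short sketch that prints the length of each array. The input array, the wavelet `db4`, and the explicit `level=3` are assumptions made to keep the output small; the article's function leaves the decomposition depth at the library default.

```python
import numpy as np
import pywt

# Hypothetical input: any 1-D array of length 1024 will do
x = np.arange(1024, dtype=float)

# level=3 is chosen here only to keep the list short
coeffs = pywt.wavedec(x, "db4", mode="per", level=3)

# coeffs[0]  : approximation coefficients (coarsest resolution)
# coeffs[1:] : detail coefficients, from coarse to fine
lengths = [len(c) for c in coeffs]
print(lengths)
```

With periodization, each level halves the number of coefficients, so the last array (the finest details) is the longest.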
Next,
sigma = (1/0.6745) * maddest(coeff[-level])
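This estimates the magnitude of the noise. maddest(coeff[-level]) is the mean absolute deviation of the level-th array from the end of the list of wavelet coefficients (arrays nearer the end of the list have higher resolution, so with level=1 this is the finest one). The constant 0.6745 looks magical, but it comes from the normal distribution, where the median absolute deviation is about 0.6745 times the standard deviation. Note that maddest computes the mean absolute deviation, for which the corresponding ratio is about 0.7979, so for Gaussian noise the sigma obtained here is a rough estimate that comes out about 18% larger than the standard deviation.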
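The role of the constant can be checked numerically. The sketch below draws Gaussian noise with a known standard deviation and compares estimates of it. For a normal distribution, 0.6745 is the ratio that goes with the median absolute deviation, while the mean absolute deviation has a ratio of sqrt(2/pi), about 0.7979. The sample size, seed, and variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
true_sigma = 2.0
noise = rng.normal(0.0, true_sigma, size=200_000)

# Median absolute deviation and mean absolute deviation of the sample
median_abs_dev = np.median(np.abs(noise - np.median(noise)))
mean_abs_dev = np.mean(np.abs(noise - np.mean(noise)))

# Dividing the median-based value by 0.6745 recovers sigma closely
sigma_from_median = median_abs_dev / 0.6745
# Dividing the mean-based value by 0.6745 (as the article's code does) overestimates it
sigma_from_mean = mean_abs_dev / 0.6745
# The matching constant for the mean-based value is sqrt(2 / pi)
sigma_from_mean_rescaled = mean_abs_dev / np.sqrt(2.0 / np.pi)

print(sigma_from_median, sigma_from_mean, sigma_from_mean_rescaled)
```

A larger sigma gives a larger threshold, so the article's combination removes somewhat more aggressively than the median-based estimate would.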
Next,
uthresh = sigma * np.sqrt(2*np.log(len(x)))
coeff[1:] = (pywt.threshold(i, value=uthresh, mode='hard') for i in coeff[1:])
This first computes the threshold (uthresh, the so-called universal threshold, sigma times the square root of 2 ln n), and then treats every wavelet coefficient whose absolute value falls below the threshold as noise and sets it to 0.
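A small sketch of what hard thresholding does to a coefficient array, together with the threshold formula evaluated for made-up numbers. The coefficient values, `sigma = 0.3`, and `n = 1024` are assumptions chosen only to show the behavior.

```python
import numpy as np
import pywt

# Hypothetical detail coefficients
coeffs = np.array([-3.0, -0.5, 0.2, 1.5, 4.0])

# Hard thresholding: values with magnitude below 1.0 become 0, the rest are kept as-is
kept = pywt.threshold(coeffs, value=1.0, mode="hard")
print(kept)

# Universal threshold for an assumed noise level and signal length
sigma = 0.3
n = 1024
uthresh = sigma * np.sqrt(2 * np.log(n))
print(uthresh)
```

The threshold grows only slowly with the signal length, since n enters through the square root of a logarithm.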
Finally
return pywt.waverec(coeff, wavelet, mode='per')
This restores a waveform of the original shape by the inverse wavelet transform. Because the coefficients below the threshold were set to 0 beforehand, the result is a waveform with the noise removed.
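To see the whole procedure at work, here is a hedged end-to-end sketch that applies the function to a synthetic noisy sine wave and compares the error before and after. The test signal, noise level, random seed, the `rmse` helper, and the wavelet name `db4` are all assumptions made for this illustration.

```python
import numpy as np
import pywt

def maddest(d, axis=None):
    # Mean absolute deviation, as in the article
    return np.mean(np.absolute(d - np.mean(d, axis)), axis)

def denoise(x, wavelet="db4", level=1):
    # 'db4' is an assumed concrete wavelet from the Daubechies family
    coeff = pywt.wavedec(x, wavelet, mode="per")
    sigma = (1 / 0.6745) * maddest(coeff[-level])
    uthresh = sigma * np.sqrt(2 * np.log(len(x)))
    coeff[1:] = [pywt.threshold(c, value=uthresh, mode="hard") for c in coeff[1:]]
    return pywt.waverec(coeff, wavelet, mode="per")

def rmse(a, b):
    # Hypothetical helper: root-mean-square error between two arrays
    return np.sqrt(np.mean((a - b) ** 2))

# Synthetic data: a clean sine wave plus Gaussian noise
rng = np.random.default_rng(42)
t = np.linspace(0.0, 1.0, 1024, endpoint=False)
clean = np.sin(2 * np.pi * 4 * t)
noisy = clean + rng.normal(0.0, 0.3, size=t.size)

denoised = denoise(noisy)

rmse_noisy = rmse(noisy, clean)
rmse_denoised = rmse(denoised, clean)
print(rmse_noisy, rmse_denoised)
```

On a smooth signal like this one, most of the signal energy sits in a few large coefficients, which survive the threshold, while the noise is spread thinly over all coefficients and is mostly zeroed.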
I would like to dig a little deeper into the theory next.
Discrete Wavelet Transform (DWT), Basics of wavelet transform, [Noise removal by wavelet](https://medium.com/@junkoda/%E3%82%A6%E3%82%A7%E3%83%BC%E3%83%96%E3%83%AC%E3%83%83%E3%83%88%E3%81%AB%E3%82%88%E3%82%8B%E3%83%8E%E3%82%A4%E3%82%BA%E9%99%A4%E5%8E%BB-fc20d82bcb80)