I tried to find the features required for machine learning by Fourier transforming from time series data. There were some articles that automatically created and analyzed the periodic function, but I took time to figure out how to deal with the actual data, so I created it as a memorandum. I also summarized it, including understanding the concepts and terms. It may be ambiguous because it is a rough understanding. Please point out any mistakes.
To briefly summarize the Fourier transform, __ A conversion of the periodic signal of time series data for the time domain into a spectrum in the frequency domain __ become. Also, by applying a window function (generally a humming window), it is forcibly regarded as periodic.
The Fast Fourier Transform refers to a discrete Fourier transform that can be calculated at high speed when the number of data is a power of 2. I don't understand the details of the formulas and algorithms, but if you have a huge amount of data, you can expect an improvement in calculation speed by padding to 0 and aligning the number of data to the power of 2.
First, the raw data is Fourier transformed as it is. Here, Numpy fft is used. (Numpy version is '1.18.1') https://numpy.org/doc/stable/reference/generated/numpy.fft.fft.html FFT is a fast Fourier transform, but when you pass a value to numpy, it is Fourier transformed even if it is not a power of 2. In other words, it is considered that FFT is executed when it is a power of 2, and DFT is executed when it is not a power of 2. This data is 30fps 1 second data That is, the sample data 30 is Fourier transformed. This time, one column and 30 rows of a certain csv file is treated as data.
Data retrieval
#Import of required libraries
import numpy as np
#Data loading
data = np.loadtxt('____.csv', delimiter=',', dtype=float)
features = data[0:30,0:1]
features[0:30,0:1]
The 30 data you want to analyze have been imported into features. By the way, the data in features is the following array data.
features
array([[0.32640783],[0.32772677],[0.32963271],[0.32872528],[0.33125733],[0.3250282 ],[0.33900562],[0.33105499],[0.33294834],[0.34554142],[0.33217829],[0.33006385],[0.33765947],[0.33173415],[0.33826796][0.34325231],[0.35284764],[0.34785128],[0.349527 ],[0.34782048],[0.35720176],[0.36520328],[0.37276383],[0.37766436],[0.37410199],[0.37990772],[0.38644416],[0.38045958],[0.37864619],[0.39122537]])
I proceeded with the implementation while referring to the related articles. The implementation code and result are as follows.
main
#Sampling frequency
rate = 30 # [1/s]
#Sample time[s]
t = np.arange(0, 1, 1/rate)
#signal
signal = np.ravel(features[0:30,0:1])
plt.plot(t, signal)
plt.xlim(0, 1)
plt.xlabel('time [s]', fontsize=20)
plt.title('Time series of signal', fontsize=20)
plt.show()
#print(np.ravel(signal))
# power spectrum
p = np.abs(np.fft.rfft(signal))
hammingWindow = np.hamming(30) #Humming window
plt.plot(t, hammingWindow*signal)
plt.show()
p2 = np.abs(np.fft.rfft(hammingWindow*signal))
#Given the sampling rate (the reciprocal of), returns the frequency of each component
f = np.fft.rfftfreq(signal.size, d=1./rate)
print(p)
print(np.fft.rfft(hammingWindow*signal),p2)
plt.xlabel("Frequency [Hz]", fontsize=20)
plt.ylabel("Amplitude spectrum", fontsize=20)
plt.yscale("log")
plt.plot(f, p)
plt.show()
plt.yscale("log")
plt.plot(f, p2)
plt.show()
#Fast Fourier transform
F = np.fft.fft(hammingWindow*signal)
#Calculate amplitude spectrum
Amp = np.abs(F)
freq = np.linspace(0, 1.0/0.03, 30) #Frequency axis
print(np.fft.rfft(signal))
plt.plot(freq, Amp)
plt.show()
First, the data (features) in the time domain were graphed.
If you look at it, you can see that there is no periodicity ... Since the purpose is to implement it, we will proceed as it is. In addition, a shape adapted to a humming window that gives periodicity is like this.
It looks like a cycle. (Because the edges are aligned)
The result of Fourier transforming each of them into an amplitude spectrum is as follows.
[10.49214916 0.35316679 0.16715576 0.10588365 0.04224876 0.08174254 0.0362962 0.03914191 0.04407553 0.07351728 0.02862219 0.05424663 0.06239112 0.03642959 0.04382539 0.01436891]
[5.44620123 2.3808305 0.0238307 0.02004583 0.03723908 0.02777476 0.00710696 0.01448964 0.00951674 0.02489049 0.01539931 0.01742193 0.01562835 0.01323487 0.01355436 0.01185119]
The left axis is a logarithmic graph. A similar shaped amplitude spectrum was obtained.
As described above, the Fourier transform was performed to obtain the amplitude spectrum. The power spectrum is the square of the amplitude spectrum, so if you want the power spectrum, you can square the amplitude spectrum.
Below, the terms of the Fourier transform and numpy were complicated when determining the amplitude spectrum, so I will summarize them.
np.fft.fft: Returns the Fourier transform result of the real part and the imaginary part np.fft.rfft: Returns only the real part (1/2 array of np.fft.fft)
Therefore, abs (np.fft.fft) and abs (np.fft.rfft) have the same size but different return lengths. np.fft.rfft is half of np.fft.fft. It is not necessary because it is a complex conjugate.
F (w) = np.fft.fft (w): Fourier spectrum (Fourier transform)
|F(w)| = np.abs(F(w)): Amplitude spectrum(F(w)Absolute value of)
|F(w)|^2: Power spectrum[F(w)*(F(w)Conjugate complex number of)], Amplitude spectrum squared, power(power)To have the dimension of
In fact, Fourier analysis is installed in Excel's data analysis tool. When I verified whether it was the same as numpy, the same real part and imaginary part as np.fft.fft were returned. If you have trouble programming, that may be better. However, the values are a mixture of the real part and the imaginary part, and a new calculation is required to obtain the amplitude spectrum. Also, note that in Excel, the number of data cannot be executed unless it is a power of 2.
It can be considered that the low frequency component forms a large outline in the graph shape, and the high frequency component is noisy. It was seen during a literature search. Therefore, if only the spectrum of the low frequency component 0Hz to 2Hz as a result of Fourier transform is inverse Fourier transformed, I thought that I could draw a graph close to the original data, so I implemented it. The first three data of what was rffted earlier are left as they are, and the others are set to 0.
irfft
plt.plot(t, np.fft.irfft([1.04921492e+01+0.j,8.98589353e-02+0.34154378j,-6.04938542e-02+0.15582535j,0,0,0,0,0,0,0,0,0,0,0,0,0]))
plt.show()
result
It has a similar shape when compared with the graph of the original data. Therefore, it can be said that it was possible to reproduce (feature extraction was possible with frequency components).
https://ja.wikipedia.org/wiki/離散フーリエ変換 https://numpy.org/doc/stable/reference/generated/numpy.fft.fft.html https://kyotogeopython.zawawahoge.com/html/基礎編/Numpyの基礎(7)便利なライブラリ群.html https://aidiary.hatenablog.com/entry/20110716/1310824587 https://www.geidai.ac.jp/~marui/r_program/spectrum.html
Recommended Posts