Speech Signal Processing Toolkit (SPTK) is a C library and command set for speech analysis, speech synthesis, vector quantization, data processing, and so on. I thought it might also be useful for processing other signals, such as vibration data, so I decided to give it a try.
What I want to achieve with SPTK this time is as follows.
Python already has librosa, a high-performance signal processing library, and pysptk, a volunteer-maintained wrapper for SPTK. However, they did not seem to support the SPTK command I wanted to use, so I had to work out my own approach.
Note that I have no background in signal processing (and my programming is shaky too), so some of the terminology may be wrong. Please bear with me.
I referred to the following website.
I built SPTK with the Visual Studio 2019 x64 Native Tools command prompt. It was easier to install than I expected, but in my environment building "pitch.exe" failed.
I worked around this by deleting every entry related to "pitch.exe" from the bin/Makefile.mak file before building.
I referred to the following website.
SPTK can be installed with `apt`, but the version installed that way seems to restrict some options on some commands (this may be specific to my environment). I think it is safer to build from source, because otherwise you may get stuck on unrelated problems when using the commands.
$ tar xvzf SPTK-3.11.tar.gz
$ cd SPTK-3.11
$ ./configure
$ make
$ sudo make install
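After installing, it can be worth confirming that the commands are actually on the PATH before calling them from Python. The following is a minimal sketch, assuming the command names used later in this article; `SPTK_COMMANDS` and `missing_commands` are hypothetical names chosen for illustration.

```python
import shutil

# Commands used later in this article (adjust the list to your needs).
SPTK_COMMANDS = ["sin", "x2x", "dmp", "window", "spec"]


def missing_commands(names):
    """Return the entries of `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = missing_commands(SPTK_COMMANDS)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All SPTK commands were found.")
```

If something is reported as missing, the install prefix used by `make install` is probably not on the PATH.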
First, I learned how to use SPTK. There is a wonderful website that helps a lot here. Its explanations are very detailed, and I learned a great deal from it. Thank you very much.
SPTK is basically a set of tools that you run as commands from the console. Here, we create sine wave data with the SPTK command sin and save it under the file name sin.data.
Open a console and enter the following command. A sine wave with period 16 and length 48 (3 cycles) is saved as a byte string in sin.data.
$ sin -l 48 -p 16 > sin.data
To check the contents of the file, enter the following SPTK command:
$ x2x +f < sin.data | dmp +f
The result is output as shown below, and you can check the contents of the file. The number on the left is the index number. Keep in mind that the index numbers are automatically added for display and the actual data file contains only the numbers (on the right).
0 0
1 0.382683
2 0.707107
3 0.92388
4 1
5 0.92388
…
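For comparison, the same values can be generated in NumPy. This is a minimal sketch, assuming that `sin -l 48 -p 16` produces sin(2πn/16) with amplitude 1 for n = 0..47, which is consistent with the dump above.

```python
import numpy as np

length, period = 48, 16                 # same values as -l 48 -p 16
n = np.arange(length)                   # sample index 0..47
wave = np.sin(2 * np.pi * n / period).astype(np.float32)

# Print index and value side by side, similar to the dmp output
for i, v in enumerate(wave[:6]):
    print(i, v)
```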
Text data can apparently be read as well. In that case, prepare a text file (sin.txt in the example below) in which the numbers are separated by spaces, and read it with the following command.
$ x2x +af < sin.txt | dmp +f
When reading text data, the option must specify ASCII input, as in `+af`. (Because I did not understand this basic specification, I could not get the analysis results I expected and wasted about half a day...)
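If the data already lives in Python, a text file for `x2x +af` can be written with NumPy. This is a minimal sketch; it writes one number per line, on the assumption that `x2x +af` treats newlines like any other whitespace separator (I have not confirmed this against the SPTK source).

```python
import numpy as np

wave = np.sin(2 * np.pi * np.arange(48) / 16)

# Write the values as plain ASCII, one value per line
np.savetxt("sin.txt", wave, fmt="%.6f")

# Read the file back to confirm what was written
loaded = np.loadtxt("sin.txt")
print(loaded[:5])
```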
Now, let's read the byte string data sin.data saved earlier from Python.
import numpy as np
with open('sin.data', mode='rb') as f:
    data = np.frombuffer(f.read(), dtype='float32')
print(data)
Result:
[ 0.0000000e+00 3.8268343e-01 7.0710677e-01 9.2387950e-01
1.0000000e+00 9.2387950e-01 7.0710677e-01 3.8268343e-01
1.2246469e-16 -3.8268343e-01 -7.0710677e-01 -9.2387950e-01
-1.0000000e+00 -9.2387950e-01 -7.0710677e-01 -3.8268343e-01
-2.4492937e-16 3.8268343e-01 7.0710677e-01 9.2387950e-01
1.0000000e+00 9.2387950e-01 7.0710677e-01 3.8268343e-01
3.6739403e-16 -3.8268343e-01 -7.0710677e-01 -9.2387950e-01
-1.0000000e+00 -9.2387950e-01 -7.0710677e-01 -3.8268343e-01
-4.8985874e-16 3.8268343e-01 7.0710677e-01 9.2387950e-01
1.0000000e+00 9.2387950e-01 7.0710677e-01 3.8268343e-01
6.1232340e-16 -3.8268343e-01 -7.0710677e-01 -9.2387950e-01
-1.0000000e+00 -9.2387950e-01 -7.0710677e-01 -3.8268343e-01]
Next, let's create byte string data in Python to pass to SPTK. Specifying the data type is quite important. (I got stuck here too.)
arr = np.array(range(0,5)) # Make an arbitrary sequence
with open('test.data', mode='wb') as f:
    arr = arr.astype(np.float32) # Convert to float32
    barr = bytearray(arr.tobytes()) # Convert to bytearray
    f.write(barr)
Read the file with SPTK and check it.
$ x2x +f < test.data | dmp +f
0 0
1 1
2 2
3 3
4 4
So if you save a numpy.ndarray created in Python to a file as a byte string and pass that file to a command, you can process the data with SPTK.
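To illustrate why the type matters, here is a minimal sketch that compares the bytes of an integer array with the bytes of the same array converted to float32. The int64 dtype is set explicitly here only to make the example deterministic across platforms.

```python
import numpy as np

arr = np.arange(5, dtype=np.int64)           # what you get if you forget to convert
raw_int = arr.tobytes()                      # 5 values * 8 bytes
raw_f32 = arr.astype(np.float32).tobytes()   # 5 values * 4 bytes

# Interpret both byte strings as float32, the way this article's x2x +f calls do
wrong = np.frombuffer(raw_int, dtype=np.float32)
right = np.frombuffer(raw_f32, dtype=np.float32)

print(len(raw_int), len(raw_f32))
print(wrong)   # meaningless values, and twice as many of them
print(right)   # the intended sequence
```

The unconverted bytes are still accepted without any error, which is why this mistake is easy to miss.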
Let's try this with sin.data, running the SPTK commands from Python.
import subprocess
# Command to read the data and apply a window function
cmd = 'x2x +f < sin.data | window -l 16'
p = subprocess.check_output(cmd, shell=True)
out = np.frombuffer(p, dtype='float32')
print(out)
[-0.0000000e+00 3.0001572e-03 2.5496081e-02 8.6776853e-02
1.8433140e-01 2.7229854e-01 2.8093100e-01 1.7583697e-01
5.6270582e-17 -1.5203877e-01 -2.0840828e-01 -1.7030001e-01
-9.3926586e-02 -3.3312235e-02 -5.5435672e-03 2.4845590e-18
1.5901955e-33 3.0001572e-03 2.5496081e-02 8.6776853e-02
1.8433140e-01 2.7229854e-01 2.8093100e-01 1.7583697e-01
1.6881173e-16 -1.5203877e-01 -2.0840828e-01 -1.7030001e-01
-9.3926586e-02 -3.3312235e-02 -5.5435672e-03 2.4845590e-18
3.1803911e-33 3.0001572e-03 2.5496081e-02 8.6776853e-02
1.8433140e-01 2.7229854e-01 2.8093100e-01 1.7583697e-01
2.8135290e-16 -1.5203877e-01 -2.0840828e-01 -1.7030001e-01
-9.3926586e-02 -3.3312235e-02 -5.5435672e-03 2.4845590e-18]
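The numbers above can be reproduced in NumPy under two assumptions about the defaults of `window` (neither is stated in this article, so please check `window -h` on your build): the default window is Blackman, and it is normalized so that the sum of its squared coefficients is 1. A minimal sketch:

```python
import numpy as np

frame_len = 16
x = np.sin(2 * np.pi * np.arange(48) / 16)    # same data as sin.data

# Assumed defaults of `window`: Blackman window, normalized by power
w = np.blackman(frame_len)
w = w / np.sqrt(np.sum(w ** 2))

# Split into frames of 16 samples and multiply each frame by the window
frames = x.reshape(-1, frame_len)
out = (frames * w).astype(np.float32).ravel()
print(out[:8])
```

The first few values should agree with the SPTK output to several digits, which suggests the two assumptions are reasonable.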
I was lamenting how wasteful it is to create a file just to pass data to SPTK, but it turns out there is something useful called `io.BytesIO`.
In the end, I prepared the following.
import io
import shlex, subprocess
from typing import List
import numpy
def sptk_wrap(in_array : numpy.ndarray, sptk_cmd : str) -> numpy.ndarray:
    '''
    input
        in_array : Waveform data
        sptk_cmd : SPTK command (e.g. 'window -l 16')
    output
        Data after analysis
    '''
    # Convert numpy.ndarray to a float32 byte string
    arr = in_array.astype(numpy.float32)
    barr = bytearray(arr.tobytes())
    bio = io.BytesIO(barr)
    # Run the SPTK command, feeding the bytes through stdin
    cmd = shlex.split(sptk_cmd)
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    out, err = proc.communicate(input=bio.read())
    return numpy.frombuffer(out, dtype='float32')
def sptk_wrap_pipe(in_array : numpy.ndarray, sptk_cmd_pipe : List[str]) -> numpy.ndarray:
    '''
    input
        in_array      : Waveform data
        sptk_cmd_pipe : SPTK commands stored in a list, in the order you want to pipe them
            (Example)
            cmd_list = [
                'window -l 512 -L 512 -w 2',
                'spec -l 512 -o 0',
            ]
    output
        Data after analysis
    '''
    out_array = numpy.copy(in_array)
    for cmd in sptk_cmd_pipe:
        out_array = sptk_wrap(out_array, cmd)
    return out_array
# Spectrum analysis example
def ndarr2sp_ndarr(in_array : numpy.ndarray, length : int, wo : int = 2, oo : int = 0) -> numpy.ndarray:
    '''
    input  : Waveform data
    output : Log power spectrum
    option :
        wo : Window function option (0:blackman 1:hamming 2:hanning 3:bartlett)
        oo : Output spectrum format (0: 20 × log |Xk| )
    sptk command example
        window -l 512 -L 512 -w 2 | spec -l 512 -o 0
    '''
    cmd_list = [
        "window -l {0} -L {0} -w {1} ".format(length, wo),
        "spec -l {0} -o {1}".format(length, oo),
    ]
    return sptk_wrap_pipe(in_array, cmd_list)
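As an alternative design, the whole chain can be handed to the shell as a single pipeline, so intermediate results never round-trip through Python. This is a minimal sketch, not a replacement for the wrapper above; `sptk_shell_pipe` is a hypothetical name, and because it uses `shell=True` it should only be given command strings you trust.

```python
import subprocess
from typing import List

import numpy as np


def sptk_shell_pipe(in_array: np.ndarray, cmds: List[str]) -> np.ndarray:
    """Hypothetical helper: run `cmds` as one shell pipeline (cmd1 | cmd2 | ...)."""
    pipeline = " | ".join(cmds)
    # Flatten to a C-ordered float32 byte string, as in sptk_wrap
    data = np.ascontiguousarray(in_array, dtype=np.float32).tobytes()
    # check=True raises CalledProcessError when the shell reports a non-zero
    # exit status (in most shells, that of the last command in the pipeline)
    result = subprocess.run(pipeline, shell=True, input=data,
                            stdout=subprocess.PIPE, check=True)
    return np.frombuffer(result.stdout, dtype=np.float32)


# Smoke test with `cat`, which simply passes the bytes through unchanged
print(sptk_shell_pipe(np.arange(5), ["cat", "cat"]))
```

With SPTK installed, the list from the docstring above (`'window -l 512 -L 512 -w 2'`, `'spec -l 512 -o 0'`) could be passed in the same way.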
Next, create some test waveforms and analyze them. Here I created 10 samples, each 512 points long, using a different frequency for each one.
import numpy as np
import matplotlib.pyplot as plt
N = 2**9 # Number of waveform samples to analyze (512)
dt = 0.01 # Sampling interval [s]
t = np.arange(0, N*dt, dt) # Time axis
freq = np.linspace(0, 1.0/dt, N, endpoint=False) # Frequency axis (bin k is k/(N*dt) Hz)
samples = []
for f in range(1,11):
    # Create 10 waveform samples, changing the frequency from 1 Hz to 10 Hz
    wave = np.sin(2*np.pi*f*t)
    samples.append(wave)
samples = np.asarray(samples)
print(samples.shape)
Output: (10, 512)
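Another way to build the frequency axis is `np.fft.rfftfreq`, which returns only the non-negative half of the bins. A minimal sketch, assuming the spectrum has N/2 + 1 points per frame, which matches the (10, 257) shape obtained later in this article:

```python
import numpy as np

N = 2**9      # frame length (512)
dt = 0.01     # sampling interval [s]

# Frequencies of bins 0 .. N/2, i.e. k / (N * dt) for k = 0 .. 256
freq_half = np.fft.rfftfreq(N, d=dt)
print(freq_half.shape)
print(freq_half[1], freq_half[-1])   # bin spacing and the Nyquist frequency
```

The 10 Hz test signal falls between bins 51 and 52 with this spacing, so some leakage into neighbouring bins is to be expected.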
When you plot the created data, it looks like this.
1st data (frequency 1Hz)
plt.plot(t, samples[0])
10th data (frequency 10Hz)
plt.plot(t, samples[9])
Now, let's analyze the spectrum of the 10th data set using SPTK.
ps = ndarr2sp_ndarr(samples[9], N)
plt.plot(freq[:N//2+1], ps)
plt.xlabel("frequency [Hz]")
plt.ylabel("Logarithmic Power Spectrum [dB]")
You can also analyze multiple data sets at once. However, the results come back concatenated into one flat array, so they need to be reshaped.
First, check the shape of the dataset.
samples_shape = samples.shape
print(samples_shape)
Output: (10, 512)
Analyze all 10 together with SPTK.
ps_s = ndarr2sp_ndarr(samples, N)
print(ps_s.shape)
Output: (2570,)
Reshape.
ps_s = ps_s.reshape((samples_shape[0],-1))
print(ps_s.shape)
Output: (10, 257)
10th data (frequency 10Hz)
print(np.max(ps_s[9]))
plt.plot(freq[:N//2+1], ps_s[9])
plt.xlabel("frequency [Hz]")
plt.ylabel("Logarithmic Power Spectrum [dB]")
Output: 19.078928
I compared this with the result of my own analysis. I tried normalizing by the number of data points and multiplying by the window correction factor, but the decibel values differ from the SPTK result (a peak of 19.08 dB there versus -0.22 dB here).
I don't know the reason... I am probably making a silly mistake somewhere. (If you are familiar with this, please let me know.)
wavedata = samples[9]
# Apply a Hanning window
hanningWindow = np.hanning(len(wavedata))
wavedata = wavedata * hanningWindow
# Calculate the amplitude correction factor for the window
acf = 1/(sum(hanningWindow)/len(wavedata))
# Fourier transform (convert to the frequency domain)
F = np.fft.fft(wavedata)
# Normalize by N and double the AC components
F = 2*(F/N)
F[0] = F[0]/2
# Amplitude spectrum
Adft = np.abs(F)
# Apply the window correction factor
Adft = acf * Adft
# Power spectrum
Pdft = Adft ** 2
# Logarithmic power spectrum
PdftLog = 10 * np.log10(Pdft)
# PdftLog = 10 * np.log(Pdft)
print(np.max(PdftLog))
start=0
stop=int(N/2)
plt.plot(freq[start:stop], PdftLog[start:stop])
plt.xlabel("frequency [Hz]")
plt.ylabel("Logarithmic Power Spectrum [dB]")
plt.show()
Output: -0.2237693
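One possible explanation for the gap, offered as an assumption rather than a confirmed answer: the two calculations may simply use different scaling conventions. If `window` normalizes the window by power by default (I believe this is what its `-n` option controls, but please verify with `window -h`) and `spec` applies no 1/N scaling, then the SPTK pipeline can be imitated in NumPy as in the minimal sketch below.

```python
import numpy as np

N = 2**9
dt = 0.01
t = np.arange(N) * dt
wave = np.sin(2 * np.pi * 10 * t)        # the 10 Hz test signal

w_raw = np.hanning(N)
# Assumption: SPTK scales the window so that sum(w**2) == 1
w = w_raw / np.sqrt(np.sum(w_raw ** 2))

# Plain FFT with no 1/N scaling, then 20*log10|X_k| for bins 0 .. N/2
X = np.fft.rfft(wave * w)
ps = 20 * np.log10(np.abs(X))
print(np.max(ps))

# Constant offset between this convention and the amplitude-corrected one above
offset = 20 * np.log10(np.sum(w_raw) / (2 * np.sqrt(np.sum(w_raw ** 2))))
print(offset)
```

Under these assumptions the peak should come out close to the 19.08 dB from SPTK, and the offset of roughly 19.3 dB is about the difference between the two results. That would mean neither calculation is wrong; they are just normalized differently.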