Speech Signal Processing Toolkit (SPTK) is a C library for speech analysis, speech synthesis, vector quantization, data processing, and more. I thought it could also be used for processing other signals, such as vibration data, so I decided to give it a try.

Here is what I wanted to achieve with SPTK this time.

It must be operable from Python
Data can be passed in as numpy.ndarray for analysis
If possible, it should also work on Windows

For Python there is librosa, a high-performance signal processing library, and pysptk, an SPTK wrapper published by volunteers. However, I wanted to use SPTK itself, and pysptk did not seem to support the SPTK commands, so I had to work something out on my own.

Also, I have no background in signal processing (and my programming is shaky too), so I may misuse some terms. Please bear with me.

0. Installing SPTK

For Windows

I referred to the following page.

I built it with the Visual Studio 2019 x64 Native Tools command prompt. Installation was easier than I expected, but in my environment building "pitch.exe" failed. I worked around this by deleting every entry related to "pitch.exe" from bin/Makefile.mak before building.

For Ubuntu

I referred to the following page.

Touching SPTK

You can install it with `apt`, but the SPTK installed via `apt` seems to have limited options for some commands (this may be specific to my environment). I think it is better to simply build from source, because otherwise you may get tripped up in unexpected ways when using the commands.

$ tar xvzf SPTK-3.11.tar.gz
$ cd SPTK-3.11
$ ./configure
$ make
$ sudo make install
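
After installing, it can be handy to confirm from Python that the commands are actually on the PATH. The following is a minimal sketch using only the standard library; the helper name find_sptk_commands and the command list are my own choices for illustration, not part of SPTK.

```python
import shutil

# Commands used later in this article; adjust the list as needed.
SPTK_COMMANDS = ["sin", "x2x", "dmp", "window", "spec"]


def find_sptk_commands(commands=SPTK_COMMANDS):
    """Return {command name: full path, or None if not found on PATH}."""
    return {name: shutil.which(name) for name in commands}


for name, path in find_sptk_commands().items():
    print(f"{name:8s} {path if path else 'NOT FOUND'}")
```

If any command shows up as NOT FOUND, the install prefix (often /usr/local/bin for a source build) is probably missing from PATH.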

1. How to use SPTK

First, I learned how to use SPTK. The following site was very helpful. Its explanations are very detailed, and I learned a great deal from it. Thank you very much.

A breakthrough on artificial intelligence: audio signal processing with Python

SPTK command operation

SPTK is basically a set of tools that you run as commands from the console. Here, we create sine wave data with the SPTK command sin and save it as sin.data.

Open a console and enter the following command. A sine wave with period 16 and length 48 (3 cycles) is saved as a byte string in the file sin.data.

$ sin -l 48 -p 16 > sin.data

To check the contents of the file, enter the SPTK command as follows:

$ x2x +f < sin.data | dmp +f

The result is printed as shown below, so you can check the contents of the file. The number on the left is the index. Note that the index is added only for display; the actual data file contains only the values on the right.

0       0
1       0.382683
2       0.707107
3       0.92388
4       1
5       0.92388
…
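
For comparison, the same values can be generated in NumPy alone. This is only a sketch under the assumption that sin -l 48 -p 16 produces sin(2πn/16) with amplitude 1, which is what the listing above suggests.

```python
import numpy as np

length, period = 48, 16  # same values as `sin -l 48 -p 16`
n = np.arange(length)
wave = np.sin(2 * np.pi * n / period).astype(np.float32)

# Print index/value pairs in a layout similar to the dmp listing
for i, v in enumerate(wave[:6]):
    print(f"{i}\t{float(v):g}")
```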

Text data can also be read. In that case, prepare a text file (sin.txt in the example below) in which the values are separated by spaces, and read it with the following command.

$ x2x +af < sin.txt | dmp +f

When reading text data, the option must specify ASCII input, as in `+af`. (Because I did not understand this basic point, I could not get the analysis results I expected and wasted about half a day...)
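
If the values originate in Python, a space-separated text file like sin.txt can be written as follows. This is a small sketch; the six-decimal format is an arbitrary choice of mine, and the read-back with np.loadtxt is only there to confirm the text form.

```python
import numpy as np

wave = np.sin(2 * np.pi * np.arange(48) / 16)

# Write the values on one line, separated by single spaces
with open("sin.txt", "w") as f:
    f.write(" ".join(f"{v:.6f}" for v in wave))

# Read the text back in Python to confirm it parses as expected
restored = np.loadtxt("sin.txt")
print(restored[:5])
```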

Reading data in Python

Now, let's read the byte string data sin.data saved earlier with Python.

import numpy as np

with open('sin.data', mode='rb') as f:
    data = np.frombuffer(f.read(), dtype='float32')
    print(data)

Result

[ 0.0000000e+00  3.8268343e-01  7.0710677e-01  9.2387950e-01
  1.0000000e+00  9.2387950e-01  7.0710677e-01  3.8268343e-01
  1.2246469e-16 -3.8268343e-01 -7.0710677e-01 -9.2387950e-01
 -1.0000000e+00 -9.2387950e-01 -7.0710677e-01 -3.8268343e-01
 -2.4492937e-16  3.8268343e-01  7.0710677e-01  9.2387950e-01
  1.0000000e+00  9.2387950e-01  7.0710677e-01  3.8268343e-01
  3.6739403e-16 -3.8268343e-01 -7.0710677e-01 -9.2387950e-01
 -1.0000000e+00 -9.2387950e-01 -7.0710677e-01 -3.8268343e-01
 -4.8985874e-16  3.8268343e-01  7.0710677e-01  9.2387950e-01
  1.0000000e+00  9.2387950e-01  7.0710677e-01  3.8268343e-01
  6.1232340e-16 -3.8268343e-01 -7.0710677e-01 -9.2387950e-01
 -1.0000000e+00 -9.2387950e-01 -7.0710677e-01 -3.8268343e-01]
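
A quick sanity check when reading raw files is that the file size should be 4 bytes per float32 value (48 values give 192 bytes). The sketch below uses np.fromfile as an alternative to open() plus np.frombuffer. The helper name read_float32 and the stand-in file name are hypothetical, and the stand-in file is generated in NumPy so the sketch runs without SPTK.

```python
import os

import numpy as np


def read_float32(path):
    """Read a raw float32 file, checking that its size is a multiple of 4 bytes."""
    size = os.path.getsize(path)
    if size % 4 != 0:
        raise ValueError(f"{path}: {size} bytes is not a multiple of 4")
    return np.fromfile(path, dtype=np.float32)


# Stand-in for sin.data (hypothetical file name), written as raw float32
standin = "sin_standin.data"
np.sin(2 * np.pi * np.arange(48) / 16).astype(np.float32).tofile(standin)

data = read_float32(standin)
print(data.shape, os.path.getsize(standin))
```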

Creating data in Python

Next, let's create byte string data in Python to pass to SPTK. Specifying the dtype is quite important. (I got stuck here too.)

import numpy as np

arr = np.arange(5)  # A simple test sequence: 0, 1, 2, 3, 4

with open('test.data', mode='wb') as f:
    arr = arr.astype(np.float32)  # Convert to float32, matching the +f option
    f.write(arr.tobytes())        # Write the raw bytes

Read the file with SPTK and check it.

$ x2x +f < test.data | dmp +f
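
To see why the dtype matters, the sketch below compares the bytes produced for float32 and float64 and then reads both back as float32. It runs in NumPy alone; the variable names are only for illustration.

```python
import numpy as np

arr = np.arange(5)  # integer dtype by default

good = arr.astype(np.float32).tobytes()  # 5 values x 4 bytes
bad = arr.astype(np.float64).tobytes()   # 5 values x 8 bytes

print(len(good), len(bad))
print(np.frombuffer(good, dtype=np.float32))
# The float64 bytes are misread: twice as many values, and not the originals
print(np.frombuffer(bad, dtype=np.float32))
```

A reader that expects float32 has no way to tell that the bytes were written as another type, so the mistake shows up only as strange values.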

Linking Python and SPTK

So, if you save a numpy.ndarray created in Python to a file as a byte string and pass that file to a command, you can process the data with SPTK. Let's try it briefly with sin.data.

import subprocess

import numpy as np

# Command that reads the data and applies a window function
cmd = 'x2x +f < sin.data | window -l 16'

p = subprocess.check_output(cmd, shell=True)
out = np.frombuffer(p, dtype='float32')
print(out)

[-0.0000000e+00  3.0001572e-03  2.5496081e-02  8.6776853e-02
  1.8433140e-01  2.7229854e-01  2.8093100e-01  1.7583697e-01
  5.6270582e-17 -1.5203877e-01 -2.0840828e-01 -1.7030001e-01
 -9.3926586e-02 -3.3312235e-02 -5.5435672e-03  2.4845590e-18
  1.5901955e-33  3.0001572e-03  2.5496081e-02  8.6776853e-02
  1.8433140e-01  2.7229854e-01  2.8093100e-01  1.7583697e-01
  1.6881173e-16 -1.5203877e-01 -2.0840828e-01 -1.7030001e-01
 -9.3926586e-02 -3.3312235e-02 -5.5435672e-03  2.4845590e-18
  3.1803911e-33  3.0001572e-03  2.5496081e-02  8.6776853e-02
  1.8433140e-01  2.7229854e-01  2.8093100e-01  1.7583697e-01
  2.8135290e-16 -1.5203877e-01 -2.0840828e-01 -1.7030001e-01
 -9.3926586e-02 -3.3312235e-02 -5.5435672e-03  2.4845590e-18]
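
As a rough cross-check, the numbers above can be reproduced in NumPy. This is my reading of the output rather than something confirmed from the SPTK documentation: it assumes that window -l 16 applies a Blackman window to each 16-sample frame and scales the window so that the sum of its squared coefficients is 1.

```python
import numpy as np

frame_len = 16
wave = np.sin(2 * np.pi * np.arange(48) / 16)

# Blackman window scaled so that sum(w**2) == 1 (assumed default behaviour)
w = np.blackman(frame_len)
w = w / np.sqrt(np.sum(w ** 2))

# Apply the window to each consecutive 16-sample frame, then flatten again
out = (wave.reshape(-1, frame_len) * w).ravel().astype(np.float32)
print(out[:8])
```

The first eight values should agree with the first two rows of the SPTK output to within rounding.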

More efficient data transfer

I was lamenting how wasteful it is to create a file just to pass data to SPTK, but it turns out there is a handy class called `io.BytesIO`.

In the end, I prepared something like this.


import io
import shlex
import subprocess
from typing import List

import numpy as np


def sptk_wrap(in_array: np.ndarray, sptk_cmd: str) -> np.ndarray:
    '''
    input
        in_array : Waveform data
        sptk_cmd : SPTK command (e.g. 'window -l 16')
    output
        Data after analysis
    '''
    # Convert the numpy.ndarray to float32 bytes
    arr = in_array.astype(np.float32)
    bio = io.BytesIO(arr.tobytes())

    # Run the SPTK command, passing the bytes via stdin
    cmd = shlex.split(sptk_cmd)
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    out, _ = proc.communicate(input=bio.read())

    return np.frombuffer(out, dtype='float32')


def sptk_wrap_pipe(in_array: np.ndarray, sptk_cmd_pipe: List[str]) -> np.ndarray:
    '''
    input
        in_array      : Waveform data
        sptk_cmd_pipe : SPTK commands stored in a list, in the order you want to pipe them
        (Example)
        cmd_list = [
            'window -l 512 -L 512 -w 2',
            'spec -l 512 -o 0',
        ]
    output
        Data after analysis
    '''
    out_array = np.copy(in_array)
    for cmd in sptk_cmd_pipe:
        out_array = sptk_wrap(out_array, cmd)

    return out_array


# Spectrum analysis example
def ndarr2sp_ndarr(in_array: np.ndarray, length: int, wo: int = 2, oo: int = 0) -> np.ndarray:
    '''
    input  : Waveform data
    output : Log power spectrum

    option:
        wo : Window function option (0: Blackman, 1: Hamming, 2: Hanning, 3: Bartlett)
        oo : Output spectrum format (0: 20 × log |Xk|)

    SPTK command example
        window -l 512 -L 512 -w 2 | spec -l 512 -o 0
    '''
    cmd_list = [
        "window -l {0} -L {0} -w {1}".format(length, wo),
        "spec -l {0} -o {1}".format(length, oo),
    ]

    return sptk_wrap_pipe(in_array, cmd_list)
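
As a side note, subprocess.run accepts the bytes directly through its input argument, so the same idea can be written without an intermediate stream object. The sketch below is a hypothetical variant (run_filter is not part of the code above) that takes the command as an argument list and raises an error if the command exits with a non-zero status. To stay runnable without SPTK, the usage passes the data through a tiny Python pass-through command as a stand-in.

```python
import subprocess
import sys

import numpy as np


def run_filter(in_array, argv):
    """Send in_array as float32 bytes to a command's stdin; parse stdout as float32."""
    payload = np.asarray(in_array, dtype=np.float32).tobytes()
    done = subprocess.run(argv, input=payload, stdout=subprocess.PIPE, check=True)
    return np.frombuffer(done.stdout, dtype=np.float32)


# Stand-in command that copies stdin to stdout. With SPTK installed, this
# could be something like ["window", "-l", "16"] instead.
passthrough = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
]

result = run_filter(np.arange(5), passthrough)
print(result)
```

Using check=True means a mistyped option surfaces as an exception instead of an empty or truncated array.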

2. Example of use

Let's create some waveform data and actually analyze it. Here, I created 10 samples, each 512 points long, changing the frequency for each one.

import numpy as np
import matplotlib.pyplot as plt

N = 2**9                        # Number of waveform samples to analyze (512)
dt = 0.01                       # Sampling interval [s]
t = np.arange(N) * dt           # Time axis
freq = np.arange(N) / (N * dt)  # Frequency axis (bin spacing is 1/(N*dt))

samples = []
for f in range(1, 11):
    # Create 10 waveform samples, varying the frequency from 1 to 10 Hz
    wave = np.sin(2*np.pi*f*t)
    samples.append(wave)

samples = np.asarray(samples)
print(samples.shape)

Output: (10, 512)
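
The same array can also be built without a loop by using NumPy broadcasting. This is just an alternative sketch; the names freqs_hz and samples_vec are my own.

```python
import numpy as np

N = 2**9
dt = 0.01
t = np.arange(N) * dt
freqs_hz = np.arange(1, 11)  # 1 Hz ... 10 Hz

# Broadcasting: (10, 1) * (1, 512) -> (10, 512)
samples_vec = np.sin(2 * np.pi * freqs_hz[:, None] * t[None, :])
print(samples_vec.shape)
```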

When you plot the created data, it looks like this.

1st data (frequency 1Hz)

plt.plot(t, samples[0])

10th data (frequency 10Hz)

plt.plot(t, samples[9])

Now, let's analyze the spectrum of the 10th data using SPTK.

ps = ndarr2sp_ndarr(samples[9], N)

plt.plot(freq[:N//2+1], ps)
plt.xlabel("frequency [Hz]")
plt.ylabel("Logarithmic Power Spectrum [dB]")

You can also analyze multiple data sets at once. However, the results come back concatenated into a single flat array, so they need to be reshaped.

First, check the shape of the dataset.

samples_shape = samples.shape
print(samples_shape)

Output: (10, 512)

Analyze 10 pieces together with SPTK.

ps_s = ndarr2sp_ndarr(samples, N)
print(ps_s.shape)

Output: (2570,)

Reshape.

ps_s = ps_s.reshape((samples_shape[0],-1))
print(ps_s.shape)

Output: (10, 257)
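
The 257 here is N/2 + 1 for N = 512, that is, the number of bins in a one-sided spectrum, and 2570 is simply 10 frames times 257 bins. The NumPy sketch below shows the same shapes using np.fft.rfft on placeholder data. It assumes that spec outputs only the non-negative frequency bins, which is what the sizes above suggest.

```python
import numpy as np

N = 512
dt = 0.01
frames = np.zeros((10, N))  # placeholder data with the same shape as samples

spectrum = np.fft.rfft(frames, axis=1)  # one-sided FFT of each row
freq_axis = np.fft.rfftfreq(N, d=dt)    # matching frequency axis in Hz

print(spectrum.shape, spectrum.size)
print(freq_axis[0], freq_axis[-1])
```

np.fft.rfftfreq gives a frequency axis of exactly 257 points, which can be used directly as the x axis for each row of the reshaped result.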

10th data (frequency 10Hz)

print(np.max(ps_s[9]))
plt.plot(freq[:N//2+1], ps_s[9])
plt.xlabel("frequency [Hz]")
plt.ylabel("Logarithmic Power Spectrum [dB]")

Output: 19.078928

3. Supplement

I compared this with the result of my own analysis. I tried normalizing by the number of data points and multiplying by the correction factor for the window function, but the decibel value still differs from the result analyzed by SPTK.

I don't know the reason... I am probably doing something silly. (If anyone familiar with this can explain, please let me know.)

wavedata = samples[9]

# Apply a Hanning window
hanningWindow = np.hanning(len(wavedata))
wavedata = wavedata * hanningWindow

# Calculate the window correction factor
acf = 1/(sum(hanningWindow)/len(wavedata))

# Fourier transform (convert to the frequency domain)
F = np.fft.fft(wavedata)

# Normalize by N and double the AC components (the DC component is not doubled)
F = 2*(F/N)
F[0] = F[0]/2

# Amplitude spectrum
Adft = np.abs(F)

# Multiply by the correction factor for the window function
Adft = acf * Adft

# Power spectrum
Pdft = Adft ** 2
# Logarithmic power spectrum
PdftLog = 10 * np.log10(Pdft)
# PdftLog = 10 * np.log(Pdft)

print(np.max(PdftLog))

start = 0
stop = N // 2
plt.plot(freq[start:stop], PdftLog[start:stop])
plt.xlabel("frequency [Hz]")
plt.ylabel("Logarithmic Power Spectrum [dB]")

plt.show()

Output: -0.2237693
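
One possible explanation for the gap, offered as a guess rather than something checked against the SPTK source: the two calculations may simply use different scaling. The sketch below assumes that window scales the window so that the sum of its squared coefficients is 1, and that spec takes 20 × log10|Xk| without dividing by the frame length. The code above, in contrast, divides by N and applies an amplitude correction. The sketch uses NumPy only.

```python
import numpy as np

N = 512
dt = 0.01
t = np.arange(N) * dt
wave = np.sin(2 * np.pi * 10 * t)

# Hanning window scaled so that sum(w**2) == 1 (assumed SPTK-style normalization)
w = np.hanning(N)
w = w / np.sqrt(np.sum(w ** 2))

# One-sided FFT with no division by N, then 20*log10|X_k|
X = np.fft.rfft(wave * w)
log_spec = 20 * np.log10(np.abs(X) + 1e-12)  # small offset avoids log10(0)

peak_db = np.max(log_spec)
# Gain that this scaling gives a unit-amplitude sine sitting exactly on a bin
offset_db = 20 * np.log10(np.sum(w) / 2)
print(peak_db, offset_db, peak_db - offset_db)
```

If these assumptions hold, the peak should land close to the SPTK value of about 19.08 dB, and subtracting the constant offset should bring it back to roughly the -0.22 dB obtained above. In that case the two results would differ only by a fixed scaling offset, not by an error in either calculation.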


Operate the Speech Signal Processing Toolkit via python