Hello everybody
This is my third time playing with Python, and this time wav files are involved as well. We will read waveform data from a wav file, apply the Fourier transform to it, then restore it with the inverse Fourier transform and write it back out to wav.
Sound is generated by the vibration of air (sound waves), and such a vibration is a superposition of vibrations at multiple frequencies. The dominant frequency determines the pitch of the sound, the magnitude of the amplitude determines its loudness, and the various non-dominant frequency components determine its timbre.
Since sound contains many frequencies at once, it is difficult to analyze as-is. So a method was invented to express the sound as a superposition of simple waves: the Fourier series expansion.
f(t) = \sum_k a_k e^{\mathrm{i}kt}, \qquad e^{\mathrm{i}kt} = \cos(kt) + \mathrm{i}\sin(kt)
It looks like this. I won't go into the mathematics here; if this is unfamiliar, look up Euler's formula. Now, if we know the coefficients $a_k$, we can approximate $f(t)$. How do we find them? There is a small trick. First, suppose we observe $f(t)$ over some time interval $T$. Multiply both sides of the series expansion by $e^{-\mathrm{i}k't}$ and integrate:
\int_0^T f(t)e^{-\mathrm{i}k't}dt = \sum_k a_k \int_0^T e^{\mathrm{i}(k-k')t}dt
If we choose $k = 2n\pi/T$ for integer $n$, the integral on the right side survives only when $k = k'$, leaving
a_k = \frac{1}{T} \int_0^T f(t)e^{-\mathrm{i}kt}dt
Computing these $a_k$ is called the Fourier transform; in practice the integral is approximated numerically. Once the $a_k$ are known, the Fourier series expansion is complete. Generating the shape of the function in real space from the distribution of $a_k$ is called the inverse Fourier transform.
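As a quick sanity check (a standalone sketch, not part of the wav pipeline below), numpy's `np.fft.fft` computes exactly these coefficients up to a factor of the sample count: for a pure cosine at an integer frequency, the matching coefficient comes out to 0.5.

```python
import numpy as np

# A signal with one known component: cos(2*pi*5*t), sampled over T = 1
N = 256
t = np.arange(N) / N
f = np.cos(2 * np.pi * 5 * t)

# np.fft.fft approximates the a_k integral; dividing by N normalizes it,
# and a cosine splits into two coefficients a_5 = a_{-5} = 0.5
a = np.fft.fft(f) / N
print(round(abs(a[5]), 6))  # → 0.5
```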
Now, let's put the theory aside and do the actual processing. As last time, I use Jupyter. First, read the wav file as usual.
import wave
import struct
import numpy as np
from pylab import *
%matplotlib inline
wavfile = '/data/input/test.wav'
wr = wave.open(wavfile, "rb")
ch = wr.getnchannels()
width = wr.getsampwidth()
fr = wr.getframerate()
fn = wr.getnframes()
N = 256
span = 15000
print('Channel', ch)
print('Total number of frames', fn)
print('Sample time', 1.0 * N * span / fr, 'seconds')
origin = wr.readframes(wr.getnframes())
data = origin[:N * span * ch * width]
wr.close()
print('Current array length', len(origin))
print('Sample sequence length: ', len(data))
When you do this,
Channel 2
Total number of frames 15884224
Sample time 87.07482993197279 seconds
Current array length 63536896
Sample sequence length: 15360000
The result is like this.
The length of the `origin` byte string is four times the number of frames because the source is stereo (2 channels) with a sample width of 2 bytes (16 bits), so one frame is 4 bytes. Since `origin` is a byte string, this is as expected.
For the time being, let's process the extracted data so that it can be easily Fourier transformed.
# Assuming a stereo source
X = np.frombuffer(data, dtype="int16")
left = X[::2]
right = X[1::2]
First, interpret the sample data as int16 values and split them into the left-channel and right-channel signals.
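The slicing works because 16-bit stereo PCM stores the two channels interleaved, sample by sample. A tiny sketch with made-up values (not data from the actual file):

```python
import numpy as np

# Stereo PCM interleaves the channels as L, R, L, R, ...
# (hypothetical values standing in for the wav data)
interleaved = np.array([10, -10, 20, -20, 30, -30], dtype=np.int16)
left = interleaved[::2]    # even indices → left channel
right = interleaved[1::2]  # odd indices → right channel
print(left.tolist())   # → [10, 20, 30]
print(right.tolist())  # → [-10, -20, -30]
```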
Next, define the Fourier transform and its inverse (the series expansion). This time we use numpy's fft module, which implements the FFT (Fast Fourier Transform).
def fourier(x, n, w):
    K = []
    # Transform each block of n samples; w is the number of blocks
    # (the data was trimmed to exactly n * w samples per channel above)
    for i in range(w):
        sample = x[i * n:(i + 1) * n]
        partial = np.fft.fft(sample)
        K.append(partial)
    return K
def inverse_fourier(k):
    ret = []
    for sample in k:
        inv = np.fft.ifft(sample)
        # Keep only the real part; the imaginary part is numerical noise
        ret.extend(inv.real)
    return ret
The function `fourier` returns an array of frequency distributions, one per sample block. The function `inverse_fourier` regenerates the real-space waveform from those frequency distributions.
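The property the pipeline relies on is that `ifft` undoes `fft` up to floating-point error, so transforming and immediately inverting should reproduce the waveform almost exactly. A minimal round-trip check on a toy signal:

```python
import numpy as np

# ifft(fft(x)) returns x up to floating-point error; the imaginary
# part of the result is numerical noise, so only the real part is kept
x = np.array([0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0])
restored = np.fft.ifft(np.fft.fft(x)).real
print(np.allclose(x, restored))  # → True
```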
This time I won't do any special processing: just apply the Fourier transform, immediately invert it, and write the result to wav.
Kl = fourier(left, N, span)
Kr = fourier(right, N, span)

freqlist = np.fft.fftfreq(N, d=1/fr)
amp = [np.sqrt(c.real ** 2 + c.imag ** 2) for c in Kl[2500]]
plot(freqlist, amp, marker='o', linestyle='-')
axis([0, fr / 2, 0, 100000])
amp = [np.sqrt(c.real ** 2 + c.imag ** 2) for c in Kr[2500]]
plot(freqlist, amp, marker='o', linestyle='-')
First, run the Fourier transform. This gives a frequency distribution for each sample block. Plotting the distribution of one block chosen at an arbitrary moment (here block 2500) gives a result like the following.
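To read off the dominant frequency numerically rather than from the plot, take the argmax of the amplitude over the positive-frequency half of the spectrum. A sketch on a synthetic tone (the sampling rate and tone frequency are illustrative, chosen so the tone falls exactly on an FFT bin):

```python
import numpy as np

# A synthetic 500 Hz tone at an 8000 Hz sampling rate stands in
# for one block of the wav data; 500 Hz sits exactly on bin 16
fr = 8000
N = 256
t = np.arange(N) / fr
sample = np.sin(2 * np.pi * 500 * t)

freqs = np.fft.fftfreq(N, d=1/fr)
amp = np.abs(np.fft.fft(sample))
# Only the first half of the spectrum holds the positive frequencies;
# the second half mirrors it at negative frequencies
peak = freqs[np.argmax(amp[:N // 2])]
print(round(peak))  # → 500
```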
After confirming that the frequency distribution looks right, let's perform the inverse transform.
def combine_wav(left, right):
    ret = []
    # Interleave the two channels sample by sample: L, R, L, R, ...
    number = min(len(left), len(right))
    for i in range(number):
        ret.extend([int(left[i]), int(right[i])])
    return ret
left_dash = inverse_fourier(Kl)
right_dash = inverse_fourier(Kr)
raw_data = combine_wav(left_dash, right_dash)
outd = struct.pack("h" * len(raw_data), *raw_data)
It's a bit tedious, but since this is a stereo source, the inverse-transformed left and right channels have to be interleaved back together.
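The interleave-and-pack step can be checked in isolation. A sketch with tiny hypothetical channels (the `<` prefix explicitly selects the little-endian byte order that wav uses; the original code's plain `"h"` relies on the platform being little-endian):

```python
import struct

# Interleave two short channels into L, R, L, R order, then pack
# each value as a 2-byte signed integer, as combine_wav + struct.pack do
left = [100, 200, 300]
right = [-100, -200, -300]
raw = [s for pair in zip(left, right) for s in pair]
packed = struct.pack("<" + "h" * len(raw), *raw)
print(raw)          # → [100, -100, 200, -200, 300, -300]
print(len(packed))  # → 12 (6 samples × 2 bytes each)
```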
# Write the restored waveform to a wav file
outf = '/data/output/test.wav'
ww = wave.open(outf, 'w')
ww.setnchannels(ch)
ww.setsampwidth(width)
ww.setframerate(fr)
ww.writeframes(outd)
ww.close()
And with that, the file is written out.
The result sounds natural, with nothing out of place even to my poor ears.
So I Fourier-transformed the contents of a wav file and played with the result. That's it for this time.