Human beings are creatures who want to build their own text-to-speech software. It can't be helped. As Pascal said, man is a thinking reed. See? (See what?)
So, as a sort of preliminary investigation, I will try making a sound with Python.
--Language: Python
python
import numpy as np
import matplotlib.pyplot as pl
import wave
import struct
import pyaudio
Apparently Jupyter Notebook makes playing sound a little easier (I don't know the details), but I won't use it here.
Do you all know what sound is?
Sound is, roughly speaking, a periodic change in air density.
In short, it's a wave. And speaking of waves: sin and cos. Hooray!
In conclusion, this time we will use a sine wave with the following formula, where n = note_hz and s = sample_hz.
sin(2πnt/s)
python
sec = 1 #1 second
note_hz = 440 #La sound frequency
sample_hz = 44100 #Sampling frequency
t = np.arange(0, sample_hz * sec) #Secure an array of time for 1 second
wv = np.sin(2 * np.pi * note_hz * t/sample_hz)
t holds the sample positions that make up 1 second; in the above case it is a one-dimensional array of 44100 elements (0, 1, ..., 44099).
The information in the world we live in is continuous (analog), but unfortunately personal computers can only handle discrete (digital) data.
Therefore, one second is divided into 44100 pieces for expression.
(By the way, 44100 Hz is the standard sampling frequency for CDs, and it is roughly twice the upper limit of the human audible range, about 20 kHz. Why twice? Try googling "Nyquist frequency".)
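As a rough illustration of why the factor of two matters, the sketch below samples a wave above half the sampling frequency and compares it with the 440 Hz wave. The variable names `low`, `high`, `nyquist_hz` and `aliased` are made up for this example; it is only a small experiment, not a full explanation of the sampling theorem.

```python
import numpy as np

sample_hz = 44100                 # sampling frequency, as in the article
t = np.arange(sample_hz)          # sample indices for 1 second

# A 440 Hz wave (below half the sampling frequency)
low = np.sin(2 * np.pi * 440 * t / sample_hz)
# A 43660 Hz wave (above half the sampling frequency)
high = np.sin(2 * np.pi * (sample_hz - 440) * t / sample_hz)

nyquist_hz = sample_hz / 2        # highest frequency that can be represented

# Once sampled, the 43660 Hz wave is just the 440 Hz wave with its sign flipped,
# so the two can no longer be told apart by frequency.
aliased = np.allclose(high, -low, atol=1e-6)
print(nyquist_hz, aliased)
```

In other words, anything above `sample_hz / 2` folds back onto a lower frequency, which is why the sampling frequency has to be at least twice the highest frequency you want to keep.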
The argument of the sine is 2πnt/s.
First, look at t/sample_hz = t/s. t increases as 0, 1, 2, ..., 44099, so dividing t by s = 44100 gives 0, 1/44100, 2/44100, ..., 44099/44100. This expresses "one second that advances in steps of 1/44100". The array stops one step short of 1 because np.arange excludes its end value.
If we ignore note_hz = n for a moment and look at np.sin(2 * np.pi * t / sample_hz) = sin(2πt/s), t/s can be regarded as a variable that increases from 0 to 1 (in effect, the elapsed time in seconds), so 2πt/s inside the sine increases from 0 to 2π.
In other words, sin(2πt/s) is a function that goes around the unit circle exactly once in one second (a wave that oscillates once per second).
Oscillating once per second means that the frequency of this wave is 1 Hz (Hz = 1/s).
However, a 1 Hz wave is far too low to hear.
That's where note_hz = n comes in.
You can freely change the frequency of the wave simply by multiplying 2πt/s by n. For example, with n = 440, sin(2πnt/s) becomes a wave that oscillates 440 times per second (the note "la", that is, A).
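One way to check this claim is to generate the wave and ask a Fourier transform which frequency is strongest. The helpers `sine_wave` and `peak_frequency` below are hypothetical names introduced only for this sketch; they are not part of the article's program.

```python
import numpy as np

def sine_wave(note_hz, sec=1, sample_hz=44100):
    """Return `sec` seconds of a sine wave at `note_hz` (hypothetical helper)."""
    t = np.arange(0, sample_hz * sec)
    return np.sin(2 * np.pi * note_hz * t / sample_hz)

def peak_frequency(wv, sample_hz=44100):
    """Return the frequency in Hz of the strongest component of `wv`."""
    spectrum = np.abs(np.fft.rfft(wv))                  # magnitude per frequency bin
    freqs = np.fft.rfftfreq(len(wv), d=1 / sample_hz)   # frequency of each bin in Hz
    return freqs[np.argmax(spectrum)]

print(peak_frequency(sine_wave(1)))     # the wave with n = 1
print(peak_frequency(sine_wave(440)))   # the wave with n = 440
```

With a one-second signal the frequency bins are 1 Hz apart, so the peaks should land on 1 Hz and 440 Hz respectively (up to floating-point rounding).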
This completes the representation of sound in the program. Here is the program from above once more.
python
sec = 1 #1 second
note_hz = 440 #La sound frequency
sample_hz = 44100 #Sampling frequency
t = np.arange(0, sample_hz * sec) #Secure an array of time for 1 second
wv = np.sin(2 * np.pi * note_hz * t/sample_hz)
The flow from here is as follows.
1. Output the wave as a .wav file
2. Play the sound
3. Plot the waveform
As for step 3, if you don't care about the waveform, you can skip it entirely. Step 2 uses a module called pyaudio, but it is troublesome to install on the Python 3.7 series (if you want to install it, see the reference sites at the end of this page), so you can simply play the .wav file created in step 1 with Windows Media Player instead.
Now, I will explain how to output the data as .wav.
First, binarize it. Binarization here means converting the data into binary (byte) form. When using the wave module, it seems you cannot write to a .wav file unless the data has been converted this way. Probably. So let's binarize it!
Let me paste the answer first.
python
max_num = 32767.0 / max(wv) #Preparation for binarization
wv16 = [int(x * max_num) for x in wv] #Preparation for binarization
bi_wv = struct.pack("h" * len(wv16), *wv16) #Binary
It looks like this. (Actually, this is almost a copy of the site I referred to. Is there some etiquette that forbids copying...?)
Let's look at the contents of [int(x * max_num) for x in wv], writing wv = W and x = w ("each element of W").
For each element w of W,
x * max_num = w * 32767.0 / max(W)
You may be wondering where the number 32767 comes from.
It is because a signed 16-bit value (data expressed as a 16-digit binary number) can range from -32768 to 32767. (2 to the 16th power is 65536, and half of that is 32768.)
w / max(W) takes values from -1 to 1, so multiplying it by 32767 makes 32767 · (w / max(W)) take values from -32767 to 32767. The waveform data of the sound now fits neatly (almost perfectly) into 16 bits.
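The same scaling can be sketched with NumPy alone, which avoids the Python-level loop. This is an alternative to the list comprehension, not what the article's program does; `info` and `scale` are names chosen for this example, and dividing by the largest absolute value (instead of `max(wv)`) is a small change so that negative peaks are covered too.

```python
import numpy as np

sample_hz = 44100
t = np.arange(sample_hz)
wv = np.sin(2 * np.pi * 440 * t / sample_hz)

# Limits of a signed 16-bit integer
info = np.iinfo(np.int16)
print(info.min, info.max)

# Scale so that the largest absolute sample maps to 32767
scale = info.max / np.max(np.abs(wv))
wv16 = (wv * scale).astype(np.int16)   # astype truncates toward zero, like int()
print(wv16.dtype, wv16.min(), wv16.max())
```

For a symmetric sine wave both approaches give essentially the same values; the difference only matters for waveforms whose negative peak is larger than the positive one.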
That is what ends up in wv16. Phew...
Next is the binarization itself, bi_wv = struct.pack("h" * len(wv16), *wv16).
To be honest, I don't really understand this part. It is a copy.
For now: struct.pack from the struct module converts the values into binary (bytes), and the "h" in the first argument is the format code for a 2-byte (16-bit) signed integer. Repeating it len(wv16) times gives one "h" per sample. Huh.
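To make `struct.pack` a little less mysterious, here is a small experiment on five hand-picked values. The list `samples` is made up for illustration, and the `"<"` prefix (explicit little-endian, the byte order .wav files use) is an addition that the article's code does not have.

```python
import struct
import numpy as np

samples = [0, 1, -1, 32767, -32768]   # hypothetical 16-bit sample values

# "<" = little-endian, "h" = one signed 16-bit integer; one "h" per value
fmt = "<" + "h" * len(samples)
packed = struct.pack(fmt, *samples)
print(len(packed))          # 2 bytes per sample
print(packed[:4].hex())     # bytes of the first two samples

# NumPy can produce the same bytes without unpacking the list into arguments
same = np.asarray(samples, dtype="<i2").tobytes()
print(packed == same)

# struct.unpack reverses the conversion
restored = struct.unpack(fmt, packed)
print(restored)
```

So "binarization" here just means laying the integers out as raw bytes, two per sample, which is the form `writeframes` expects.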
Yes, binarization is complete!
I will paste the answer again first.
python
file = wave.open('sin_wave.wav', mode='wb') #Open sin_wave.wav in write mode. (If the file does not exist, it is created.)
param = (1,2,sample_hz,len(wv16),'NONE','not compressed') #Parameters (the 4th is the number of frames, i.e. samples)
file.setparams(param) #Parameter setting
file.writeframes(bi_wv) #Writing data
file.close() #Close file (the parentheses are needed to actually call it)
It looks like this.
Open the file with wave.open().
Specify the file name with the first argument, and choose write mode ('wb') or read mode ('rb') with the second argument, mode=.
Set the parameters of the .wav file with setparams().
The parameters (param) are, in order from the left:
- number of channels (1 = mono)
- sample width in bytes (2 bytes = 16 bits)
- sampling frequency
- number of frames (the number of elements in t)
- compression type (only 'NONE' is supported. Does that even make sense...?)
- human-readable name of the compression type ('not compressed' is returned for 'NONE')
Then write the binary data (bi_wv) and close the file.
It's easy to forget to close the file...
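Since forgetting to close the file is such an easy mistake, one option is to let a `with` statement do it. The sketch below writes the sine wave that way and then reads the header back to confirm the parameters. The temporary directory, the names `wf`, `rf`, `path`, `params` and `first`, and the use of the individual setter methods instead of `setparams` are all choices made for this example.

```python
import os
import struct
import tempfile
import wave

import numpy as np

sample_hz = 44100
t = np.arange(sample_hz)
# 16-bit little-endian samples of a 440 Hz sine wave
wv16 = (np.sin(2 * np.pi * 440 * t / sample_hz) * 32767).astype("<i2")

# Hypothetical output path inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), "sin_wave.wav")

# The with-statement closes the file even if an error occurs while writing
with wave.open(path, mode="wb") as wf:
    wf.setnchannels(1)            # mono
    wf.setsampwidth(2)            # 2 bytes = 16 bits per sample
    wf.setframerate(sample_hz)
    wf.writeframes(wv16.tobytes())

# Read the header back to confirm what was written
with wave.open(path, mode="rb") as rf:
    params = rf.getparams()
    first = struct.unpack("<4h", rf.readframes(4))   # first four samples

print(params)
print(first)
```

The number of frames is not set explicitly here; `writeframes` fills it in from the amount of data written.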
Alright, it's done!! (Run the script from a terminal or command prompt and check that a .wav file is generated!)
So, first open the file you created earlier with the wave module.
python
file = wave.open('sin_wave.wav', mode='rb')
Now the file is open.
It is properly in read mode.
The file part of file = wave.open('sin_wave.wav', mode='rb') is just a variable, so you can use a different name: fairu, wave_no_kiwami_otome, whatever.
I only mention it because, when I was a beginner, I mistakenly thought it had to be called file.
Then play the sound with the pyaudio module.
python
p = pyaudio.PyAudio() #Instantiation of pyaudio
stream = p.open(
    format = p.get_format_from_width(file.getsampwidth()),
    channels = file.getnchannels(),
    rate = file.getframerate(),
    output = True
    ) #Create a stream for recording and playing sound.
file.rewind() #Move the pointer back to the beginning.
chunk = 1024 #Frames per read. I'm not sure why 1024, but the official documentation did this.
data = file.readframes(chunk) #Read chunks (1024) of frames (sound waveform data).
while data:
    stream.write(data) #Make a sound by writing data to the stream.
    data = file.readframes(chunk) #Load a new chunk frame.
stream.close() #Close the stream.
p.terminate() #Close PyAudio.
As above, the procedure is:
1. Open pyaudio, 2. Open a stream, 3. Write data to the stream to make sound, 4. Close the stream, 5. Close pyaudio.
That's about it.
Well, this article has gotten long. I'm tired, Patrasche. So for the plot I'll just drop the code here.
python
pl.plot(t,wv)
pl.show()
How simple! There are plenty of articles about matplotlib.pyplot, so I won't go into detail.
Thank you for reading this far! I did my best for the second Qiita article in my life ...
Even so, can I really make artificial speech synthesis software by myself?
- [Notes] python wave module - Qiita: It was very helpful.
- [Sound programming with python](http://samuiui.com/2019/03/11/python%E3%81%A7%E9%9F%B3%E3%83%97%E3%83%AD%E3%82%B0%E3%83%A9%E3%83%9F%E3%83%B3%E3%82%B0/): A site that seems to use Jupyter notebook. I haven't looked at it much.
python
import numpy as np
import matplotlib.pyplot as pl
import wave
import struct
import pyaudio
#Chapter1
sec = 1 #1 second
note_hz = 440 #La sound frequency
sample_hz = 44100 #Sampling frequency
t = np.arange(0, sample_hz * sec) #Secure an array of time for 1 second
wv = np.sin(2 * np.pi * note_hz * t/sample_hz)
#Chapter2
max_num = 32767.0 / max(wv) #Preparation for binarization
wv16 = [int(x * max_num) for x in wv] #Preparation for binarization
bi_wv = struct.pack("h" * len(wv16), *wv16) #Binary
file = wave.open('sin_wave.wav', mode='wb') #Open sin_wave.wav in write mode. (If the file does not exist, it is created.)
param = (1,2,sample_hz,len(wv16),'NONE','not compressed') #Parameters (the 4th is the number of frames, i.e. samples)
file.setparams(param) #Parameter setting
file.writeframes(bi_wv) #Writing data
file.close() #Close file (the parentheses are needed to actually call it)
#Chapter3
file = wave.open('sin_wave.wav', mode='rb')
p = pyaudio.PyAudio()
stream = p.open(
    format = p.get_format_from_width(file.getsampwidth()),
    channels = file.getnchannels(),
    rate = file.getframerate(),
    output = True
    )
chunk = 1024
file.rewind()
data = file.readframes(chunk)
while data:
    stream.write(data)
    data = file.readframes(chunk)
stream.close()
p.terminate()
#Chapter4
pl.plot(t,wv)
pl.show()