Human beings are creatures who want to make their own text-to-speech software. This is unavoidable. Pascal also said that man is a thinking reed. You see? (What?)

Well, for now, as a preliminary investigation of sorts, I will try making sounds with Python.

Chapter 0. Language / Modules

- Language: Python
- Modules:
  - numpy (for sin and π)
  - matplotlib (if you want to draw the waveform)
  - wave (for reading and writing .wav files)
  - struct (to convert the waveform data to binary before writing it to a .wav file with wave)
  - pyaudio (used to play the sound; it is awkward to install on Python 3.7, so it is entirely optional)

```python
import numpy as np
import matplotlib.pyplot as pl
import wave
import struct
import pyaudio
```

Playing sound might be a little easier in Jupyter Notebook (I don't know the details), but I won't use it here.

Chapter 1. I have to express the sound as an equation...

Do you guys know what sound is? Sound is a periodic (?) change in air pressure (density). In short, it's a wave. And speaking of waves, it's sin and cos. Hooray! In conclusion, this time we will use a sine wave given by the following formula:

$$\sin\left(\frac{2\pi n t}{s}\right)$$

where $n$ = note_hz (the frequency of the note) and $s$ = sample_hz (the sampling frequency).

```python
sec = 1  # 1 second
note_hz = 440  # frequency of A4 ("La")
sample_hz = 44100  # sampling frequency
t = np.arange(0, sample_hz * sec)  # sample indices for 1 second (0 .. 44099)
wv = np.sin(2 * np.pi * note_hz * t / sample_hz)
```

t is an array of sample indices covering 1 second; in the code above it is a one-dimensional array of 44100 elements (0, 1, ..., 44099). The information in the world we live in is continuous (analog), but unfortunately computers can only handle discrete (digital) data. Therefore, one second is divided into 44100 pieces.

[Figure: sampling frequency]

(By the way, 44100 Hz is the standard sampling frequency of audio CDs, and it is slightly more than twice the upper limit of human hearing, about 20 kHz. Why twice? Google "Nyquist frequency".)
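Here is a small illustration of my own (not from the referenced sites) of why the sampling rate has to exceed twice the highest frequency. At 44100 Hz, a tone of 44100 - 440 = 43660 Hz, which is above the 22050 Hz Nyquist limit, produces exactly the same samples as an inverted 440 Hz tone, so the two cannot be told apart after sampling (this is called aliasing).

```python
import numpy as np

sample_hz = 44100                 # sampling frequency (CD standard)
nyquist_hz = sample_hz / 2        # highest frequency that can be represented
t = np.arange(0, sample_hz)       # 1 second of sample indices

low = np.sin(2 * np.pi * 440 * t / sample_hz)                  # 440 Hz, below Nyquist
high = np.sin(2 * np.pi * (sample_hz - 440) * t / sample_hz)   # 43660 Hz, above Nyquist

# For integer t: sin(2*pi*(s - n)*t/s) = sin(2*pi*t - 2*pi*n*t/s) = -sin(2*pi*n*t/s),
# so the 43660 Hz tone "folds back" onto an inverted 440 Hz tone.
print(nyquist_hz)               # 22050.0
print(np.allclose(high, -low))  # True
```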

The argument of sin is $2\pi n t / s$. t takes the values 0, 1, 2, ..., 44099, so dividing t by s = 44100 (t/sample_hz = $t/s$) gives 0, 1/44100, 2/44100, ..., 44099/44100, which expresses one second advancing gradually, 1/44100 at a time. (np.arange stops just before 44100, so $t/s$ never quite reaches 1.)
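To see this concretely, here is a quick check (my own sketch) of the values of t / sample_hz: the step is 1/44100 of a second, and the last value is 44099/44100, just under 1.

```python
import numpy as np

sample_hz = 44100
t = np.arange(0, sample_hz * 1)  # sample indices 0, 1, ..., 44099
ts = t / sample_hz               # elapsed time in seconds for each sample

print(len(ts))                   # 44100
print(ts[:3])                    # 0, 1/44100, 2/44100
print(ts[-1] == 44099 / 44100)   # True: np.arange stops before 1.0
```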

Now ignore note_hz (= $n$) for a moment and look at np.sin(2 * np.pi * t / sample_hz), that is, $\sin(2\pi t / s)$. Since $t/s$ can be regarded as a variable that increases from 0 toward 1 (the elapsed time in seconds), $2\pi t / s$ inside sin increases from 0 to (just under) $2\pi$. In other words, $\sin(2\pi t / s)$ goes around the unit circle exactly once in one second: it is a wave that vibrates once per second.

[Figure: sine wave]

Vibrating once per second means that the frequency of this wave is 1 Hz (1 Hz = 1/s). However, a 1 Hz wave is inaudible. That's where note_hz (= $n$) comes in.

You can freely change the frequency of the wave simply by multiplying $2\pi t / s$ by $n$. For example, with $n = 440$, $\sin(2\pi n t / s)$ becomes a wave that vibrates 440 times per second (the note "La", A4).
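If you want to confirm that the wave really vibrates 440 times per second, one way (my own addition, not part of the original method) is to look at its spectrum with NumPy's FFT. Because the signal is exactly 1 second long, the frequency bins are 1 Hz apart and the peak should land on the 440 Hz bin.

```python
import numpy as np

sec = 1
note_hz = 440
sample_hz = 44100
t = np.arange(0, sample_hz * sec)
wv = np.sin(2 * np.pi * note_hz * t / sample_hz)

spectrum = np.abs(np.fft.rfft(wv))                 # magnitude of each frequency bin
freqs = np.fft.rfftfreq(len(wv), d=1 / sample_hz)  # frequency (Hz) of each bin
peak_hz = freqs[np.argmax(spectrum)]               # strongest frequency component
print(peak_hz)                                     # approximately 440
```

Try changing note_hz and the peak moves with it.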

This completes expressing the sound in the program. Here is the program from above once more.

```python
sec = 1  # 1 second
note_hz = 440  # frequency of A4 ("La")
sample_hz = 44100  # sampling frequency
t = np.arange(0, sample_hz * sec)  # sample indices for 1 second (0 .. 44099)
wv = np.sin(2 * np.pi * note_hz * t / sample_hz)
```

Chapter 2. Let's output the sound expressed by the program to a .wav file!

The flow from here is as follows.

1. Output the created sound as a .wav file.
   1. Convert the sound data to binary with the struct module.
   2. Write the binary data to a .wav file with the wave module.
2. Play the created sound from the program. (Optional)
   1. Open the created .wav file with the wave module.
   2. Play it with the pyaudio module.
3. Display the waveform as a graph with the matplotlib.pyplot module. (Optional)

You can skip step 3 entirely if you don't care about the waveform. Step 2 uses a module called pyaudio, which is troublesome to install on Python 3.7 (if you want to install it, see the references at the end of this page). If you skip it, just play the .wav file created in step 1 with Windows Media Player.

Now, I will explain how to output the sound as a .wav file.

1. Binarization

First, binarize the data. Binarization here means converting the data into binary (raw bytes). The wave module writes frames as bytes (writeframes() takes a bytes-like object), so the floating-point array can't be written to a .wav file as-is. So let's make it binary!

Let's paste the answer first.

```python
max_num = 32767.0 / max(wv)  # scale factor so the waveform's peak maps to 32767
wv16 = [int(x * max_num) for x in wv]  # scaled 16-bit integer samples
bi_wv = struct.pack("h" * len(wv16), *wv16)  # pack every sample as a signed 16-bit integer
```

That's it. (Honestly, it's almost a copy of the site I referred to. Is there some etiquette against copying...?)

Let's look inside [int(x * max_num) for x in wv]. Write $W$ for wv and $w$ for each element x of $W$. For each element $w$,

$$x \cdot \mathrm{max\_num} = w \cdot \frac{32767}{\max(W)} = 32767 \cdot \frac{w}{\max(W)}$$

In short, each value $w$ of the waveform data is divided by the maximum value $\max(W)$ of the waveform data, and that ratio is multiplied by 32767.

You may be wondering what 32767 is! It is the largest value that 16-bit data (a signed integer written with 16 binary digits) can hold: 16-bit signed integers range from -32768 to 32767. (2 to the 16th power is 65536; half of those values, 32768 of them, are negative, and the other half covers 0 to 32767.) For a sine wave, $w / \max(W)$ ranges from -1 to 1, so $32767 \cdot (w / \max(W))$ ranges from -32767 to 32767, and the waveform data fits neatly into 16 bits. That is what wv16 holds. Phew...
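For reference, here is a NumPy sketch of the same scaling (my own variant, not the code from the referenced site). One deliberate change: it divides by the largest absolute value, np.max(np.abs(wv)), instead of max(wv). For a sine wave the two are essentially the same, but for a waveform whose negative peak is larger than its positive peak, dividing by max(wv) could push samples below -32768, which no longer fits in 16 bits.

```python
import numpy as np

sec = 1
note_hz = 440
sample_hz = 44100
t = np.arange(0, sample_hz * sec)
wv = np.sin(2 * np.pi * note_hz * t / sample_hz)

info = np.iinfo(np.int16)
print(info.min, info.max)  # -32768 32767

# Normalize by the largest absolute value so both peaks land at about +/-32767,
# then cast to 16-bit integers. Like int(), astype truncates toward zero.
scaled = wv * (32767.0 / np.max(np.abs(wv)))
wv16 = scaled.astype(np.int16)
print(wv16.dtype, len(wv16))  # int16 44100
```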

Next comes the binarization itself: bi_wv = struct.pack("h" * len(wv16), *wv16). To be honest, I copied this part. struct.pack(format, v1, v2, ...) converts values into bytes according to the format string. "h" means a 2-byte (16-bit) signed integer, so "h" * len(wv16) means "one 16-bit integer per sample", and the * in *wv16 unpacks the list so that each sample is passed as a separate argument.
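To get a feel for what struct.pack does, here is a tiny experiment with three sample values (my own example). The "<" prefix (little-endian) is used only so the printed bytes look the same on every machine; the article's plain "h" uses the machine's native byte order, which is also what a NumPy int16 array's tobytes() produces.

```python
import struct

import numpy as np

print(struct.calcsize("<h"))  # 2 bytes per value

samples = [0, 32767, -32768]
packed = struct.pack("<" + "h" * len(samples), *samples)  # "<": little-endian for this demo
print(packed)                         # b'\x00\x00\xff\x7f\x00\x80'
print(struct.unpack("<3h", packed))   # (0, 32767, -32768): unpack reverses pack

# With no prefix, "h" uses native byte order, like NumPy's int16.tobytes()
arr = np.array(samples, dtype=np.int16)
print(arr.tobytes() == struct.pack("h" * len(samples), *samples))  # True

try:
    struct.pack("<h", 32768)  # does not fit in a signed 16-bit integer
except struct.error as err:
    print("struct.error:", err)
```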

Yes, binarization is complete!

2. Output a .wav file with the wave module

Again, here is the answer first.

```python
file = wave.open('sin_wave.wav', mode='wb')  # open sin_wave.wav in write mode (created if it does not exist)
param = (1, 2, sample_hz, len(wv16), 'NONE', 'not compressed')  # parameters; nframes = number of samples
file.setparams(param)  # set the parameters
file.writeframes(bi_wv)  # write the data
file.close()  # close the file (the parentheses are required)
```

That's it. Open the file with wave.open(). Specify the file name with the first argument, and the mode with the second argument, mode=: write mode ('wb') or read mode ('rb').

Set the parameters of the .wav file with the setparams() method of the opened file. The parameters (param) are, in order from the left:

1. Number of channels (stereo → 2, mono → 1)
2. Sample width in bytes (2 bytes this time)
3. Sampling frequency
4. Number of frames (here, the same as the number of samples, i.e. the length of the t array: 44100)
5. Compression type (only 'NONE', meaning no compression, is supported)
6. A human-readable name for the compression type ('not compressed' corresponds to 'NONE')

Then write the binary data (bi_wv) with writeframes() and close the file with close(). It's easy to forget to close the file, and just as easy to forget the parentheses: writing file.close without () does not actually call the method.

Alright, it's done!! (Run the script from a terminal or command prompt and check that a .wav file is generated!)
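Since forgetting close() is so easy, here is an alternative sketch (my variation, not the article's code) that opens the file in a with statement, so it is closed automatically, and uses the individual setters setnchannels(), setsampwidth() and setframerate() instead of the parameter tuple. The number of frames is filled in from the data actually written. It then reads the header back to check the result.

```python
import struct
import wave

import numpy as np

sec = 1
note_hz = 440
sample_hz = 44100
t = np.arange(0, sample_hz * sec)
wv = np.sin(2 * np.pi * note_hz * t / sample_hz)
max_num = 32767.0 / max(wv)
wv16 = [int(x * max_num) for x in wv]
bi_wv = struct.pack("h" * len(wv16), *wv16)

# Leaving the with-block calls close() automatically.
with wave.open('sin_wave.wav', mode='wb') as f:
    f.setnchannels(1)          # mono
    f.setsampwidth(2)          # 2 bytes = 16 bits per sample
    f.setframerate(sample_hz)  # 44100 frames per second
    f.writeframes(bi_wv)       # the frame count is taken from the data written

# Read the header back to check what was written.
with wave.open('sin_wave.wav', mode='rb') as f:
    params = f.getparams()
print(params.nchannels, params.sampwidth, params.framerate, params.nframes)  # 1 2 44100 44100
```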

Chapter 3. It's a hassle, so I want to play the sound from within the program!

So, first open the file you created earlier with the wave module.

```python
file = wave.open('sin_wave.wav', mode='rb')
```

Now the file is open, properly in read mode. The file in file = wave.open('sin_wave.wav', mode='rb') is just a variable name, so you can use any name you like: fairu, wave_no_kiwami_otome, whatever. I only mention it because when I was a beginner, I misunderstood and thought it had to be called file.
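Note that readframes() returns raw bytes, not numbers. As a sketch (not from the original), the bytes can be turned back into a NumPy array with np.frombuffer to confirm that what was read matches what was written. To stand on its own, the example first writes its own test file; the name sin_wave_check.wav is hypothetical.

```python
import wave

import numpy as np

sample_hz = 44100
t = np.arange(0, sample_hz)
wv16 = (np.sin(2 * np.pi * 440 * t / sample_hz) * 32767).astype(np.int16)

# Write a test file (hypothetical name) so this example is self-contained.
with wave.open('sin_wave_check.wav', mode='wb') as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(sample_hz)
    f.writeframes(wv16.tobytes())

# Open it in read mode; readframes returns bytes.
with wave.open('sin_wave_check.wav', mode='rb') as f:
    n = f.getnframes()
    raw = f.readframes(n)

# Interpret the bytes as 16-bit integers again.
decoded = np.frombuffer(raw, dtype=np.int16)
print(n, len(raw), np.array_equal(decoded, wv16))  # 44100 88200 True
```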

Then play the sound with the pyaudio module.

```python
p = pyaudio.PyAudio()  # create a PyAudio instance
stream = p.open(
    format=p.get_format_from_width(file.getsampwidth()),  # sample format from the file's sample width
    channels=file.getnchannels(),
    rate=file.getframerate(),
    output=True,  # this stream is for playback
)  # open an output stream that matches the .wav file
file.rewind()  # move the read position back to the beginning
chunk = 1024  # frames to read per step (1024, as in the PyAudio docs example)
data = file.readframes(chunk)  # read the first chunk of frames
while data:
    stream.write(data)  # writing data to the stream plays it
    data = file.readframes(chunk)  # read the next chunk
stream.stop_stream()  # stop playback
stream.close()  # close the stream
p.terminate()  # shut down PyAudio
file.close()  # close the .wav file
```

As above, the procedure is: 1. open PyAudio, 2. open a stream, 3. write data to the stream to play the sound, 4. close the stream, 5. terminate PyAudio.
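If you can't install PyAudio, you can still see how the chunked loop walks through the file. This sketch (my own illustration) runs the same while data: loop but only counts frames where stream.write(data) would go. With 44100 frames and chunk = 1024, that is 43 full chunks plus one last chunk of 68 frames, after which readframes() returns empty bytes and the loop ends. The file name sin_wave_chunks.wav is hypothetical.

```python
import wave

import numpy as np

sample_hz = 44100
wv16 = (np.sin(2 * np.pi * 440 * np.arange(sample_hz) / sample_hz) * 32767).astype(np.int16)
with wave.open('sin_wave_chunks.wav', mode='wb') as f:  # hypothetical file name
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(sample_hz)
    f.writeframes(wv16.tobytes())

chunk = 1024
reads = 0
frames_read = 0
with wave.open('sin_wave_chunks.wav', mode='rb') as f:
    bytes_per_frame = f.getsampwidth() * f.getnchannels()
    data = f.readframes(chunk)
    while data:  # readframes returns b'' at the end of the file
        reads += 1
        frames_read += len(data) // bytes_per_frame  # stream.write(data) would go here
        data = f.readframes(chunk)
    duration = f.getnframes() / f.getframerate()  # playback length in seconds

print(reads, frames_read, duration)  # 44 44100 1.0
```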

Chapter 4. Waveform display

Well, what a long article this has become. I'm tired, Patrasche. So I'll just drop the code here.

```python
pl.plot(t, wv)
pl.show()
```

How simple! There are plenty of articles about matplotlib.pyplot, so I won't go into detail here.
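One thing you may notice: plotting all 44100 samples of a 440 Hz tone packs 440 cycles into one plot, so the lines blur into a filled band. As a sketch (my own variation), plotting only the first 5 ms with the x-axis in milliseconds shows the actual shape. The Agg backend and the output name sin_wave.png are assumptions for running without a window; when working interactively, use pl.show() as in the article instead.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run, draw to a file instead of a window
import matplotlib.pyplot as pl
import numpy as np

sec = 1
note_hz = 440
sample_hz = 44100
t = np.arange(0, sample_hz * sec)
wv = np.sin(2 * np.pi * note_hz * t / sample_hz)

n_show = int(sample_hz * 0.005)  # first 5 ms = 220 samples, about 2.2 periods of 440 Hz
pl.plot(t[:n_show] / sample_hz * 1000, wv[:n_show])  # x-axis in milliseconds
pl.xlabel("time [ms]")
pl.ylabel("amplitude")
pl.title("440 Hz sine wave (first 5 ms)")
pl.savefig("sin_wave.png")  # hypothetical output file
```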

In closing

Thank you for reading this far! This is only the second Qiita article of my life, and I did my best...

Even so, can I really make artificial speech synthesis software by myself?

References

- [Notes] python wave module (Qiita): It was very helpful.
- [Sound programming with python](http://samuiui.com/2019/03/11/python%E3%81%A7%E9%9F%B3%E3%83%97%E3%83%AD%E3%82%B0%E3%83%A9%E3%83%9F%E3%83%B3%E3%82%B0/): A site that seems to use Jupyter Notebook. I haven't looked at it much.
- wave: Read and write WAV files (Python documentation): The translation is a bit awkward, but you can read it in Japanese.
- PyAudio Documentation: English only. Let's do our best with English.
- Solution when PyAudio cannot be installed with Python 3.7: I was about to give up because I couldn't install PyAudio, and this saved me. A lifesaver.

Final code

```python
import numpy as np
import matplotlib.pyplot as pl
import wave
import struct
import pyaudio

# Chapter 1: express the sound as a sine wave
sec = 1  # 1 second
note_hz = 440  # frequency of A4 ("La")
sample_hz = 44100  # sampling frequency
t = np.arange(0, sample_hz * sec)  # sample indices for 1 second (0 .. 44099)
wv = np.sin(2 * np.pi * note_hz * t / sample_hz)

# Chapter 2: convert to 16-bit binary and write a .wav file
max_num = 32767.0 / max(wv)  # scale factor so the peak maps to 32767
wv16 = [int(x * max_num) for x in wv]  # 16-bit integer samples
bi_wv = struct.pack("h" * len(wv16), *wv16)  # pack as signed 16-bit integers

file = wave.open('sin_wave.wav', mode='wb')  # open sin_wave.wav in write mode (created if it does not exist)
param = (1, 2, sample_hz, len(wv16), 'NONE', 'not compressed')  # nchannels, sampwidth, framerate, nframes, comptype, compname
file.setparams(param)
file.writeframes(bi_wv)
file.close()  # the parentheses are required

# Chapter 3: play the file with PyAudio
file = wave.open('sin_wave.wav', mode='rb')

p = pyaudio.PyAudio()
stream = p.open(
    format=p.get_format_from_width(file.getsampwidth()),
    channels=file.getnchannels(),
    rate=file.getframerate(),
    output=True,
)
chunk = 1024
file.rewind()
data = file.readframes(chunk)
while data:
    stream.write(data)
    data = file.readframes(chunk)
stream.stop_stream()
stream.close()
p.terminate()
file.close()

# Chapter 4: plot the waveform
pl.plot(t, wv)
pl.show()
```
