When I tried to write a Python program that reads in audio and outputs it through some sort of filter, the I/O turned out to be the first hurdle, so I have summarized it here. Everything below assumes the format of a typical audio file (sampling rate 44100 Hz, 16-bit quantization).
Input and output use WAV files via the standard wave module.
input
import numpy as np
import wave

name = "filename.wav"
wavefile = wave.open(name, "rb")
framerate = wavefile.getframerate()                 # sampling rate (Hz)
data = wavefile.readframes(wavefile.getnframes())   # all frames, as raw bytes
wavefile.close()
x = np.frombuffer(data, dtype="int16")              # decode the 16-bit samples
What you need when reading audio from a WAV file is the value of each sample. .getnframes() returns the total number of frames, i.e. the length of the data, and .readframes() reads that many frames. However, the data comes back as raw binary, so np.frombuffer() converts it to an int array.
Now you can work with the values in a numpy array. At this point the values are int16, in the range -32768 to 32767, so divide them down when processing the audio.
Also, .getframerate() returns the sampling rate. I have included it because you usually need the file's sample rate for audio processing.
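Since int16 covers -32768 to 32767, a common way to do the "adjust by dividing" step is to scale to floats before filtering and scale back afterwards. A minimal sketch under this article's assumptions (mono 16-bit data; the helper names `to_float` and `to_int16` are my own):

```python
import numpy as np

def to_float(x):
    """Scale int16 samples to floats in [-1.0, 1.0)."""
    return x.astype(np.float32) / 32768.0

def to_int16(y):
    """Scale floats back to int16, clipping to the representable range."""
    y = np.clip(y, -1.0, 32767.0 / 32768.0)
    return (y * 32768.0).astype(np.int16)
```

Working in floats avoids integer overflow while filtering, and the clip on the way back keeps out-of-range values from wrapping around.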
From here you can process the data however you like.
For output, you simply reverse the input steps.
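As one example of "some sort of filter", a simple moving average acts as a crude low-pass filter. This is a sketch of my own, not from the original article, assuming the mono int16 array `x` from the input step:

```python
import numpy as np

def moving_average(x, n=5):
    """Toy low-pass filter: replace each sample by the mean of its n neighbours."""
    kernel = np.ones(n) / n
    y = np.convolve(x.astype(np.float64), kernel, mode="same")
    return y.astype(np.int16)
```

The result is again an int16 array, so it can be written out directly with the output code below.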
output
w = wave.open("output.wav", "wb")
w.setnchannels(1)     # mono
w.setsampwidth(2)     # 16 bit = 2 bytes per sample
w.setframerate(44100)
w.writeframes(x)      # x: the int16 array from the input example
w.close()
Since a WAV file needs header information, at least
.setnchannels()
.setsampwidth()
.setframerate()
seem to be required.
.setsampwidth() takes the sample (quantization) size in bytes, so 16 bit is 2.
Also, .writeframes() automatically corrects nframes to match the amount of data written, so there is no problem if you do not set it. .writeframesraw() does not correct nframes, yet the output audio data is still written properly and plays fine. (Does setting nframes even matter?)
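To confirm that this header handling round-trips correctly, here is a small write-then-read-back sketch; `write_wav` and `read_wav` are helper names of my own, not part of the wave API:

```python
import wave
import numpy as np

def write_wav(path, x, framerate=44100):
    """Write a mono int16 numpy array as a 16-bit WAV file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16 bit = 2 bytes
        w.setframerate(framerate)
        w.writeframes(x.astype(np.int16).tobytes())  # nframes is fixed up for us

def read_wav(path):
    """Read a WAV file back as (int16 array, sampling rate)."""
    with wave.open(path, "rb") as w:
        data = w.readframes(w.getnframes())
        return np.frombuffer(data, dtype=np.int16), w.getframerate()
```

Reading back exactly what was written is a quick sanity check that the sample width and channel count in the header match the data.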
Run it and the WAV file is written out.
** ・ Using pyaudio ** The pyaudio package does not seem to be included by default, so I have not tried it yet. It is convenient for streaming playback and recording, so pyaudio may be the better choice for handling audio dynamically.
With the method above you can get audio data as a numeric array. It assumes 16-bit monaural audio, which is the common case; for less usual audio such as 8 bit or 24 bit, it seems you would need to convert with the matching type size when decoding the binary data. For output I expected that conversion back to binary would be required, but it turns out that just passing an int array works. Note, though, that for 16-bit data the array dtype must be int16.
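For the 8-bit and 24-bit cases mentioned above, the dtype passed to np.frombuffer() has to match what .getsampwidth() reports. A hypothetical helper of my own sketching that mapping (8-bit WAV data is unsigned, which is easy to get wrong):

```python
import numpy as np

def dtype_for_sampwidth(sampwidth):
    """Pick the numpy dtype matching wave's .getsampwidth() (in bytes).

    8-bit WAV data is unsigned (0..255); 16-bit is signed. 24-bit has no
    native numpy dtype and would need manual byte handling, so it is not
    covered here.
    """
    if sampwidth == 1:
        return np.uint8
    if sampwidth == 2:
        return np.int16
    raise ValueError("sample width of %d bytes is not handled here" % sampwidth)
```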