There doesn't seem to be any free software on macOS that can render an audio spectrum (a waveform in the frequency domain that wobbles along with the sound). So let's build our own in Python and play with it.
(On Windows, free software called AviUtl can apparently do this.)
I have a **wav** file that I want to turn into an audio-waveform video. (In my case, it's the bounce of a song I wrote in GarageBand.) I'd like to convert it to a video format, but a video that is just audio over a still image is a bit dull.
So the goal this time is to create an **audio spectrum** that moves along with the song, so that the result looks at least somewhat like a proper video.
You can make something like this ↓ https://www.youtube.com/watch?v=JPE54SlF6H0
OS: macOS High Sierra 10.13.6 / Language: Python 3.7.4
Besides the standard library, the following need to be installed:

- PyGame (a game engine, but here it's used simply as a GUI for display)
- PyAudio (used to play the wav file)
- PySoundFile (used to read the wav file data)
- **SciPy** (used for the fast Fourier transform ← is it faster than NumPy?)

Basically, pip (pip3) should be fine. I'll write this so that even readers who only know NumPy can follow along (because that's me).
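For reference, the install commands would look something like this (the pip package names are my best guess for the imports used below, and the Homebrew step is an assumption; on macOS, PyAudio typically needs PortAudio installed first):

```shell
# PortAudio is a prerequisite for building PyAudio on macOS (assumes Homebrew)
brew install portaudio
# note: the pip package for PySoundFile imports as "soundfile"
pip3 install pygame pyaudio soundfile scipy numpy
```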
This is sample code that just moves light-blue waves over a pink background.
If you put a sound source named sample.wav in the same directory as the program, it will run as-is. The mono/stereo handling is quite crude, but both are supported. You can also try it with [a free sound source like this](https://on-jin.com/sound/listshow.php?pagename=ta&title=%E3%82%B3%E3%83%B3%E3%83%88%E3%81%AE%E3%82%AA%E3%83%8102%EF%BC%88%E3%83%81%E3%83%A3%E3%83%B3%E3%83%81%E3%83%A3%E3%83%B3%EF%BC%89&janl=%E3%81%9D%E3%81%AE%E4%BB%96%E9%9F%B3&bunr=%E3%83%90%E3%83%A9%E3%82%A8%E3%83%86%E3%82%A3&kate=%E3%81%9D%E3%81%AE%E4%BB%96).
I'll post the whole thing first, then go through it section by section.
SampleAudioVisualizer.py
```python
#!/usr/bin/env python3
import wave
import sys
import pygame
from pygame.locals import *
import scipy.fftpack as spfft
import soundfile as sf
import pyaudio
import numpy as np
# --------------------------------------------------------------------
# Parameters
# --------------------
fn = "sample.wav"
# For calculation
CHUNK = 1024  # Write to the audio stream in chunks with pyaudio (why 1024? no idea)
start = 0     # Sampling start position
N = 1024      # Number of FFT samples
SHIFT = 1024  # Number of samples to shift the window function by
hammingWindow = np.hamming(N)  # Window function
# --------------------
# For drawing
SCREEN_SIZE = (854, 480)  # Display size
rectangle_list = []
# --------------------
# Initial pygame screen settings
pygame.init()
screen = pygame.display.set_mode(SCREEN_SIZE)
pygame.display.set_caption("Pygame Audio Visualizer")
# --------------------------------------------------------------------
# Play the wav file while calling redraw(), defined below
def play_wav_file(filename):
    try:
        wf = wave.open(filename, "r")
    except FileNotFoundError:  # The file does not exist
        print("[Error 404] No such file or directory: " + filename)
        return
    # Open a stream
    p = pyaudio.PyAudio()
    stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                    channels=wf.getnchannels(),
                    rate=wf.getframerate(),
                    output=True)
    # Play the audio
    data = wf.readframes(CHUNK)
    while len(data) > 0:  # readframes() returns bytes, so test the length, not ''
        stream.write(data)
        data = wf.readframes(CHUNK)
        redraw()
    stream.close()
    p.terminate()
# --------------------------------------------------------------------
# Repeatedly "apply the FFT and draw"
def redraw():
    global start
    global screen
    global rectangle_list
    # --------------------
    # Apply the FFT to the current block of samples and compute the amplitude spectrum
    windowedData = hammingWindow * x[start:start + N]  # Data block with the window applied
    X = spfft.fft(windowedData)  # FFT
    amplitudeSpectrum = [np.sqrt(c.real ** 2 + c.imag ** 2)
                         for c in X]  # Amplitude spectrum
    # --------------------
    # Drawing with Pygame
    screen.fill((240, 128, 128))  # Fill with your favorite color
    rectangle_list.clear()  # Reset the rectangle list
    # Draw the spectrum (the numbers were tuned by running and adjusting)
    for i in range(86):
        rectangle_list.append(pygame.draw.line(screen, (102, 205, 170),
                                               (1 + i * 10, 350 + amplitudeSpectrum[i] * 1),
                                               (1 + i * 10, 350 - amplitudeSpectrum[i] * 1), 4))
    pygame.display.update(rectangle_list)  # Refresh the display
    start += SHIFT  # Shift the range the window function is applied to
    if start + N > nframes:
        sys.exit()
    for event in pygame.event.get():  # Exit handling
        if event.type == QUIT:
            sys.exit()
        if event.type == KEYDOWN:
            if event.key == K_ESCAPE:
                sys.exit()
# --------------------------------------------------------------------
if __name__ == "__main__":
    # --------------------
    # Read the wav data
    data, fs = sf.read(fn)  # data has shape (number of frames, number of channels)
    if data.ndim == 1:
        x = data  # Mono: use as-is
    if data.ndim == 2:
        x = data[:, 0]  # Stereo: use only the L channel (change the 0 to 1 for R)
    nframes = x.size  # Number of frames (the end condition when shifting the FFT window)
    # --------------------
    # Start playback and drawing
    play_wav_file(fn)
# --------------------------------------------------------------------
```
The data portion of a wav file is time-series data that holds one sample of the sound every **1/fs** seconds (fs: sampling frequency [Hz]).
(Addendum) Let's plot the data of the [free sound source](https://on-jin.com/sound/listshow.php?pagename=ta&title=%E3%82%B3%E3%83%B3%E3%83%88%E3%81%AE%E3%82%AA%E3%83%8102%EF%BC%88%E3%83%81%E3%83%A3%E3%83%B3%E3%83%81%E3%83%A3%E3%83%B3%EF%BC%89&janl=%E3%81%9D%E3%81%AE%E4%BB%96%E9%9F%B3&bunr=%E3%83%90%E3%83%A9%E3%82%A8%E3%83%86%E3%82%A3&kate=%E3%81%9D%E3%81%AE%E4%BB%96) from earlier. (Since this sound source is stereo, I take only the L channel.) You can see that the data (array) contains a wave taking values from -1 to +1. The horizontal axis is the array index. Since each element represents 1/fs seconds of the signal (fs = 44.1 [kHz] in this example, by the way), this is the "waveform seen on the time axis". Put another way, multiplying the horizontal axis of this graph by 1/44100 converts it to seconds.
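As a minimal sketch of that index-to-seconds conversion (my own illustration, using a placeholder array instead of real wav data):

```python
import numpy as np

fs = 44100                  # sampling frequency [Hz]
x = np.zeros(3 * fs)        # placeholder for 3 seconds of mono samples
t = np.arange(x.size) / fs  # index -> seconds: each element covers 1/fs s

print(t[fs])  # sample number 44100 sits exactly at 1.0 s
```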
The audio spectrum, on the other hand, is a constantly changing graph in the frequency domain. **Data in the time domain can be viewed in the frequency domain via the Fourier transform**, so the plan is to make good use of it.
Therefore, it seems good to proceed like this: repeatedly play a short block of audio, apply the Fourier transform to the corresponding samples, and redraw the spectrum. The picture is one of rapidly alternating audio playback and Fourier transforms in real time.
By the way, I process the wav data as "short-time blocks", shifting 1024 samples at a time, but it doesn't have to be 1024. However, if you make the block too small, drawing takes longer than playback and the behavior gets strange.
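For reference (my own addition), the frequency that each FFT bin corresponds to follows directly from fs and N; with fs = 44100 and N = 1024, each bin covers about 43 Hz:

```python
import numpy as np

fs = 44100  # sampling frequency [Hz]
N = 1024    # FFT size

freqs = np.fft.fftfreq(N, d=1 / fs)  # frequency of each FFT bin [Hz]
resolution = fs / N                  # spacing between adjacent bins

print(round(resolution, 1))  # → 43.1  (Hz per bin)
```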
Here is the corresponding part of the main routine.
Excerpt
```python
import soundfile as sf

fn = "sample.wav"
# (omitted)
# --------------------------------------------------------------------
if __name__ == "__main__":
    # --------------------
    # Read the wav data
    data, fs = sf.read(fn)  # data has shape (number of frames, number of channels)
    if data.ndim == 1:
        x = data  # Mono: use as-is
    if data.ndim == 2:
        x = data[:, 0]  # Stereo: use only the L channel (change the 0 to 1 for R)
    nframes = x.size  # Number of frames (the end condition when shifting the FFT window, described later)
    # --------------------
    # (omitted)
```
PySoundFile makes it easy to handle wav files: sf.read() returns the data together with its sampling rate.
(Reference: Wav file operation in Python)
Next, define a function play_wav_file() that writes to a stream and plays the audio CHUNK frames at a time, using the wave module and PyAudio.
(Reference: [Python] Play wav files with PyAudio)
It is basically the same as the referenced article, except that I call my own redraw() function inside the loop that writes to the stream and reads the next data, so that the audio spectrum is displayed at the same time as playback.
Excerpt
```python
import wave
import pyaudio
# --------------------------------------------------------------------
# Parameters
# --------------------
# For calculation
CHUNK = 1024  # Write to the audio stream in chunks with pyaudio (why 1024? no idea)
# ~ omitted ~
# --------------------------------------------------------------------
# Play the wav file while calling redraw(), defined below
def play_wav_file(filename):
    try:
        wf = wave.open(filename, "r")
    except FileNotFoundError:  # The file does not exist
        print("[Error 404] No such file or directory: " + filename)
        return
    # Open a stream
    p = pyaudio.PyAudio()
    stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                    channels=wf.getnchannels(),
                    rate=wf.getframerate(),
                    output=True)
    # Play the audio
    data = wf.readframes(CHUNK)
    while len(data) > 0:  # readframes() returns bytes, so test the length, not ''
        stream.write(data)
        data = wf.readframes(CHUNK)
        redraw()  # Redraw function; we'll write it later
    stream.close()
    p.terminate()
# ~ omitted ~
```
This article (Short-Time Fourier Transform — A Breakthrough on Artificial Intelligence) is very easy to understand and was helpful.
Since we set CHUNK = 1024 for audio playback earlier, we also set N, the number of samples the fast Fourier transform (FFT) is applied to, to 1024.
After extracting 1024 samples from the whole data, we don't run the FFT on them as-is; we apply a **window function** first and then run the FFT. This gets theoretical, but the page "Reason for using window functions — Logical Arts Institute" introduced in the article above organizes it in an easy-to-understand way, so have a look if you're interested.
Here we use the popular **Hamming window** (np.hamming()). Applying it smoothly tapers the edges toward zero, so the cut-out block behaves like one period of a periodic function.
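A quick look at what the window actually does (my own illustration): the taper is small at the edges and close to 1 in the middle:

```python
import numpy as np

N = 1024
w = np.hamming(N)  # w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1))

print(round(w[0], 2))  # → 0.08  (Hamming edges taper to 0.08, not quite zero)
```

Note that a Hamming window does not reach exactly zero at the ends; a Hann window (np.hanning()) would.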
Excerpt
```python
import sys
import scipy.fftpack as spfft
import numpy as np
# --------------------------------------------------------------------
# Parameters
# --------------------
# For calculation
CHUNK = 1024  # Write to the audio stream in chunks with pyaudio (why 1024? no idea)
start = 0     # Sampling start position
N = 1024      # Number of FFT samples
SHIFT = 1024  # Number of samples to shift the window function by
hammingWindow = np.hamming(N)  # Window function
# ~ omitted ~
# --------------------------------------------------------------------
# Repeatedly "apply the FFT and draw". Here we look only at the FFT part.
def redraw():
    global start
    # ~ omitted ~
    # --------------------
    # Apply the FFT to the current block of samples and compute the amplitude spectrum
    windowedData = hammingWindow * x[start:start + N]  # Data block with the window applied
    # (x[] is the wav data extracted in section 3-1 of this article)
    X = spfft.fft(windowedData)  # FFT
    amplitudeSpectrum = [np.sqrt(c.real ** 2 + c.imag ** 2)
                         for c in X]  # Amplitude spectrum
    # --------------------
    # Pygame drawing happens here (omitted)
    start += SHIFT  # Shift the range the window function is applied to
    if start + N > nframes:
        sys.exit()  # Reached the end of the wav file; quit once the window no longer fits
    # PyGame exit handling here (omitted)
# --------------------------------------------------------------------
# ~ omitted ~
```
What it does is simple: take N samples, apply the window function and run the FFT, compute the amplitude spectrum, then shift the sampling position by SHIFT to prepare for the next call. All that remains is to draw the computed amplitude spectrum with **PyGame**.
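As a sanity check of the FFT step (my own example, using the same window and amplitude computation): a pure sine placed exactly on bin 100 should produce a peak there:

```python
import numpy as np
import scipy.fftpack as spfft

N = 1024
n = np.arange(N)
k = 100                            # target FFT bin
x = np.sin(2 * np.pi * k * n / N)  # sine whose frequency sits exactly on bin k

windowed = np.hamming(N) * x
X = spfft.fft(windowed)
amplitude = np.abs(X)              # same as sqrt(re^2 + im^2)

print(np.argmax(amplitude[:N // 2]))  # → 100
```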
For the drawing, I'll build on this article (Visualizer for beginners in Python).
Excerpt
```python
import pygame
from pygame.locals import *
# --------------------------------------------------------------------
# Parameters
# --------------------
# ~ omitted ~
# --------------------
# For drawing
SCREEN_SIZE = (854, 480)  # Display size
rectangle_list = []
# --------------------
# Initial pygame screen settings
pygame.init()
screen = pygame.display.set_mode(SCREEN_SIZE)
pygame.display.set_caption("Pygame Audio Visualizer")
# --------------------------------------------------------------------
# Repeatedly "apply the FFT and draw"
def redraw():
    # ~ omitted ~
    global screen
    global rectangle_list
    # --------------------
    # Compute the amplitude spectrum (amplitudeSpectrum) with the FFT (omitted here)
    # --------------------
    # Drawing with Pygame
    screen.fill((240, 128, 128))  # Fill with your favorite color
    rectangle_list.clear()  # Reset the rectangle list
    # Draw the spectrum (the numbers were tuned by running and adjusting)
    for i in range(86):
        rectangle_list.append(pygame.draw.line(screen, (102, 205, 170),
                                               (1 + i * 10, 350 + amplitudeSpectrum[i] * 1),
                                               (1 + i * 10, 350 - amplitudeSpectrum[i] * 1), 4))
    pygame.display.update(rectangle_list)  # Refresh the display
    # ~ omitted ~
    for event in pygame.event.get():  # Exit handling
        if event.type == QUIT:
            sys.exit()
        if event.type == KEYDOWN:
            if event.key == K_ESCAPE:
                sys.exit()
# --------------------------------------------------------------------
# ~ omitted ~
```
The question is how to display the waves, but with pygame.draw.line, for example, you can express them as a set of straight lines, much like a histogram. Feel free to arrange this part however you like. PyGame's methods are organized [here](http://westplain.sakuraweb.com/translate/pygame/Display.cgi). pygame.draw.line is used like this:

pygame.draw.line — draw a straight line segment.
pygame.draw.line(Surface, color, start_pos, end_pos, width=1): return Rect
Draws a straight line segment on the given Surface. The ends of the line have no special decoration; they are squared off to match the line's thickness.
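Here is a minimal headless sketch of that drawing loop (my own example: it uses SDL's dummy video driver so it runs without opening a window, and the bar heights are made-up values, not a real spectrum):

```python
import os
os.environ["SDL_VIDEODRIVER"] = "dummy"  # render off-screen; no window needed

import pygame

pygame.init()
screen = pygame.display.set_mode((854, 480))
screen.fill((240, 128, 128))

heights = [10, 40, 25, 60]  # stand-in amplitudes
rects = []
for i, h in enumerate(heights):
    # one vertical line per bar, mirrored around y = 350, like the article's loop
    rects.append(pygame.draw.line(screen, (102, 205, 170),
                                  (1 + i * 10, 350 + h),
                                  (1 + i * 10, 350 - h), 4))

pygame.display.update(rects)  # each draw call returned a Rect bounding the line
pygame.quit()
```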
As an example of the drawing flow: decide the size of the PyGame window in advance and initialize it; then, on each redraw, fill the background, build rectangles (pygame.Rect) from the computed amplitude spectrum, keep them in a list, and pass that list to pygame.display.update(). Something like that. Let's also prepare exit handling for when the PyGame window is closed with the × button or the Esc key is pressed.
(By the way, the display size is 854 × 480 to match YouTube's aspect ratio, and the for-loop range is 86 because the bars representing the wave are spaced 10 px apart, so the 87th bar would go off the screen. The description here isn't very smart... sorry. If you play around with the numbers, I think you'll get a feel for the behavior.)
In the sample code only the waves move over a background color, but you can also overlay images of characters or logos, like in the opening gif. (Reference: Introduction to Pygame with Python 3: Chapter 1) It should be easy to implement by calling Surface.blit() inside redraw().
Also, this time I just recorded the screen to produce the final video, but some people write the PyGame screen out to a video file directly: [PyGame] AVI export & screenshots of the screen.
I had only touched Python in a university class, to the extent of playing with sample code, but it's fun because there are so many useful libraries. There are probably many places where I did things clumsily, but I hope to keep studying little by little.
Thanks for reading!