There doesn't seem to be any free software on macOS that can render an audio spectrum (a waveform in the frequency domain that wobbles along with the sound). So let's build our own in Python and play with it.
(On Windows, free software called AviUtl can apparently do this.)
I have a **wav** file that I want to turn into an audio-waveform video. (In my case, it's the bounce of a song I wrote in GarageBand.) I'd like to convert it to a video format, but a video that is just audio over a still image is a bit dull.
So the goal this time is to create an **audio spectrum** that moves along with the song, so that the result looks at least somewhat like a proper video.
You can make something like this ↓ https://www.youtube.com/watch?v=JPE54SlF6H0
OS: macOS High Sierra 10.13.6 / Language: Python 3.7.4
Besides the standard library, the following need to be installed:

- PyGame (a game engine, but here it's used simply as a GUI for display)
- PyAudio (used to play the wav file)
- PySoundFile (used to read the wav file data)
- **SciPy** (used for the fast Fourier transform ← is it faster than NumPy?)

Basically, pip (pip3) should be fine. I'll write this so that even readers who only know NumPy can follow along (because that's me).
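For reference, the install commands would look something like this (the pip package names are my best guess for the imports used below, and the Homebrew step is an assumption; on macOS, PyAudio typically needs PortAudio installed first):

```shell
# PortAudio is a prerequisite for building PyAudio on macOS (assumes Homebrew)
brew install portaudio
# note: the pip package for PySoundFile imports as "soundfile"
pip3 install pygame pyaudio soundfile scipy numpy
```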
This is sample code that just moves light-blue waves over a pink background.
If you put a sound source named sample.wav in the same directory as the program, it will run as-is. The mono/stereo handling is quite crude, but both are supported. You can also try it with [a free sound source like this](https://on-jin.com/sound/listshow.php?pagename=ta&title=%E3%82%B3%E3%83%B3%E3%83%88%E3%81%AE%E3%82%AA%E3%83%8102%EF%BC%88%E3%83%81%E3%83%A3%E3%83%B3%E3%83%81%E3%83%A3%E3%83%B3%EF%BC%89&janl=%E3%81%9D%E3%81%AE%E4%BB%96%E9%9F%B3&bunr=%E3%83%90%E3%83%A9%E3%82%A8%E3%83%86%E3%82%A3&kate=%E3%81%9D%E3%81%AE%E4%BB%96).
I'll post the whole thing first, then go through it section by section.
SampleAudioVisualizer.py
```python
#!/usr/bin/env python3
import wave
import sys
import pygame
from pygame.locals import *
import scipy.fftpack as spfft
import soundfile as sf
import pyaudio
import numpy as np
# --------------------------------------------------------------------
# Parameters
# --------------------
fn = "sample.wav"
# For calculation
CHUNK = 1024  # Write to the audio stream in chunks with pyaudio (why 1024? no idea)
start = 0     # Sampling start position
N = 1024      # Number of FFT samples
SHIFT = 1024  # Number of samples to shift the window function by
hammingWindow = np.hamming(N)  # Window function
# --------------------
# For drawing
SCREEN_SIZE = (854, 480)  # Display size
rectangle_list = []
# --------------------
# Initial pygame screen settings
pygame.init()
screen = pygame.display.set_mode(SCREEN_SIZE)
pygame.display.set_caption("Pygame Audio Visualizer")
# --------------------------------------------------------------------
# Play the wav file while calling redraw(), defined below
def play_wav_file(filename):
    try:
        wf = wave.open(filename, "r")
    except FileNotFoundError:  # The file does not exist
        print("[Error 404] No such file or directory: " + filename)
        return
    # Open a stream
    p = pyaudio.PyAudio()
    stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                    channels=wf.getnchannels(),
                    rate=wf.getframerate(),
                    output=True)
    # Play the audio
    data = wf.readframes(CHUNK)
    while len(data) > 0:  # readframes() returns bytes, so test the length, not ''
        stream.write(data)
        data = wf.readframes(CHUNK)
        redraw()
    stream.close()
    p.terminate()
# --------------------------------------------------------------------
# Repeatedly "apply the FFT and draw"
def redraw():
    global start
    global screen
    global rectangle_list
    # --------------------
    # Apply the FFT to the current block of samples and compute the amplitude spectrum
    windowedData = hammingWindow * x[start:start + N]  # Data block with the window applied
    X = spfft.fft(windowedData)  # FFT
    amplitudeSpectrum = [np.sqrt(c.real ** 2 + c.imag ** 2)
                         for c in X]  # Amplitude spectrum
    # --------------------
    # Drawing with Pygame
    screen.fill((240, 128, 128))  # Fill with your favorite color
    rectangle_list.clear()  # Reset the rectangle list
    # Draw the spectrum (the numbers were tuned by running and adjusting)
    for i in range(86):
        rectangle_list.append(pygame.draw.line(screen, (102, 205, 170),
                                               (1 + i * 10, 350 + amplitudeSpectrum[i] * 1),
                                               (1 + i * 10, 350 - amplitudeSpectrum[i] * 1), 4))
    pygame.display.update(rectangle_list)  # Refresh the display
    start += SHIFT  # Shift the range the window function is applied to
    if start + N > nframes:
        sys.exit()
    for event in pygame.event.get():  # Exit handling
        if event.type == QUIT:
            sys.exit()
        if event.type == KEYDOWN:
            if event.key == K_ESCAPE:
                sys.exit()
# --------------------------------------------------------------------
if __name__ == "__main__":
    # --------------------
    # Read the wav data
    data, fs = sf.read(fn)  # data has shape (number of frames, number of channels)
    if data.ndim == 1:
        x = data  # Mono: use as-is
    if data.ndim == 2:
        x = data[:, 0]  # Stereo: use only the L channel (change the 0 to 1 for R)
    nframes = x.size  # Number of frames (the end condition when shifting the FFT window)
    # --------------------
    # Start playback and drawing
    play_wav_file(fn)
# --------------------------------------------------------------------
```
The data portion of a wav file is time-series data that holds one sample of the sound every **1/fs** seconds (fs: sampling frequency [Hz]).
(Addendum) Let's plot the data of the [free sound source](https://on-jin.com/sound/listshow.php?pagename=ta&title=%E3%82%B3%E3%83%B3%E3%83%88%E3%81%AE%E3%82%AA%E3%83%8102%EF%BC%88%E3%83%81%E3%83%A3%E3%83%B3%E3%83%81%E3%83%A3%E3%83%B3%EF%BC%89&janl=%E3%81%9D%E3%81%AE%E4%BB%96%E9%9F%B3&bunr=%E3%83%90%E3%83%A9%E3%82%A8%E3%83%86%E3%82%A3&kate=%E3%81%9D%E3%81%AE%E4%BB%96) from earlier. (Since this sound source is stereo, I take only the L channel.) You can see that the data (array) contains a wave taking values from -1 to +1. The horizontal axis is the array index. Since each element represents 1/fs seconds of the signal (fs = 44.1 [kHz] in this example, by the way), this is the "waveform seen on the time axis". Put another way, multiplying the horizontal axis of this graph by 1/44100 converts it to seconds.
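As a minimal sketch of that index-to-seconds conversion (my own illustration, using a placeholder array instead of real wav data):

```python
import numpy as np

fs = 44100                  # sampling frequency [Hz]
x = np.zeros(3 * fs)        # placeholder for 3 seconds of mono samples
t = np.arange(x.size) / fs  # index -> seconds: each element covers 1/fs s

print(t[fs])  # sample number 44100 sits exactly at 1.0 s
```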
The audio spectrum, on the other hand, is a constantly changing graph in the frequency domain. **Data in the time domain can be viewed in the frequency domain via the Fourier transform**, so the plan is to make good use of it.
Therefore, it seems good to proceed like this: repeatedly play a short block of audio, apply the Fourier transform to the corresponding samples, and redraw the spectrum. The picture is one of rapidly alternating audio playback and Fourier transforms in real time.
By the way, I process the wav data as "short-time blocks", shifting 1024 samples at a time, but it doesn't have to be 1024. However, if you make the block too small, drawing takes longer than playback and the behavior gets strange.
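For reference (my own addition), the frequency that each FFT bin corresponds to follows directly from fs and N; with fs = 44100 and N = 1024, each bin covers about 43 Hz:

```python
import numpy as np

fs = 44100  # sampling frequency [Hz]
N = 1024    # FFT size

freqs = np.fft.fftfreq(N, d=1 / fs)  # frequency of each FFT bin [Hz]
resolution = fs / N                  # spacing between adjacent bins

print(round(resolution, 1))  # → 43.1  (Hz per bin)
```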
Here is the corresponding part of the main routine.
Excerpt
```python
import soundfile as sf

fn = "sample.wav"
# (omitted)
# --------------------------------------------------------------------
if __name__ == "__main__":
    # --------------------
    # Read the wav data
    data, fs = sf.read(fn)  # data has shape (number of frames, number of channels)
    if data.ndim == 1:
        x = data  # Mono: use as-is
    if data.ndim == 2:
        x = data[:, 0]  # Stereo: use only the L channel (change the 0 to 1 for R)
    nframes = x.size  # Number of frames (the end condition when shifting the FFT window, described later)
    # --------------------
    # (omitted)
```
PySoundFile makes it easy to handle wav files: sf.read() returns the data together with its sampling rate.
(Reference: Wav file operation in Python)
Next, define a function play_wav_file() that writes to a stream and plays the audio CHUNK frames at a time, using the wave module and PyAudio.
(Reference: [Python] Play wav files with PyAudio)
It is basically the same as the referenced article, except that I call my own redraw() function inside the loop that writes to the stream and reads the next data, so that the audio spectrum is displayed at the same time as playback.
Excerpt
```python
import wave
import pyaudio
# --------------------------------------------------------------------
# Parameters
# --------------------
# For calculation
CHUNK = 1024  # Write to the audio stream in chunks with pyaudio (why 1024? no idea)
# ~ omitted ~
# --------------------------------------------------------------------
# Play the wav file while calling redraw(), defined below
def play_wav_file(filename):
    try:
        wf = wave.open(filename, "r")
    except FileNotFoundError:  # The file does not exist
        print("[Error 404] No such file or directory: " + filename)
        return
    # Open a stream
    p = pyaudio.PyAudio()
    stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                    channels=wf.getnchannels(),
                    rate=wf.getframerate(),
                    output=True)
    # Play the audio
    data = wf.readframes(CHUNK)
    while len(data) > 0:  # readframes() returns bytes, so test the length, not ''
        stream.write(data)
        data = wf.readframes(CHUNK)
        redraw()  # Redraw function; we'll write it later
    stream.close()
    p.terminate()
# ~ omitted ~
```
This article (Short-Time Fourier Transform — A Breakthrough on Artificial Intelligence) is very easy to understand and was helpful.
Since we set CHUNK = 1024 for audio playback earlier, we also set N, the number of samples the fast Fourier transform (FFT) is applied to, to 1024.
After extracting 1024 samples from the whole data, we don't run the FFT on them as-is; we apply a **window function** first and then run the FFT. This gets theoretical, but the page "Reason for using window functions — Logical Arts Institute" introduced in the article above organizes it in an easy-to-understand way, so have a look if you're interested.
Here we use the popular **Hamming window** (np.hamming()). Applying it smoothly tapers the edges toward zero, so the cut-out block behaves like one period of a periodic function.
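A quick look at what the window actually does (my own illustration): the taper is small at the edges and close to 1 in the middle:

```python
import numpy as np

N = 1024
w = np.hamming(N)  # w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1))

print(round(w[0], 2))  # → 0.08  (Hamming edges taper to 0.08, not quite zero)
```

Note that a Hamming window does not reach exactly zero at the ends; a Hann window (np.hanning()) would.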
Excerpt
```python
import sys
import scipy.fftpack as spfft
import numpy as np
# --------------------------------------------------------------------
# Parameters
# --------------------
# For calculation
CHUNK = 1024  # Write to the audio stream in chunks with pyaudio (why 1024? no idea)
start = 0     # Sampling start position
N = 1024      # Number of FFT samples
SHIFT = 1024  # Number of samples to shift the window function by
hammingWindow = np.hamming(N)  # Window function
# ~ omitted ~
# --------------------------------------------------------------------
# Repeatedly "apply the FFT and draw". Here we look only at the FFT part.
def redraw():
    global start
    # ~ omitted ~
    # --------------------
    # Apply the FFT to the current block of samples and compute the amplitude spectrum
    windowedData = hammingWindow * x[start:start + N]  # Data block with the window applied
    # (x[] is the wav data extracted in section 3-1 of this article)
    X = spfft.fft(windowedData)  # FFT
    amplitudeSpectrum = [np.sqrt(c.real ** 2 + c.imag ** 2)
                         for c in X]  # Amplitude spectrum
    # --------------------
    # Pygame drawing happens here (omitted)
    start += SHIFT  # Shift the range the window function is applied to
    if start + N > nframes:
        sys.exit()  # Reached the end of the wav file; quit once the window no longer fits
    # PyGame exit handling here (omitted)
# --------------------------------------------------------------------
# ~ omitted ~
```
What it does is simple: take N samples, apply the window function and run the FFT, compute the amplitude spectrum, then shift the sampling position by SHIFT to prepare for the next call. All that remains is to draw the computed amplitude spectrum with **PyGame**.
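As a sanity check of the FFT step (my own example, using the same window and amplitude computation): a pure sine placed exactly on bin 100 should produce a peak there:

```python
import numpy as np
import scipy.fftpack as spfft

N = 1024
n = np.arange(N)
k = 100                            # target FFT bin
x = np.sin(2 * np.pi * k * n / N)  # sine whose frequency sits exactly on bin k

windowed = np.hamming(N) * x
X = spfft.fft(windowed)
amplitude = np.abs(X)              # same as sqrt(re^2 + im^2)

print(np.argmax(amplitude[:N // 2]))  # → 100
```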
For the drawing, I'll build on this article (Visualizer for beginners in Python).
Excerpt
```python
import pygame
from pygame.locals import *
# --------------------------------------------------------------------
# Parameters
# --------------------
# ~ omitted ~
# --------------------
# For drawing
SCREEN_SIZE = (854, 480)  # Display size
rectangle_list = []
# --------------------
# Initial pygame screen settings
pygame.init()
screen = pygame.display.set_mode(SCREEN_SIZE)
pygame.display.set_caption("Pygame Audio Visualizer")
# --------------------------------------------------------------------
# Repeatedly "apply the FFT and draw"
def redraw():
    # ~ omitted ~
    global screen
    global rectangle_list
    # --------------------
    # Compute the amplitude spectrum (amplitudeSpectrum) with the FFT (omitted here)
    # --------------------
    # Drawing with Pygame
    screen.fill((240, 128, 128))  # Fill with your favorite color
    rectangle_list.clear()  # Reset the rectangle list
    # Draw the spectrum (the numbers were tuned by running and adjusting)
    for i in range(86):
        rectangle_list.append(pygame.draw.line(screen, (102, 205, 170),
                                               (1 + i * 10, 350 + amplitudeSpectrum[i] * 1),
                                               (1 + i * 10, 350 - amplitudeSpectrum[i] * 1), 4))
    pygame.display.update(rectangle_list)  # Refresh the display
    # ~ omitted ~
    for event in pygame.event.get():  # Exit handling
        if event.type == QUIT:
            sys.exit()
        if event.type == KEYDOWN:
            if event.key == K_ESCAPE:
                sys.exit()
# --------------------------------------------------------------------
# ~ omitted ~
```
The question is how to display the waves, but with pygame.draw.line, for example, you can express them as a set of straight lines, much like a histogram. Feel free to arrange this part however you like. PyGame's methods are organized [here](http://westplain.sakuraweb.com/translate/pygame/Display.cgi). pygame.draw.line is used like this:

pygame.draw.line — draw a straight line segment.
pygame.draw.line(Surface, color, start_pos, end_pos, width=1): return Rect
Draws a straight line segment on the given Surface. The ends of the line have no special decoration; they are squared off to match the line's thickness.
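Here is a minimal headless sketch of that drawing loop (my own example: it uses SDL's dummy video driver so it runs without opening a window, and the bar heights are made-up values, not a real spectrum):

```python
import os
os.environ["SDL_VIDEODRIVER"] = "dummy"  # render off-screen; no window needed

import pygame

pygame.init()
screen = pygame.display.set_mode((854, 480))
screen.fill((240, 128, 128))

heights = [10, 40, 25, 60]  # stand-in amplitudes
rects = []
for i, h in enumerate(heights):
    # one vertical line per bar, mirrored around y = 350, like the article's loop
    rects.append(pygame.draw.line(screen, (102, 205, 170),
                                  (1 + i * 10, 350 + h),
                                  (1 + i * 10, 350 - h), 4))

pygame.display.update(rects)  # each draw call returned a Rect bounding the line
pygame.quit()
```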
As an example of the drawing flow: decide the size of the PyGame window in advance and initialize it; then, on each redraw, fill the background, build rectangles (pygame.Rect) from the computed amplitude spectrum, keep them in a list, and pass that list to pygame.display.update(). Something like that. Let's also prepare exit handling for when the PyGame window is closed with the × button or the Esc key is pressed.
(By the way, the display size is 854 × 480 to match YouTube's aspect ratio, and the for-loop range is 86 because the bars representing the wave are spaced 10 px apart, so the 87th bar would go off the screen. The description here isn't very smart... sorry. If you play around with the numbers, I think you'll get a feel for the behavior.)
In the sample code only the waves move over a background color, but you can also overlay images of characters or logos, like in the opening gif. (Reference: Introduction to Pygame with Python 3: Chapter 1) It should be easy to implement by calling Surface.blit() inside redraw().
Also, this time I just recorded the screen to produce the final video, but some people write the PyGame screen out to a video file directly: [PyGame] AVI export & screenshots of the screen.
I had only touched Python in a university class, to the extent of playing with sample code, but it's fun because there are so many useful libraries. There are probably many places where I did things clumsily, but I hope to keep studying little by little.
Thanks for reading!