This time, we will actually apply non-negative matrix factorization (NMF) to sound. The goal is to build a mock that lets anyone easily run a sound source separation demo.

What I made

Here is the result first. I wrote a simple graph-drawing program and plotted the separation results!

A video of the demo is also available.

What is NMF?

Please take a look at the article I wrote earlier: https://qiita.com/sumita_v09/items/d22850f41257d07c45ea

How do you process sound with NMF?

The basic form of NMF can be written as

V \approx WH

For sound source separation, we assume that the spectrogram matrix V is sparse and decompose it into the product of a dictionary matrix W (frequency x number of timbres) and an excitation matrix H (number of timbres x time). Each column of the dictionary matrix holds the power spectrum of one timbre contained in V, and the excitation matrix expresses how strongly each timbre in W is present at each point along the time axis. NMF can separate sources without supervision, but this time we want to extract specific, user-chosen tones from the input sound, so we use supervised NMF. Specifically, the power spectra of the chosen tones are registered in the dictionary matrix in advance, and during the iterative updates only the excitation matrix is updated while W stays fixed. In addition, to perform separation in real time, the input V is not a full spectrogram matrix (frequency x time) but the power spectrum of only the latest frame, a frequency x 1 matrix.
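The supervised, single-frame setup described above can be sketched as follows. This is a minimal illustration in plain Python (no dependencies), not the author's implementation: it assumes the Euclidean-distance multiplicative update rule (the actual code may use another divergence, such as KL), unit-normalized dictionary columns, and hypothetical names (`SupervisedNMF`, `register_tone`, `frame_power_spectrum`). W is kept fixed, only the activations h are updated, and V is the power spectrum of a single frame. A real-time version would normally use `numpy.fft.rfft` rather than the naive DFT shown here.

```python
import math

# Sketch only: names and the Euclidean update rule are assumptions, not the author's code.
EPS = 1e-12  # guards against division by zero in the multiplicative update


def frame_power_spectrum(frame):
    """Power spectrum |X[k]|^2 of one Hann-windowed frame, bins 0..N/2 (naive DFT)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spectrum.append(re * re + im * im)
    return spectrum


def normalize(spectrum):
    """Scale a spectrum to unit L2 norm so registered tones are comparable."""
    norm = math.sqrt(sum(x * x for x in spectrum)) or 1.0
    return [x / norm for x in spectrum]


class SupervisedNMF:
    """Fixed dictionary W (one column per tone); only the activations h are updated."""

    def __init__(self, n_bins):
        self.n_bins = n_bins
        self.names = []
        self.columns = []  # columns of W, each a registered power spectrum

    def register_tone(self, name, power_spectrum):
        """Append a tone to the dictionary; this can also be done while running."""
        if len(power_spectrum) != self.n_bins:
            raise ValueError("spectrum length must equal n_bins")
        self.names.append(name)
        self.columns.append(normalize(power_spectrum))

    def _reconstruct(self, h):
        """Compute W h for the current activations (one value per frequency bin)."""
        return [sum(col[f] * h[k] for k, col in enumerate(self.columns))
                for f in range(self.n_bins)]

    def activations(self, v, n_iter=200):
        """Estimate h >= 0 such that v ~= W h for a single frame v (frequency x 1)."""
        h = [1.0] * len(self.columns)  # non-negative initial activations
        # W^T v does not depend on h, so compute it once.
        numer = [sum(c * x for c, x in zip(col, v)) for col in self.columns]
        for _ in range(n_iter):
            wh = self._reconstruct(h)
            denom = [sum(c * y for c, y in zip(col, wh)) for col in self.columns]
            # Euclidean multiplicative update: h <- h * (W^T v) / (W^T W h)
            h = [hk * nk / (dk + EPS) for hk, nk, dk in zip(h, numer, denom)]
        return dict(zip(self.names, h))


if __name__ == "__main__":
    nmf = SupervisedNMF(n_bins=3)
    nmf.register_tone("tone_a", [1.0, 1.0, 0.0])
    nmf.register_tone("tone_b", [0.0, 1.0, 1.0])
    # Mix 3 parts of tone_a with 1 part of tone_b (in normalized units).
    a, b = nmf.columns
    v = [3 * x + 1 * y for x, y in zip(a, b)]
    print(nmf.activations(v))
```

Run on the toy mixture, the activations should come out close to 3 for `tone_a` and 1 for `tone_b`, even though the two spectra overlap in the middle bin. The separated power spectrum of tone k can then be approximated as its dictionary column scaled by its activation. Because registering a tone only appends a column to W, a new sound can be added without any training step.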

What are we making?

The sound source separation mock was built to meet the following requirements:

- Runs on a CUI (command-line interface)
- Separates sound sources in real time
- Lets you change or add the sound sources to detect while it is running
- Sends the separation results over OSC

As for the implementation itself, I took the NMF implementation from the article linked above, adapted it for sound, and made it operable from the CUI.

Source code

The implementation is not included in this article, but it is published on GitHub. The README describes the environment setup and CUI operations in detail, so please refer to it. https://github.com/T-Sumida/RealTimeSoundSeprator

Summary

I created a sound source separation mock that runs in real time. It is nowhere near as accurate as deep learning, but being able to add new sounds to separate without any training step is a real advantage. NMF is fun, but in the future I'm thinking of writing about the machine-learning work I do at my job.