Introduction

This article is for people who want to use librosa but don't know where to start. I am one of them (laughs).

Here, following the librosa tutorial, I share my own interpretation and understanding of what each step of the tutorial does. Specifically, I comment on the original tutorial and supplement it by showing the contents of variables and figures. However, since I am a beginner myself, I don't cover anything very difficult.

This is my first article, so please be kind. If you find any mistakes, please feel free to point them out!

Key points this time

- Loading sample sound source
- Get beat information

For how to install librosa, see https://librosa.github.io/librosa/install.html.

This time I cover the Quickstart section of the librosa tutorial. It is the basics, so to speak. I want to get used to librosa little by little!

Loading sample sound source

First, let's get the path (location) of the sample sound source.

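# Import librosa
import librosa
# Get the path of the bundled sample ogg file
# (this function exists in the librosa version used here; as far as I know, librosa 0.8 and later provide librosa.example() instead)
filename = librosa.util.example_audio_file()
print(filename) # C:~\Python\Python36\site-packages\librosa\util\example_data\Kevin_MacLeod_-_Vibe_Ace.ogg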

Here, Kevin_MacLeod_-_Vibe_Ace.ogg is used as the sample sound source. You can load any sound source by changing this filename. By the way, as the file name suggests, this song is Vibe Ace by Kevin MacLeod. According to Google Play Music, the song is short at 1:05, and the genre seems to be Jazz.

As for what an ogg file is, Wikipedia says:

Ogg is a container that stores one or more codecs as its contents. The most typical Ogg codec is the audio codec Vorbis. Ogg that stores Vorbis is called Ogg Vorbis (and likewise for other codecs). Ogg Vorbis is sometimes referred to simply as Ogg, but note that Ogg is the name of the container, not the codec. (Omitted) Initially, the Xiph.Org Foundation defined the common extension for Ogg as .ogg, but in 2007 it changed the common extension to .ogx, the video extension to .ogv, and the audio extension to .oga. The original common extension, .ogg, is used only for Ogg Vorbis audio files, for compatibility.

So it seems the contents are audio encoded with a codec called Vorbis.
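To make the container/codec distinction a little more concrete, here is a minimal sketch that looks at the first bytes of a file. It relies on two facts about the formats: an Ogg page starts with the bytes "OggS", and a Vorbis identification header starts with the byte 0x01 followed by "vorbis". The function name describe_ogg_header and the synthetic byte strings are my own illustration, not part of librosa, and this is only a rough check, not a real parser.

```python
def describe_ogg_header(data: bytes) -> str:
    """Roughly classify the first bytes of a file (illustrative helper only)."""
    # Every Ogg page begins with the capture pattern "OggS".
    if data[:4] != b"OggS":
        return "not Ogg"
    # A Vorbis stream begins with an identification header: 0x01 + "vorbis".
    if b"\x01vorbis" in data[:64]:
        return "Ogg Vorbis"
    return "Ogg (other codec)"


# Synthetic examples (made-up bytes, not real audio files)
fake_vorbis = b"OggS" + bytes(22) + b"\x01\x1e" + b"\x01vorbis" + bytes(23)
fake_other = b"OggS" + bytes(60)
fake_wav = b"RIFF" + bytes(4) + b"WAVE"

print(describe_ogg_header(fake_vorbis))  # Ogg Vorbis
print(describe_ogg_header(fake_other))   # Ogg (other codec)
print(describe_ogg_header(fake_wav))     # not Ogg
```

To try it on a real file, you could read the first 64 bytes with open(filename, "rb") and pass them to the function.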

Let's load it now.

# Load the audio using the path obtained above
# y: waveform
# sr: sampling rate
y, sr = librosa.load(filename)
print(type(y)) # <class 'numpy.ndarray'>
print(y.shape) # (1355168,)
print(type(sr), sr) # <class 'int'> 22050

librosa.load
- Input: audio file path filename
- Output: audio waveform y, sampling rate sr

This is the function used to read an audio file. It seems to support most audio file formats, such as wav, flac, and aiff.

The outputs y and sr are a NumPy array and an int, respectively. From print(y.shape), you can see that y contains 1355168 samples. Also, sr is 22050 Hz by default, which means that 22050 samples correspond to 1 second.
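Since the sampling rate is the number of samples per second, the length of the loaded audio in seconds is the number of samples divided by sr. The following is a small sketch in plain Python using the values printed above (the variable names are my own):

```python
n_samples = 1355168  # y.shape[0] from the output above
sr = 22050           # sampling rate from the output above

# Number of samples divided by samples-per-second gives seconds
duration_seconds = n_samples / sr
minutes, seconds = divmod(duration_seconds, 60)

print(f"{duration_seconds:.2f} s")        # 61.46 s
print(f"{int(minutes)}:{seconds:05.2f}")  # 1:01.46
```

So the loaded waveform is a little over one minute long.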

Get beat information

Next, let's get the beat information.

The important concept here is the **frame**, a unit that groups a fixed number of waveform samples (the number is set by hop_length). By default hop_length = 512, so each frame advances by 512 samples. This time, we will get the frames at which beats occur.
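As a rough illustration of the relationship between samples and frames, the sketch below converts between sample indices and frame indices using integer division and multiplication by hop_length. It is plain Python written for this explanation, not librosa's own implementation, and the helper names sample_to_frame and frame_to_sample are hypothetical.

```python
HOP_LENGTH = 512  # default hop_length mentioned above


def sample_to_frame(sample_index: int, hop_length: int = HOP_LENGTH) -> int:
    """Index of the frame that starts at or before this sample (illustrative)."""
    return sample_index // hop_length


def frame_to_sample(frame_index: int, hop_length: int = HOP_LENGTH) -> int:
    """Index of the first sample of this frame (illustrative)."""
    return frame_index * hop_length


print(sample_to_frame(0))    # 0
print(sample_to_frame(511))  # 0
print(sample_to_frame(512))  # 1
print(frame_to_sample(5))    # 2560

# The last sample of the 1355168-sample waveform loaded above
n_samples = 1355168
print(sample_to_frame(n_samples - 1))  # 2646
```

In other words, a waveform of about 1.36 million samples is summarized by a few thousand frames.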

# tempo: estimated tempo in BPM
# beat_frames: indexes of the frames where beats occur
# Frames are 512 samples apart (hop_length=512)
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
print('Estimated tempo: {:.2f} beats per minute'.format(tempo)) # Estimated tempo: 129.20 beats per minute
print(beat_frames) 
# [   5   24   43   63   83  103  122  142  162  182  202  222  242  262
#   281  301  321  341  361  382  401  421  441  461  480  500  520  540
#   560  580  600  620  639  658  678  698  718  737  758  777  798  817
#   837  857  877  896  917  936  957  976  996 1016 1036 1055 1075 1095
#  1116 1135 1155 1175 1195 1214 1234 1254 1275 1295 1315 1334 1354 1373
#  1394 1414 1434 1453 1473 1493 1513 1532 1553 1573 1593 1612 1632 1652
#  1672 1691 1712 1732 1752 1771 1791 1811 1831 1850 1871 1890 1911 1931
#  1951 1971 1990 2010 2030 2050 2070 2090 2110 2130 2150 2170 2190 2209
#  2229 2249 2269 2289 2309 2328 2348 2368 2388 2408 2428 2448 2468 2488
#  2508 2527 2547]

librosa.beat.beat_track
- Input: waveform y, sampling rate sr
- Output: tempo in BPM tempo, a list of indexes of the frames that are beats beat_frames

You can get the BPM of a piece of music with this function. In this case the estimated tempo is 129.20 BPM, that is, about 129.2 beats per minute. Also, from print(beat_frames), you can see that beats fall on frame 5, frame 24, and so on. The beats come roughly every 20 frames.

Next, let's convert the beat timing into time (seconds).
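The spacing of about 20 frames can be cross-checked against the estimated tempo. The sketch below is plain Python using the values from the output above (the variable names are my own): it converts 129.20 BPM into frames per beat and compares that with the gaps between the first few beat frames.

```python
sr = 22050        # sampling rate
hop_length = 512  # samples between consecutive frames
tempo = 129.20    # estimated BPM from the output above

# Seconds per beat, then frames per beat
seconds_per_beat = 60.0 / tempo
frames_per_beat = seconds_per_beat * sr / hop_length
print(round(frames_per_beat, 2))  # 20.0

# Gaps between the first few detected beat frames
first_beats = [5, 24, 43, 63, 83, 103]
intervals = [b - a for a, b in zip(first_beats, first_beats[1:])]
print(intervals)  # [19, 19, 20, 20, 20]
```

The detected gaps are 19 or 20 frames, which agrees with the 20 frames per beat implied by the tempo.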

# beat_frames -> beat_times
# Use this when you want the beat timing in seconds
beat_times = librosa.frames_to_time(beat_frames, sr=sr)
# The calculation is as follows
# beat_times[i] = beat_frames[i] * hop_length / sr
print(beat_times)
#     [ 0.11609977  0.55727891  0.99845805  1.46285714  1.92725624  2.39165533
#   2.83283447  3.29723356  3.76163265  4.22603175  4.69043084  5.15482993
#   5.61922902  6.08362812  6.52480726  6.98920635  7.45360544  7.91800454
#   8.38240363  8.87002268  9.31120181  9.77560091 10.24       10.70439909
#  11.14557823 11.60997732 12.07437642 12.53877551 13.0031746  13.4675737
#  13.93197279 14.39637188 14.83755102 15.27873016 15.74312925 16.20752834
#  16.67192744 17.11310658 17.60072562 18.04190476 18.52952381 18.97070295
#  19.43510204 19.89950113 20.36390023 20.80507937 21.29269841 21.73387755
#  22.2214966  22.66267574 23.12707483 23.59147392 24.05587302 24.49705215
#  24.96145125 25.42585034 25.91346939 26.35464853 26.81904762 27.28344671
#  27.7478458  28.18902494 28.65342404 29.11782313 29.60544218 30.06984127
#  30.53424036 30.9754195  31.43981859 31.88099773 32.36861678 32.83301587
#  33.29741497 33.7385941  34.2029932  34.66739229 35.13179138 35.57297052
#  36.06058957 36.52498866 36.98938776 37.43056689 37.89496599 38.35936508
#  38.82376417 39.26494331 39.75256236 40.21696145 40.68136054 41.12253968
#  41.58693878 42.05133787 42.51573696 42.9569161  43.44453515 43.88571429
#  44.37333333 44.83773243 45.30213152 45.76653061 46.20770975 46.67210884
#  47.13650794 47.60090703 48.06530612 48.52970522 48.99410431 49.4585034
#  49.92290249 50.38730159 50.85170068 51.29287982 51.75727891 52.221678
#  52.6860771  53.15047619 53.61487528 54.05605442 54.52045351 54.98485261
#  55.4492517  55.91365079 56.37804989 56.84244898 57.30684807 57.77124717
#  58.23564626 58.6768254  59.14122449]

librosa.frames_to_time
- Input: a list of indexes of the frames that are beats beat_frames, sampling rate sr
- Output: a list of beat timings (seconds) beat_times

The following calculations are performed here.

$$\mathrm{beat\_times}[i] = \mathrm{beat\_frames}[i] \times \mathrm{hop\_length} / \mathrm{sr}$$

For example, in the case of $i = 0$:

\begin{align}
\mathrm{beat\_times}[0]&=\mathrm{beat\_frames[0]} \times \mathrm{hop\_length} / \mathrm{sr}\\
&=5 \times 512 / 22050\\
&=0.1160997732...\\
&\simeq0.11609977
\end{align}

From this, we can see that beat_times can be calculated from beat_frames and sr.
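The same calculation can be reproduced in plain Python. This is a minimal sketch of the formula above, not librosa's own implementation, and the function name frames_to_seconds is hypothetical. It uses the first three values of beat_frames.

```python
def frames_to_seconds(frames, sr=22050, hop_length=512):
    """Apply beat_times[i] = beat_frames[i] * hop_length / sr to each frame."""
    return [frame * hop_length / sr for frame in frames]


beat_frames_head = [5, 24, 43]  # first three beat frames from the output above
beat_times_head = frames_to_seconds(beat_frames_head)

print([round(t, 8) for t in beat_times_head])
# [0.11609977, 0.55727891, 0.99845805]
```

These values match the first three entries of beat_times printed by librosa.frames_to_time.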

In conclusion

This first installment covered the tutorial's Quickstart. How was it? I would appreciate comments on the writing as well as the content.

Next time I hope to cover the tutorial's Advanced usage section, and after that move on to Advanced examples.

Learn librosa with a tutorial 1
