The Spotify API returns analyzed musical parameters for each song. In this article, we get a list of songs that includes this data and analyze the trends of the songs you listened to in 2020. (It's already the new year, though...)
-- Spotify playlist CSV: see the Part 1 article
-- Exploratory installed: it can be installed for free
EDA is an abbreviation for Exploratory Data Analysis. It is the very first phase of data analysis, where the goal is to get a first feel for the data, visualize it, look for patterns, and understand the relationships and correlations between features and targets.
Before starting the analysis, it is important to first understand what kind of data set you are dealing with.
Building more advanced machine learning models and solving difficult problems often requires feature engineering, which in turn requires deep knowledge and understanding of the data. This is also the stage at which you find out which columns need to be preprocessed.
You can use pandas for this step, but if you are a Python beginner, you may get stuck writing code. I think it helps to first get a picture of the data visually, so I will introduce a method that works without code.
The tool to use is Exploratory.
Simply import the CSV data and the summary statistics for each column are displayed, including whether each column has missing values. It's convenient!
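If you would rather check the same things in pandas, a minimal sketch looks like this. The small DataFrame is made-up sample data standing in for the CSV from Part 1 (in practice you would load the file with `pd.read_csv`), and the `name` column is an assumption for illustration.

```python
import pandas as pd

# Made-up sample rows standing in for the playlist CSV (values are illustrative only).
df = pd.DataFrame({
    "name": ["song_a", "song_b", "song_c", "song_d"],  # hypothetical column
    "energy": [0.82, 0.35, 0.64, None],                # one missing value on purpose
    "loudness": [-4.9, -12.3, -7.1, -9.8],
    "tempo": [128.0, 76.5, 101.2, 90.0],
})

# Summary statistics (count, mean, std, min, quartiles, max) for the numeric columns.
summary = df.describe()
print(summary)

# Number of missing values in each column.
missing = df.isna().sum()
print(missing)
```

`describe()` skips missing values when counting, so a column whose count is lower than the number of rows is another hint that something is missing.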
Correlations between column values can also be visualized through GUI operations. For example, you can see the positive correlation between loudness and energy.
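The same check can be sketched in pandas with `DataFrame.corr`. The values below are made up for illustration and are not real Spotify data; only the column names `energy`, `loudness`, and `tempo` follow the audio features.

```python
import pandas as pd

# Made-up sample values (not real Spotify data).
df = pd.DataFrame({
    "energy":   [0.20, 0.35, 0.50, 0.65, 0.80, 0.95],
    "loudness": [-15.0, -12.5, -9.0, -8.0, -5.5, -4.0],
    "tempo":    [90.0, 140.0, 75.0, 120.0, 100.0, 110.0],
})

# Pairwise Pearson correlation between all numeric columns (Pearson is the default).
corr = df.corr()
print(corr.round(2))

# Pick out a single pair from the correlation matrix.
loudness_energy = corr.loc["loudness", "energy"]
print(f"loudness vs energy: {loudness_energy:.2f}")
```

A value close to 1 means the two columns rise together, which is what the GUI chart shows for loudness and energy.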
Preprocessing is necessary for the following reasons.
-- Machine learning models must be given numerical data rather than string data.
-- Similarly, data with missing values (null) cannot be passed to a machine learning model without conversion.
-- Outlier records are excluded to improve accuracy, and so on.
Machine learning models are given numerical data instead of string data.
-- Example: use numeric data (0, 1, 2, ...) instead of string data (day of the week: Mon, Tue, Wed, ...)
Outlier records are excluded to improve accuracy.
-- Example: check whether there is a hidden track that runs for a long time because it contains silence
-- Example: check whether there is a song whose tempo (BPM) is doubled
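The two kinds of preprocessing above can be sketched in pandas as follows. This is only an illustration under stated assumptions: the rows are made up, the `added_weekday` column is hypothetical (it is not part of the audio features), and both thresholds are arbitrary values chosen for this example.

```python
import pandas as pd

# Made-up sample rows; "added_weekday" is a hypothetical column for illustration.
df = pd.DataFrame({
    "name": ["song_a", "song_b", "hidden_track", "song_c"],
    "added_weekday": ["Mon", "Tue", "Wed", "Mon"],
    "duration_ms": [215000, 198000, 1260000, 240000],
    "tempo": [120.0, 95.0, 60.0, 180.0],
})

# 1) String data -> numerical data: map each weekday label to an integer.
weekday_to_int = {"Mon": 0, "Tue": 1, "Wed": 2, "Thu": 3, "Fri": 4, "Sat": 5, "Sun": 6}
df["weekday_num"] = df["added_weekday"].map(weekday_to_int)

# 2) Outlier candidates: tracks longer than 10 minutes (arbitrary threshold).
MAX_DURATION_MS = 10 * 60 * 1000
long_tracks = df[df["duration_ms"] > MAX_DURATION_MS]

# 3) Tracks whose tempo is high enough to be worth checking by ear
#    for a doubled BPM (arbitrary threshold).
SUSPICIOUS_TEMPO = 160.0
fast_tracks = df[df["tempo"] > SUSPICIOUS_TEMPO]

print(df[["name", "weekday_num"]])
print(long_tracks["name"].tolist())
print(fast_tracks["name"].tolist())
```

The filters only list candidates. Whether a flagged record is really an outlier to exclude is something to decide after listening to the track or looking at it during EDA.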
The data returned by the API has no missing values and is already in a form that is easy for a machine to ingest. It is therefore not very suitable as study material for preprocessing.
Also, tempo (BPM), key, and time_signature (meter) do not necessarily stay at a single value throughout one song. You need to consider whether they should be analyzed at all. This is a point to check during EDA.
I would like to cover concrete examples of preprocessing in a separate article. In this article, instead of preprocessing, we add human-readable values in separate columns during EDA.
```python
# tracks_with_features_df: DataFrame holding the tracks and their audio features

# Add the mode as a label in a separate column: 1 is major, 0 is minor
tracks_with_features_df['a_mode'] = tracks_with_features_df['mode'].map({1: 'major', 0: 'minor'})

# Add the key as a label in a separate column: 0 is C, 1 is C#, ..., 11 is B
# (values outside 0-11 are left as NaN, as in a row-by-row assignment)
key_names = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
tracks_with_features_df['a_key'] = tracks_with_features_df['key'].map(dict(enumerate(key_names)))

# Convert the duration: milliseconds → seconds
tracks_with_features_df['a_second'] = tracks_with_features_df['duration_ms'] / 1000
```
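Once the label columns exist, trends become easy to count. A minimal sketch, assuming a DataFrame with the `a_key`, `a_mode`, and `a_second` columns created above; the rows here are made-up sample values, not real listening data.

```python
import pandas as pd

# Made-up sample rows using the human-readable columns (illustrative values only).
tracks = pd.DataFrame({
    "a_key": ["C", "G", "C", "A", "G", "C"],
    "a_mode": ["major", "major", "minor", "minor", "major", "major"],
    "a_second": [210.0, 185.5, 240.0, 199.0, 230.5, 176.0],
})

# How many songs are in each key, most frequent first.
key_counts = tracks["a_key"].value_counts()
print(key_counts)

# How many songs fall into each key / mode combination.
key_mode_counts = tracks.groupby(["a_key", "a_mode"]).size()
print(key_mode_counts)

# Average song length in seconds for major and minor songs.
mean_seconds_by_mode = tracks.groupby("a_mode")["a_second"].mean()
print(mean_seconds_by_mode)
```

Counting on the label columns rather than on the raw `key` and `mode` numbers keeps the output readable without looking up the mapping each time.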
In this article, we used data from the Spotify API to walk through visualizing and preprocessing the audio data of songs. Next time, we will visualize the similarity between songs based on the audio data. See you then.