This is a continuation of the previous article. Under the title [Predicting the goal time of a full marathon by machine learning], I am writing a series that covers the whole flow, from data collection to model creation and prediction, with the aim of predicting the goal time of a full marathon (42.195 km) from data recorded during running practice.
In the previous article (Predicting the goal time of a full marathon by machine learning-②: I tried to create learning data with Garmin-), I created the training data and described the procedure for deleting unnecessary items and adding the data that was missing.
This time, before building a model that predicts the goal time of a full marathon from that training data, I will describe how to visualize the data and look at the overall trends. Some of this is easy to do in Excel, but I hope it is useful for anyone who wants to know how to write the same thing in Python.
Image from [Pixabay](https://pixabay.com/en/photos/%E7%8C%AB-%E3%83%A1%E3%82%AC%E3%83%8D-%E7%9C%BC%E9%8F%A1-%E3%83%9A%E3%83%83%E3%83%88-984097/)
The training data consists of 14 features that are thought to affect distance and pace during a run.
Sample data for one record
Practice Time | Distance | Time | Average heart rate | Max heart rate | Aerobic TE | Average pitch | Average pace | Max pace | Average stride | temperature | Wind speed | Work | Average sleep time |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2020/2/23 16:18:00 | 8.19 | 0:59:35 | 161 | 180 | 3.6 | 176 | 00:07:16 | 00:06:11 | 0.78 | 7.9 | 9 | 44.5 | 6:12:00 |
First, import the libraries needed to visualize the data. For the time being, these should be enough.
RunnningDataVisualization.ipynb
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.dates as mdates
import seaborn as sns
You can draw a graph of the monthly mileage with the following code.
RunnningDataVisualization.ipynb
df = pd.read_csv(r'Activities.csv', index_col=["PracticeTime"], parse_dates=True)
# Pass "PracticeTime" to index_col so that it becomes the index
# Set parse_dates=True so that the index given by index_col is parsed as dates
# Draw the graph
df_m = df[['Distance']].resample(rule="M").sum()  # Monthly totals; selecting Distance first keeps text columns out of the sum
df_m_graph = df_m['Distance']
df_m_graph.plot.bar()
# Set the graph display options
plt.title("Distance per month", fontsize=22)  # Give the graph a title
plt.grid(True)  # Add grid lines to the graph
plt.xlabel("month", fontsize=15)  # Label the horizontal axis
plt.ylabel("km", fontsize=15)  # Label the vertical axis
plt.yticks(np.arange(0, 60, 5))  # Put y-axis ticks every 5 km from 0 to 55
Execution result
Looking at it this way, you can see how little I practiced in the hot summer months.
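As a small illustration of the monthly aggregation used for the graph above, the sketch below sums distances per calendar month on a tiny data set. The values and the names `runs` and `monthly_km` are made up for the example, and it groups by `to_period("M")` instead of calling `resample`, which gives the same totals for months that contain at least one run.

```python
import pandas as pd

# Hypothetical sample: four runs spread over two months (values are made up).
runs = pd.DataFrame(
    {"Distance": [5.0, 8.19, 10.0, 3.5]},
    index=pd.to_datetime(
        ["2020-01-05 07:00", "2020-01-19 16:30", "2020-02-02 06:45", "2020-02-23 16:18"]
    ),
)

# Group the runs by the calendar month of the index and add up the distances.
monthly_km = runs["Distance"].groupby(runs.index.to_period("M")).sum()
print(monthly_km)
```

One difference worth keeping in mind: `resample` also returns months without any run (with a total of 0), while this grouping simply omits them.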
Next, I will draw a scatter plot to see whether the pace per kilometer correlates with the pitch. Generally speaking, when the pace slows down, the pitch (steps per minute) is expected to drop as well, but what does the actual data show?
RunnningDataVisualization.ipynb
df = df.sort_values("Average pace")  # Sort by pace, fastest first
plt.figure(figsize=(50, 4))  # Create the figure before plotting so the size applies to this plot
plt.scatter(df['Average pace'], df['Average pitch'], s=40, marker="*", linewidths=4, edgecolors="orange")  # Draw a scatter plot
plt.title("Scatter plot of pace and pitch", fontsize=22)
plt.ylabel('Average pitch', fontsize=15)
plt.xlabel('Average pace', fontsize=15)
plt.grid(True)
plt.xticks(rotation=90)  # Rotate the pace labels so they do not overlap
plt.show()
Execution result
You can see that the pitch varies from run to run, regardless of whether the pace is fast or slow.
Then what about the relationship between pace and stride? If the pace slows down, the stride (the length of one step) is likely to get shorter.
RunnningDataVisualization.ipynb
df = df.sort_values("Average pace")  # Sort by pace, fastest first
plt.figure(figsize=(10, 10), dpi=200)  # Create the figure before plotting so the size and dpi apply to this plot
plt.scatter(df['Average pace'], df['Average stride'], s=40, marker="*", linewidths=4, edgecolors="blue")  # Draw a scatter plot
plt.title("Scatter plot of pace and stride", fontsize=22)
plt.ylabel('Average stride', fontsize=15)
plt.xlabel('Average pace', fontsize=15)
plt.grid(True)
plt.xticks(rotation=90)  # Rotate the pace labels so they do not overlap
plt.show()
Execution result
Unlike the scatter plot of pace and pitch, the cloud of points slopes downward to the right. In other words, the slower the pace, the shorter the stride, with a difference of up to about 25 cm.
When you run a long distance, there is always a moment when the pace drops. Was a shorter stride one of the causes? Visualizing the data with Python makes this much more convincing.
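To put a number on a downward trend like the one in the pace and stride plot, one option is to fit a straight line to the points. The sketch below is only an illustration under assumptions: the values are made up (not the article's data), the stride is assumed to be in metres, and the pace strings are converted to seconds per kilometre before fitting.

```python
import numpy as np
import pandas as pd

# Made-up sample values for illustration only (not the article's data).
sample = pd.DataFrame(
    {
        "Average pace": ["00:06:00", "00:06:30", "00:07:00", "00:07:30"],
        "Average stride": [0.95, 0.90, 0.85, 0.80],  # assumed to be metres per step
    }
)

# Convert "hh:mm:ss" pace strings to seconds per kilometre.
pace_sec = pd.to_timedelta(sample["Average pace"]).dt.total_seconds()

# Fit stride = slope * pace + intercept by least squares.
slope, intercept = np.polyfit(pace_sec, sample["Average stride"], 1)

# Express the slope as the stride change for a pace that is one minute slower.
stride_change_per_min = slope * 60
print(f"Stride change per +1 min/km: {stride_change_per_min:.3f} m")
```

A negative value means the stride gets shorter as the pace gets slower, which is what a downward-sloping scatter plot suggests.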
Finally, let's compute the correlation coefficients between the features. In addition to the data recorded by Garmin, four features were added to the training data (temperature, wind speed, weekly working hours, average sleeping time). If any of them shows a strong correlation with mileage, heart rate, and so on, it can be considered to have some influence on pace and mileage.
This time, I didn't know how to calculate correlation coefficients for time data, so I calculated them only between the numerical features.
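For reference, one possible way to include time data, which is not what this article does, is to convert each time string to a number of seconds first. The sketch below assumes the time columns are strings in `h:mm:ss` form as in the sample record; the values and the added `(s)` column names are made up for illustration.

```python
import pandas as pd

# Made-up sample values for illustration only (not the article's data).
sample = pd.DataFrame(
    {
        "Distance": [5.0, 8.19, 10.0, 12.0],
        "Average pace": ["00:06:30", "00:07:16", "00:07:00", "00:07:40"],
        "Average sleep time": ["7:00:00", "6:12:00", "6:30:00", "5:45:00"],
    }
)

# Turn each time string into a float number of seconds in a new column.
time_columns = ["Average pace", "Average sleep time"]
for col in time_columns:
    sample[col + " (s)"] = pd.to_timedelta(sample[col]).dt.total_seconds()

# The converted columns are numeric, so corr() can now include them.
corr = sample[["Distance", "Average pace (s)", "Average sleep time (s)"]].corr()
print(corr.round(2))
```

Seconds are an arbitrary choice of unit here; minutes or hours would give the same correlation coefficients because the coefficient does not depend on scale.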
Before calculating the correlation coefficients, convert the average heart rate and maximum heart rate, which were read as strings when loading the csv, from strings to numerical values.
RunnningDataVisualization.ipynb
# Type conversion (missing values are filled with 0 before converting to integers)
df['Average heart rate'] = df['Average heart rate'].fillna(0).astype(np.int64)
df['Max heart rate'] = df['Max heart rate'].fillna(0).astype(np.int64)
# Visualize the correlation coefficients
df_corr = df.corr(numeric_only=True)  # numeric_only (pandas 1.5+) skips the non-numeric time columns
print(df_corr)  # Display the correlation coefficients between features as a table
fig, ax = plt.subplots(figsize=(8, 8))  # Create the figure and axes for the heatmap
sns.heatmap(df_corr, annot=True, fmt='.2f', cmap='Blues', square=True, ax=ax)  # Show the coefficients as a heatmap
Execution result
Among the three added features that I focused on (temperature, wind speed, and weekly working hours), none has a correlation coefficient with any other feature whose absolute value exceeds 0.5. In other words, these three features do not appear to be strongly correlated with mileage or pace.
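A check like the one above (does any pair exceed 0.5 in absolute value?) can also be done in code instead of by reading the heatmap. This is a minimal sketch with made-up values; the names `threshold`, `upper`, and `strong_pairs` are chosen for the example only.

```python
import numpy as np
import pandas as pd

# Made-up sample values for illustration only (not the article's data).
sample = pd.DataFrame(
    {
        "Distance": [5.0, 8.0, 10.0, 12.0, 15.0],
        "Average heart rate": [150, 155, 160, 163, 170],
        "temperature": [12.0, 25.0, 8.0, 20.0, 15.0],
    }
)

threshold = 0.5  # cut-off on the absolute correlation coefficient
corr = sample.corr()

# Keep only the upper triangle so each pair appears once, without the diagonal.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Stack into (feature A, feature B) -> r and keep pairs above the threshold.
pairs = upper.stack()
strong_pairs = pairs[pairs.abs() > threshold]
print(strong_pairs)
```

Note that a correlation coefficient only measures linear association, so a feature that falls below the threshold may still matter in a non-linear way.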
Well, if you think about it, you don't go out running on days that are too hot, too cold, or too windy, and if you work a lot during the week, you are physically tired and choose not to practice. So this result is also convincing.
Unfortunately, calculating the correlation coefficients alone did not reveal which features affect mileage and pace. Still, visualizing the data from various angles like this is a good opportunity to look back on my own tendencies when running and on how I practice.
Next time, we will finally create a prediction model and run the prediction process.