As a data analysis beginner, I practiced pandas by examining the relationship between ice cream spending per household and temperature. (I more or less managed to get a result ...) My references were the book "[Introduction to Data Analysis with Python](https://www.amazon.co.jp/Python%E3%81%AB%E3%82%88%E3%82%8B%E3%83%87%E3%83%BC%E3%82%BF%E8%A7%A3%E6%9E%90%E5%85%A5%E9%96%80-%E5%B1%B1%E5%86%85-%E9%95%B7%E6%89%BF/dp/4274222888/ref=sr_1_3?__mk_ja_JP=%E3%82%AB%E3%82%BF%E3%82%AB%E3%83%8A&keywords=Python+%E3%83%87%E3%83%BC%E3%82%BF%E8%A7%A3%E6%9E%90&qid=1583399806&sr=8-3)" and the following two sites.
Ice cream BIZ statistics, and JMA monthly average temperature
I used the 2018 data posted on the above sites.
import pandas as pd
import math
import matplotlib.pyplot as plt
import statsmodels.api as sm
ice_url = 'http://www.icecream.or.jp/biz/data/expenditures.html'
temp_url = 'http://www.data.jma.go.jp/obd/stats/etrn/view/monthly_s3.php?%20prec_no=44&block_no=47662'
# [0] selects the first table on each page
ice = pd.read_html(ice_url)[0]
temp = pd.read_html(temp_url)[0]
# Extract only the 2018 data and convert it to a numeric type
ice_2018 = ice.iloc[1:13, 5].astype(float)
temp_2018 = temp.iloc[144, 1:13].astype(float)
# Prepare the month numbers 0-12 (row 0 is dropped after concatenation)
month = pd.DataFrame([i for i in range(13)])
Now we have the 2018 ice cream spending per household and the average monthly temperature.
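The column names 5 and 144 that appear later come from how `iloc` labels its results. Here is a minimal sketch on a made-up 4×4 table (the values are invented for illustration, not taken from either site): slicing a column yields a Series named after that column, while slicing a row yields a Series named after that row and indexed by the column labels.

```python
import pandas as pd

# Hypothetical stand-in for a header-less table returned by pd.read_html:
# every cell is a string and both axes are labeled 0, 1, 2, ...
toy = pd.DataFrame(
    [["a", "b", "c", "d"],
     ["x", "1.5", "2.5", "y"],
     ["x", "3.5", "4.5", "y"],
     ["z", "10", "20", "w"]]
)

# Rows 1-2 of column 2: the Series is named 2 and indexed by row label.
col_slice = toy.iloc[1:3, 2].astype(float)

# Columns 1-2 of row 3: the Series is named 3 and indexed by column label.
row_slice = toy.iloc[3, 1:3].astype(float)

print(col_slice.name, list(col_slice.index))
print(row_slice.name, list(row_slice.index))
```

Because both slices end up with the same index labels, they can be placed side by side later without any relabeling.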
# Combine the month numbers with ice cream spending and average temperature
icecream = pd.concat([month, ice_2018, temp_2018], axis=1)[1:]
# Compute the annual mean temperature (avetem) and mean ice cream spending (aveex)
x_data, y_data = icecream[144], icecream[5]
avetem, aveex = x_data.sum() / 12, y_data.sum() / 12
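As a hedged illustration of what the combining step does: `pd.concat(..., axis=1)` aligns its inputs on their index labels, so month 0 has no matching spending or temperature and is filled with `NaN`, which is why the first row gets dropped. The three-month values below are invented, and `Series.mean()` is shown as a shorter equivalent of dividing the sum by the row count.

```python
import pandas as pd

# Invented three-month example; the names 5 and 144 mimic the article's columns.
month = pd.DataFrame(list(range(4)))  # rows 0..3, single column labeled 0
spend = pd.Series([400.0, 500.0, 900.0], index=[1, 2, 3], name=5)
temp = pd.Series([5.0, 10.0, 27.0], index=[1, 2, 3], name=144)

joined = pd.concat([month, spend, temp], axis=1)
print(joined.iloc[0].isna().tolist())  # which cells of row 0 are missing

table = joined[1:]  # drop the unmatched row 0

# mean() gives the same value as dividing the sum by the number of rows.
mean_temp = table[144].mean()
assert mean_temp == table[144].sum() / len(table)
```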
The correlation coefficient is expressed as follows using the covariance $S_{xy}$ and the variances $S_{xx}$ and $S_{yy}$.
$$ r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}} $$
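For reference, assuming the usual population definitions with $n$ data points and means $\bar{x}$ and $\bar{y}$ (the article does not spell these out):

$$
S_{xy} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}),\qquad
S_{xx} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2,\qquad
S_{yy} = \frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^2
$$

The factors of $1/n$ cancel in the ratio, so $r$ can be computed from plain sums of deviations:

$$
r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}
$$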
Following this formula, I computed the correlation coefficient $r$ the straightforward way, as follows. (I suspect there is a better way.)
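# Accumulators for the sums of products and squares of deviations
extem = ex0 = tem0 = 0.0
for i in range(len(icecream)):
    ex = icecream.iloc[i, 1] - aveex      # spending deviation
    tem = icecream.iloc[i, 2] - avetem    # temperature deviation
    extem += ex * tem
    ex0 += ex**2
    tem0 += tem**2
extem0 = math.sqrt(ex0) * math.sqrt(tem0)
r = extem / extem0
# r = 0.8955143151163499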
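As for a better way: one option, sketched here on invented numbers rather than the 2018 data, is to let pandas or NumPy do the work. `Series.corr` computes the Pearson coefficient by default, and `numpy.corrcoef` returns the 2×2 correlation matrix; both should agree with the deviation-sum formula.

```python
import math

import numpy as np
import pandas as pd

# Invented sample values for illustration only.
x = pd.Series([5.0, 10.0, 20.0, 27.0])         # temperature-like
y = pd.Series([450.0, 500.0, 800.0, 1300.0])   # spending-like

# Deviation-sum formula, vectorized instead of looping.
dx, dy = x - x.mean(), y - y.mean()
r_manual = (dx * dy).sum() / math.sqrt((dx**2).sum() * (dy**2).sum())

r_pandas = x.corr(y)               # Pearson by default
r_numpy = np.corrcoef(x, y)[0, 1]  # off-diagonal entry of the matrix

print(math.isclose(r_manual, r_pandas), math.isclose(r_manual, r_numpy))
```

With the article's data, the same one-liner would presumably be `x_data.corr(y_data)`.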
The result, $r \approx 0.90$, indicates a strong positive correlation between ice cream spending and temperature.
Finally, let's run a regression analysis using the OLS class from statsmodels.
X = sm.add_constant(x_data)  # Add a constant column so that the model estimates an intercept
model = sm.OLS(y_data, X)
results = model.fit()
# a: intercept, b: slope (iloc makes the positional lookup explicit)
a, b = results.params.iloc[0], results.params.iloc[1]
plt.plot(x_data, a+b*x_data)
plt.scatter(icecream[144], icecream[5])
As a result, a regression line as shown in the figure was obtained.
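As a cross-check that does not depend on statsmodels, the least-squares line can also be sketched in closed form: the slope is $b = S_{xy}/S_{xx}$ and the intercept is $a = \bar{y} - b\bar{x}$. The example below uses invented points scattered around $y = 2x + 1$ and compares the result with `numpy.polyfit`; it is an illustration, not the article's computation.

```python
import numpy as np

# Invented points scattered around y = 2x + 1 (illustration only).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

# Closed-form simple linear regression from deviation sums.
dx, dy = x - x.mean(), y - y.mean()
b = (dx * dy).sum() / (dx**2).sum()  # slope
a = y.mean() - b * x.mean()          # intercept

# np.polyfit with degree 1 returns the coefficients as [slope, intercept].
b_fit, a_fit = np.polyfit(x, y, 1)

print(np.isclose(a, a_fit), np.isclose(b, b_fit))
```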
I see: people really do seem to want ice cream when it gets hot.