Problem 12-10 asks us to test the correlation coefficient between the LDP vote rate and the home ownership ratio. The data is quite old (the 1983 general election!), but it is interesting that the LDP seems to do better where more people own their homes.
So, instead of just solving it, I decided to plot the data using pandas and matplotlib.
I downloaded the necessary libraries from http://www.lfd.uci.edu/~gohlke/pythonlibs/. Installing them with pip install sometimes did not work.
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
I typed the data from Table 3.13 (p. 65) into a CSV file by hand. Load the finished CSV file (named table_3_13.csv) into a DataFrame as follows.
df = pd.read_csv('table_3_13.csv', encoding='shift-jis')
# If the result looks garbled, check the encoding.
df
I was able to read it like this.
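If you are not sure which encoding the CSV was saved in, one option is to try a few candidates before calling `pd.read_csv`. The following is only a sketch: `guess_encoding` is a hypothetical helper (not part of pandas), and the candidate list and the demo file are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def guess_encoding(path, candidates=("utf-8", "cp932")):
    """Return the first candidate encoding that decodes the whole file without error."""
    raw = Path(path).read_bytes()
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this candidate cannot decode the file, try the next one
        return enc
    raise ValueError(f"none of {candidates} could decode {path}")


# Hypothetical demo file: a tiny CSV saved in cp932 (the Windows variant of Shift-JIS).
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.csv"
    demo.write_bytes("都道府県,値\n北海道,1\n".encode("cp932"))
    detected = guess_encoding(demo)

print(detected)  # cp932
```

The result can then be passed on, for example `pd.read_csv('table_3_13.csv', encoding=guess_encoding('table_3_13.csv'))`. UTF-8 is tried first because it is strict about invalid bytes. A successful decode does not prove the encoding is right, so it is still worth looking at the loaded table.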
A simple scatter plot can be drawn as follows.
d = df[0:47] # Keep only the 47 prefectures, from Hokkaido to Okinawa.
# Looking at the graph, there seems to be some correlation.
plt.xlabel(d.columns[1])
plt.ylabel(d.columns[2])
plt.scatter(d.iloc[:, 1], d.iloc[:, 2]) # Select columns by position; d[[1]] would look for a column labelled 1.
plt.show()
There seems to be a correlation between the home ownership ratio and the LDP vote rate.
# Label each point with its prefecture name
fig, ax = plt.subplots(figsize=(15,15)) # The prefecture names are unreadable unless the figure is fairly large.
df.plot(x=df.columns[1], y=df.columns[2], kind='scatter', ax=ax)
for k, v in df.iterrows():
    ax.annotate(v.iloc[0], xy=(v.iloc[1], v.iloc[2]), size=12) # v.iloc[0] is the prefecture name, v.iloc[1] the LDP vote rate, v.iloc[2] the home ownership ratio.
plt.show()
At a glance, you can see that the ratio of homeowners seems to be higher in rural areas.
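The correlation coefficient is easy to calculate with the pandas corr method. In recent pandas versions you may need d.corr(numeric_only=True) so that the prefecture-name column is skipped. The result was as follows.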
d.corr()
| | Liberal Democratic Vote Rate | Owned house ratio |
|---|---|---|
| Liberal Democratic Vote Rate | 1.000000 | 0.638782 |
| Owned house ratio | 0.638782 | 1.000000 |
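For reference, corr computes Pearson's product-moment correlation coefficient by default. The sketch below writes out that definition with only the standard library. The function name `pearson_r` and the five data points are hypothetical and are not taken from Table 3.13.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # The 1/n (or 1/(n-1)) factors cancel, so the raw sums are enough.
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r_demo = pearson_r(x, y)
print(round(r_demo, 4))  # 0.7746
```

Applied to the two numeric columns of d, this should reproduce the off-diagonal value in the table above.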
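Next, we test whether the obtained correlation coefficient is consistent with a hypothesized population value. Here, we use Fisher's z-transform as the test for the correlation coefficient. Fisher's z-transform looks like this:
When the population is bivariate normal with population correlation coefficient $\rho$ and the sample correlation coefficient is $r$, define $z = \frac{1}{2} \log \frac{1+r}{1-r}$ and $\eta = \frac{1}{2} \log \frac{1+\rho}{1-\rho}$. For a sample of size $n$, the statistic $Z = \sqrt{n-3}\,(z - \eta)$ approximately follows the standard normal distribution $N(0, 1)$.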
First, set $\rho = 0$ (the null hypothesis of no correlation) and calculate with Python as below.
n = 47 # Number of data points: the 47 prefectures in d, from which r was computed
r = 0.638782
rho = 0.0
z = 0.5 * np.log((1 + r) / (1 - r))
eta = 0.5 * np.log((1 + rho) / (1 - rho))
Z = np.sqrt(n - 3) * (z - eta)
print("Z=", Z) # Z is about 5.015
Since $Z_{0.025} = 1.96$, we clearly have $Z_{0.025} < Z$, so the null hypothesis $\rho = 0$ is rejected. In other words, the hypothesis of no correlation does not hold (significance level 0.05).
The same calculation with $\rho = 0.5$ looks like this.
n = 47 # Number of data points: the 47 prefectures in d
r = 0.638782
rho = 0.5
z = 0.5 * np.log((1 + r) / (1 - r))
eta = 0.5 * np.log((1 + rho) / (1 - rho))
Z = np.sqrt(n - 3) * (z - eta)
print("Z=", Z) # Z is about 1.372
For the obtained $Z = 1.37$, the null hypothesis is not rejected because $Z_{0.025} = 1.96 > 1.37$. Therefore, we cannot rule out that the population correlation coefficient is 0.5 (significance level 0.05).
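The two calculations above differ only in the hypothesized $\rho$, so they can be wrapped in one function. The following is a sketch rather than a definitive implementation: `fisher_z_test` is a hypothetical name, it uses the standard library's `statistics.NormalDist` in place of SciPy, and n is assumed to be 47, the number of prefecture rows kept in d. It also returns a two-sided p-value and an approximate confidence interval for $\rho$, obtained by inverting the same transform with tanh.

```python
import math
from statistics import NormalDist


def fisher_z_test(r, n, rho0=0.0, alpha=0.05):
    """Two-sided test of H0: rho == rho0 based on Fisher's z-transform."""
    z = math.atanh(r)        # identical to 0.5 * log((1 + r) / (1 - r))
    eta = math.atanh(rho0)   # the same transform applied to the hypothesized value
    Z = math.sqrt(n - 3) * (z - eta)

    std_normal = NormalDist()  # standard normal N(0, 1)
    p_value = 2 * (1 - std_normal.cdf(abs(Z)))
    z_crit = std_normal.inv_cdf(1 - alpha / 2)

    # Approximate (1 - alpha) confidence interval for rho, mapped back with tanh.
    half_width = z_crit / math.sqrt(n - 3)
    ci = (math.tanh(z - half_width), math.tanh(z + half_width))
    return Z, p_value, abs(Z) > z_crit, ci


r_obs = 0.638782  # sample correlation from d.corr()
n_obs = 47        # assumption: the 47 prefecture rows in d

for rho0 in (0.0, 0.5):
    Z, p, reject, ci = fisher_z_test(r_obs, n_obs, rho0)
    print(f"rho0={rho0}: Z={Z:.3f}, p={p:.4f}, reject={reject}, "
          f"CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

The interval does not depend on rho0. It excludes 0 and contains 0.5, which agrees with the two test results.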
I actually wanted to color a map of Japan using geopandas, but I could not install it on Windows 10. I will try again once I figure out how.
The value of $Z$ at which the cumulative distribution function equals $a$ can be obtained with the following function.
stats.norm.ppf(a)
Here the two-sided significance level is 0.05, so calculate as follows.
stats.norm.ppf(1-0.025) #1.959963984540054
$Z_{0.025} = 1.96$ is well known, and it agrees with the above result.
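The same critical value can be cross-checked without SciPy. This is a small sketch, assuming Python 3.8 or later: `NormalDist().inv_cdf` from the standard library plays the role of `stats.norm.ppf`, and `two_sided_critical_value` is a hypothetical helper name.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def two_sided_critical_value(alpha):
    """Critical value for a two-sided test: each tail holds alpha / 2."""
    return std_normal.inv_cdf(1 - alpha / 2)


for alpha in (0.10, 0.05, 0.01):
    crit = two_sided_critical_value(alpha)
    # cdf is the inverse of inv_cdf, so the upper-tail area comes back as alpha / 2.
    upper_tail = 1 - std_normal.cdf(crit)
    print(f"alpha={alpha}: critical value={crit:.4f}, upper tail={upper_tail:.4f}")
```

The argument is 1 - alpha / 2 rather than 1 - alpha because a two-sided test splits the significance level between the two tails.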