This update has been a long time coming. There are many reasons (or excuses), but mostly I just wasn't writing much code. At the start of the year my classes and assignments eased up a little, and at the same time I was mentally wandering, thinking about my graduation project and my career after graduation, so I didn't make much progress. I kept updating my blog every day, but honestly there wasn't enough material for a Qiita post. I'm still a little lost, but for now my graduation project has been decided and I'm working hard on it.
During the class, I played around with some data in the cloud. I also learned about an amazing site called Kaggle. This was my first time pulling data from Kaggle myself after class and exploring it.
https://www.kaggle.com/unsdsn/world-happiness#2019.csv I chose this dataset because it seemed like I could find various correlations in it.
from google.colab import drive
drive.mount('/content/drive')
Upload the required CSV to Google Drive in advance.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("/content/drive/My Drive/2019.csv")
I was impressed that there are so many other libraries besides these!
df.count()
Overall rank | 156 |
---|---|
Country or region | 156 |
Score | 156 |
GDP per capita | 156 |
Social support | 156 |
Healthy life expectancy | 156 |
Freedom to make life choices | 156 |
Generosity | 156 |
Perceptions of corruption | 156 |
There are 156 rows and no missing values. Without checking this, I wouldn't know whether to display all the data or only the first few rows, so I tried it. I also wanted to avoid data with many missing values, since that seemed likely to get confusing. (I think I'll need to take on that challenge eventually, but this is my first attempt, so not this time.)
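Incidentally, a more direct way to check for missing values than eyeballing the counts is `isnull().sum()`. Here's a minimal sketch, assuming the same `df` loaded from 2019.csv as above:

```python
import pandas as pd

# Same file as above; the path assumes the Drive mount shown earlier
df = pd.read_csv("/content/drive/My Drive/2019.csv")

# Number of missing values in each column (all zeros here)
print(df.isnull().sum())

# True if there is any missing value anywhere in the DataFrame
print(df.isnull().values.any())
```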
df.head(20)
Overall rank | Country or region | Score | GDP per capita | Social support | Healthy life expectancy | Freedom to make life choices | Generosity | Perceptions of corruption |
---|---|---|---|---|---|---|---|---|
1 | Finland | 7.769 | 1.340 | 1.587 | 0.986 | 0.596 | 0.153 | 0.393 |
2 | Denmark | 7.600 | 1.383 | 1.573 | 0.996 | 0.592 | 0.252 | 0.410 |
3 | Norway | 7.554 | 1.488 | 1.582 | 1.028 | 0.603 | 0.271 | 0.341 |
4 | Iceland | 7.494 | 1.380 | 1.624 | 1.026 | 0.591 | 0.354 | 0.118 |
5 | Netherlands | 7.488 | 1.396 | 1.522 | 0.999 | 0.557 | 0.322 | 0.298 |
6 | Switzerland | 7.480 | 1.452 | 1.526 | 1.052 | 0.572 | 0.263 | 0.343 |
7 | Sweden | 7.343 | 1.387 | 1.487 | 1.009 | 0.574 | 0.267 | 0.373 |
8 | New Zealand | 7.307 | 1.303 | 1.557 | 1.026 | 0.585 | 0.330 | 0.380 |
9 | Canada | 7.278 | 1.365 | 1.505 | 1.039 | 0.584 | 0.285 | 0.308 |
10 | Austria | 7.246 | 1.376 | 1.475 | 1.016 | 0.532 | 0.244 | 0.226 |
11 | Australia | 7.228 | 1.372 | 1.548 | 1.036 | 0.557 | 0.332 | 0.290 |
12 | Costa Rica | 7.167 | 1.034 | 1.441 | 0.963 | 0.558 | 0.144 | |
13 | Israel | 7.139 | 1.276 | 1.455 | 1.029 | 0.371 | 0.261 | 0.082 |
14 | Luxembourg | 7.090 | 1.609 | 1.479 | 1.012 | 0.526 | 0.194 | 0.316 |
15 | United Kingdom | 7.054 | 1.333 | 1.538 | 0.996 | 0.450 | 0.348 | 0.278 |
16 | Ireland | 7.021 | 1.499 | 1.553 | 0.999 | 0.516 | 0.298 | 0.310 |
17 | Germany | 6.985 | 1.373 | 1.454 | 0.987 | 0.473 | 0.160 | 0.210 |
19 | United States | 6.892 | 1.433 | 1.457 | 0.874 | 0.454 | 0.280 | |
20 | Czech Republic | 6.852 | 1.269 | 1.487 | 0.920 | 0.457 | 0.046 | 0.036 |
Japan doesn't appear in the top 20.
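As a side note, one way to look up where Japan actually landed is plain boolean indexing on the country column (a small sketch, reusing the same `df` as above):

```python
# Filter the DataFrame to the row whose country column is "Japan"
japan = df[df["Country or region"] == "Japan"]
print(japan)
```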
df.describe()
 | Overall rank | Score | GDP per capita | Social support | Healthy life expectancy | Freedom to make life choices | Generosity | Perceptions of corruption |
---|---|---|---|---|---|---|---|---|
count | 156.000000 | 156.000000 | 156.000000 | 156.000000 | 156.000000 | 156.000000 | 156.000000 | |
mean | 78.500000 | 5.407096 | 0.905147 | 1.208814 | 0.725244 | 0.392571 | 0.184846 | |
std | 45.177428 | 1.113120 | 0.398389 | 0.299191 | 0.242124 | 0.143289 | 0.095254 | |
min | 1.000000 | 2.853000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | |
25% | 39.750000 | 4.544500 | 0.602750 | 1.055750 | 0.547750 | 0.308000 | 0.108750 | |
50% | 78.500000 | 5.379500 | 0.960000 | 1.271500 | 0.789000 | 0.417000 | 0.177500 | |
75% | 117.250000 | 6.184500 | 1.232500 | 1.452500 | 0.881750 | 0.507250 | 0.248250 | |
max | 156.000000 | 7.769000 | 1.684000 | 1.624000 | 1.141000 | 0.631000 | 0.566000 | |
Even just from this, you can get a rough sense of the character of the data.
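To get a slightly better feel for the distributions than the summary numbers alone, a quick option (not something from the class, just a sketch) is pandas' built-in histograms:

```python
import matplotlib.pyplot as plt

# One histogram per numeric column, to eyeball the shape of each distribution
df.hist(figsize=(10, 8), bins=20)
plt.tight_layout()
plt.show()
```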
#Library preparation
import numpy as np
import pandas as pd
#Data set preparation
##Happiness score as an array
happy = df["Score"]
##Social support as an array
social = df["Social support"]
#Get the correlation coefficient!
correlation = np.corrcoef(social, happy)
print(correlation)
[[1.         0.77705779]
 [0.77705779 1.        ]]
It came out~! Since the correlation coefficient is about 0.78, social support has a strong correlation with happiness!!
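Since my goal is simply to figure things out or visualize them, a scatter plot is probably the easiest way to see what that ~0.78 actually looks like (a sketch, reusing the `happy` and `social` Series from above):

```python
import matplotlib.pyplot as plt

# Each point is one country: social support on the x-axis, happiness score on the y-axis
plt.scatter(social, happy, alpha=0.6)
plt.xlabel("Social support")
plt.ylabel("Score")
plt.title("Social support vs happiness score (2019)")
plt.show()
```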
#Library preparation
import pandas as pd
import numpy as np
#You should be able to get the correlation coefficient between columns!
corr_df = df.corr()
print(corr_df)
                              Overall rank  ...  Perceptions of corruption
Overall rank                      1.000000  ...                  -0.351959
Score                            -0.989096  ...                   0.385613
GDP per capita                   -0.801947  ...                   0.298920
Social support                   -0.767465  ...                   0.181899
Healthy life expectancy          -0.787411  ...                   0.295283
Freedom to make life choices     -0.546606  ...                   0.438843
Generosity                       -0.047993  ...                   0.326538
Perceptions of corruption        -0.351959  ...                   1.000000
[8 rows x 8 columns]
Sorry it's hard to read, but for now the correlation matrix came out!
#Library preparation
import seaborn as sns
sns.heatmap(corr_df, cmap=sns.color_palette('cool', 5), annot=True, fmt='.2f', vmin=-1, vmax=1)
I completely forgot to drop the Overall rank column, but I managed to get the heatmap!
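If I redo it, dropping Overall rank (and the country name) before computing the correlations should give a cleaner heatmap. A sketch of what I think that would look like:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Leave out the rank and the country name, then recompute the correlation matrix
corr_no_rank = df.drop(columns=["Overall rank", "Country or region"]).corr()

sns.heatmap(corr_no_rank, cmap=sns.color_palette('cool', 5),
            annot=True, fmt='.2f', vmin=-1, vmax=1)
plt.show()
```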
I've always been interested in Python. Although there were only four lessons in total, it was interesting to learn about "data" in general, not just Python. The rest was pure fun. I remember doing a bit of statistics in SPSS back in graduate school. I didn't use statistics in my master's thesis, so I only touched on it briefly, but it was genuinely interesting back then too. It brought back memories of when I was young lol
Since this was my first attempt, I kept it rough and didn't think about anything deep. Rather than worrying about rigorous statistics, I simply tried to explore and visualize. Compared to what specialists do, I'm sure there is a lot more that could be done. There are many things I'd like to try in Python, such as the factor analysis, principal component analysis, and logistic regression I used to do in SPSS. I couldn't get to any of that during the class due to lack of time and knowledge...
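For example, the principal component analysis I used to run in SPSS looks like it only takes a few lines with scikit-learn. I haven't actually tried this on the class material, so treat it as a rough sketch against the same 2019.csv columns:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

df = pd.read_csv("/content/drive/My Drive/2019.csv")

# Use only the six explanatory indicators (drop the rank, country name, and the score itself)
features = df.drop(columns=["Overall rank", "Country or region", "Score"])

# Standardize, then compress into two principal components
X = StandardScaler().fit_transform(features)
pca = PCA(n_components=2)
components = pca.fit_transform(X)

# How much of the variance each component explains
print(pca.explained_variance_ratio_)
```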
The class itself is over, but I felt the potential of machine learning, and at the same time I realized how vaguely I'd been drifting along even though I'm in this industry. I'm sure studying it won't all be fun and interesting, but I'd like to keep studying as much as I can. I'm not planning to include machine learning in the product I'm building, so how much can I actually do outside of my graduation project? It could easily end up all talk and no action, and I'm already worried about that, but I wrote this Qiita post as a warning to myself, so I'd like to chip away at it little by little. I hope to write about this kind of content bit by bit in the future.