Today, without overthinking it, we will simply try a statistical analysis using the "Titanic data".
Environment: Microsoft Windows version 10.0, Python version 3.8.1
↓ (Reference) Checking the Windows version
C:\Users\User name>ver
↓ (Reference) Python version check
C:\Users\User name>python
It seems convenient to run Python in a virtual environment, so I use one as well.
↓ (Reference) Launching a virtual environment
C:\Users\User name>Virtual environment name\scripts\activate
↓ (Reference) When the virtual environment is active, the prompt looks like this
(Virtual environment name)C:\Users\User name>
The packages used this time are:
・ NumPy
・ pandas
・ Matplotlib
・ seaborn
↓ Installation
(Virtual environment name)C:\Users\User name>pip install package name
↓ List of installed packages
(Virtual environment name)C:\Users\User name>pip list
↓ The result looks like this.
↓ Launching Python
(Virtual environment name)C:\Users\User name>python
↓ (Reference) When Python starts, the prompt looks like this (only >>> is displayed ...)
>>>
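Once the prompt is shown, the same environment checks can also be done from inside Python. The following is only a minimal sketch using the standard library; the helper names `in_virtual_environment` and `installed_versions` are made up for this example, and `importlib.metadata` assumes Python 3.8 or later.

```python
import platform
import sys
from importlib import metadata  # part of the standard library since Python 3.8


def in_virtual_environment():
    """Return True when the interpreter runs inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment folder,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def installed_versions(package_names):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print("Python:", platform.python_version())
print("OS:", platform.system(), platform.release())
print("Virtual environment:", in_virtual_environment())
for name, version in installed_versions(["numpy", "pandas", "matplotlib", "seaborn"]).items():
    print(name, version if version is not None else "not installed")
```

If a package is reported as "not installed", it probably needs to be installed with pip while the virtual environment is active.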
This time, we will use the "Titanic data" available from the data science competition site "Kaggle".
I am a complete beginner and did not really understand file paths, so for now I saved the data directly under the "C:\Users\username" folder. Python was launched from this folder, so the file can be read by its name alone. (I tried absolute paths and relative paths, but for some reason they didn't work ... (TT))
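Path trouble like this often comes down to two things: a relative path is resolved against the current working directory, and a Windows path with backslashes needs a raw string. Whether that was the cause here is only a guess. The sketch below shows how to check both; `resolve_csv` is a hypothetical helper, and "User name" is the same placeholder used in the prompts above.

```python
import os
from pathlib import Path, PureWindowsPath

# A relative path such as "train.csv" is resolved against this folder.
print("Current working directory:", os.getcwd())


def resolve_csv(filename, folder=None):
    """Return the absolute path of filename inside folder (default: current directory)."""
    base = Path(folder) if folder is not None else Path.cwd()
    return (base / filename).resolve()


csv_path = resolve_csv("train.csv")
print("Looking for:", csv_path)
print("File exists:", csv_path.exists())

# For an absolute Windows path, use a raw string (r"...") or forward slashes.
# In a normal string literal, "\U" (as in "C:\Users") is read as an escape sequence
# and causes a syntax error.
windows_style = PureWindowsPath(r"C:\Users\User name\train.csv")
print("File name part:", windows_style.name)
```

If "File exists" prints False, either move the file into the printed folder or pass the full path to `pd.read_csv`.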
↓ Package import
import pandas as pd
↓ Use the "read_csv" function of "pandas (pd)" to read "train.csv" and store it in "df".
df = pd.read_csv("train.csv")
Once you've come this far, you can look at the data as you like.
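As a starting point for "looking at the data", here is a minimal sketch of the usual first checks. The small DataFrame is a made-up stand-in (the values are not real passenger records) so that the example runs without the file; it only borrows a few column names from the Kaggle data. With the real data, replace it with `df = pd.read_csv("train.csv")`.

```python
import pandas as pd

# Hypothetical stand-in rows for illustration only, NOT the real Titanic data.
# With the real file, use: df = pd.read_csv("train.csv")
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass": [3, 1, 3, 2, 1, 3],
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0, 54.0, 2.0],
    "Fare": [7.25, 71.28, 7.92, 13.0, 51.86, 21.08],
})

print(df.shape)           # (number of rows, number of columns)
print(df.head())          # first five rows
df.info()                 # column names, dtypes and non-null counts
print(df.describe())      # summary statistics of the numeric columns
print(df.isnull().sum())  # number of missing values per column
print(df["Survived"].value_counts())  # rows per outcome (1 = survived, 0 = died)
```

`value_counts()` gives the same numbers that a count plot draws as bars, so it is a handy cross-check for the graph.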
After importing the required packages, let's display the data item "Survived" (survived = 1, died = 0) stored in "df" as a bar graph.
import seaborn as sns
import matplotlib.pyplot as plt
sns.countplot(x="Survived",data=df,palette='rainbow')
plt.show()
↓ Result of running "plt.show()". It was convenient to be able to adjust the vertical and horizontal margins and save the image from this window!
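The margin adjustment and image saving done in the window can also be written in code. This is only a sketch under assumptions: the DataFrame is a made-up stand-in for "df", and "survived_count.png" is a hypothetical file name.

```python
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for illustration only; with the real data use
# df = pd.read_csv("train.csv")
df = pd.DataFrame({"Survived": [0, 1, 1, 0, 1, 0, 0]})

ax = sns.countplot(x="Survived", data=df)
ax.set_title("Survived (1 = survived, 0 = died)")

fig = ax.get_figure()
fig.tight_layout()  # adjust the margins automatically

output_path = Path("survived_count.png")  # hypothetical file name
fig.savefig(output_path, dpi=150)         # write the image into the current folder
plt.close(fig)                            # release the figure after saving
```

To also see the window, call `plt.show()` after `savefig` and before closing the figure.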
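↓ Display the correlations between the numeric data items as a heatmap (text columns such as "Name" are excluded first, because recent versions of pandas raise an error for them in "corr()")
sns.heatmap(df.select_dtypes(include="number").corr(),annot=True,cmap='RdYlGn',vmin=-1,vmax=1,fmt=".2f",square=True)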
plt.show()
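The numbers behind the heatmap can also be read directly. This is a minimal sketch that lists how strongly each numeric column correlates with "Survived". The DataFrame is again a made-up stand-in, so the printed values mean nothing for the real data.

```python
import pandas as pd

# Hypothetical stand-in rows for illustration only, NOT the real Titanic data.
# With the real file, use: df = pd.read_csv("train.csv")
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass": [3, 1, 3, 2, 1, 3],
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0, 54.0, 2.0],
    "Fare": [7.25, 71.28, 7.92, 13.0, 51.86, 21.08],
})

numeric_df = df.select_dtypes(include="number")  # leave out text columns such as "Sex"
corr = numeric_df.corr()                          # Pearson correlation by default

# Correlation of each numeric column with "Survived", strongest first.
with_survived = corr["Survived"].drop("Survived")
order = with_survived.abs().sort_values(ascending=False).index
print(with_survived.reindex(order))
```

A value close to 1 or -1 means a strong linear relationship and a value near 0 means a weak one, which matches the color scale set with `vmin=-1` and `vmax=1` in the heatmap.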
↓ Display scatter plots for every pair of numeric data items
sns.pairplot(df)
plt.show()
Thank you for reading today. This was my first post, so I think some parts were hard to follow. Please bear with me. If you have any suggestions, I will gladly accept them (although I don't know how to receive them yet ...). If I find an error myself, I will correct it each time. I will also answer as many questions as I can (I don't know how to receive those either ...), so please feel free to ask. See you again somewhere ~ (^^) ♪