[I tried Python!] Titanic data, vol. 1 (bar graph, scatter plot, correlation coefficient)

Today, without overthinking it, we will run a simple statistical analysis on the "Titanic data".

■ Introduction


My execution environment

Microsoft Windows Version: 10.0
Python Version: 3.8.1

↓ (Reference) Check the version of windows

C:\Users\User name>ver

↓ (Reference) Python version check (the version is shown in the banner when Python starts)

C:\Users\User name>python
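
Once Python is running, the same information can also be read from inside Python. The following is only a small sketch using the standard library; nothing in it is specific to this post.

```python
import platform
import sys

# Version of the running interpreter, e.g. "3.8.1" in my environment.
python_version = platform.python_version()
print("Python:", python_version)

# Name and release of the operating system, as reported by Python.
print("OS:", platform.system(), platform.release())

# A quick check that the interpreter is at least Python 3.8.
print("3.8 or newer:", sys.version_info >= (3, 8))
```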


About virtual environment

I have heard that it is convenient to run Python in a virtual environment, so I use one too.

↓ (Reference) Launching a virtual environment

C:\Users\User name>Virtual environment name\scripts\activate

↓ (Reference) When the virtual environment is started, it will be displayed like this

(Virtual environment name)C:\Users\User name>
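
Besides looking at the prompt, it seems possible to ask Python itself whether it is running inside a virtual environment. This is a minimal sketch that assumes the environment was created with the standard "venv" module; the function name "in_virtual_environment" is my own and not part of any library.

```python
import sys


def in_virtual_environment():
    """Return True when Python runs inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the Python installation it was made from.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtual_environment())
print("Current prefix:", sys.prefix)
```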

■ Installation of required packages

The packages used this time are:
・ NumPy
・ pandas
・ Matplotlib
・ seaborn

↓ Installation

(Virtual environment name)C:\Users\User name>pip install package name

↓ List of installed packages

(Virtual environment name)C:\Users\User name>pip list
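
The installed versions can also be checked from Python instead of reading through the whole "pip list" output. This is a rough sketch using "importlib.metadata", which is in the standard library from Python 3.8; the helper name "installed_versions" is hypothetical.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_versions(package_names):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# The four packages used in this post.
for name, version in installed_versions(
    ["numpy", "pandas", "matplotlib", "seaborn"]
).items():
    print(f"{name}: {version or 'not installed'}")
```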

↓ The result looks like this.

[Image: list of installed libraries]

■ Start python

(Virtual environment name)C:\Users\User name>python

↓ (Reference) When Python starts, it will be displayed like this (Only >>> is displayed ...)

>>>

■ Reading data

This time, we will use the "Titanic data" available from "Kaggle", a global data science competition site.


Data storage folder

I am a complete beginner and did not understand file paths well, so for now I saved the data directly under the "C:\Users\username" folder. Python was started from that folder, so a bare file name such as "train.csv" is found there. (I tried absolute paths and relative paths, but for some reason they didn't work ... (TT))
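
I never found out why my paths failed, so the following is only a guess at two common causes: a relative path is looked up from the current working directory, and in a normal Python string a Windows path such as "C:\Users" is rejected because "\U" starts an escape sequence. The sketch below uses "pathlib" to build the path and reports the working directory when the file is missing. The helper name "find_csv" and the temporary folder are assumptions made for the demonstration.

```python
from pathlib import Path
import tempfile


def find_csv(filename, folder=None):
    """Return the absolute path of filename inside folder (default: current directory)."""
    base = Path(folder) if folder is not None else Path.cwd()
    candidate = (base / filename).resolve()
    if not candidate.is_file():
        raise FileNotFoundError(
            f"{candidate} not found (current working directory: {Path.cwd()})"
        )
    return candidate


# Demonstration with a throwaway folder and a tiny made-up file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "train.csv").write_text("PassengerId,Survived\n1,0\n")
    found = find_csv("train.csv", folder=tmp)
    print(found.name, found.is_absolute())

# With pandas, the resulting path could then be passed on, for example:
#   df = pd.read_csv(find_csv("train.csv"))
# A hand-written Windows path is safer as a raw string:
#   df = pd.read_csv(r"C:\Users\<username>\train.csv")
```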


Data reading

↓ Package import

import pandas as pd

↓ Use the "read_csv" function of "pandas (pd)" to read "train.csv" and store it in "df".

df = pd.read_csv("train.csv")
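
Right after reading the file, it may help to check what actually ended up in "df". The sketch below does this on a few made-up rows that only borrow some column names from the Kaggle file (the values are invented for illustration, not real passengers); with the real data, the same calls would be applied to the "df" read from "train.csv".

```python
import io

import pandas as pd

# Made-up toy rows; only the column names follow the Kaggle Titanic file.
toy_csv = io.StringIO(
    "PassengerId,Survived,Pclass,Sex,Age,Fare\n"
    "1,0,3,male,20,7.0\n"
    "2,1,1,female,40,70.0\n"
    "3,1,3,female,,8.0\n"
    "4,0,2,male,30,13.0\n"
)
df = pd.read_csv(toy_csv)

print(df.shape)    # (number of rows, number of columns)
print(df.head())   # first rows of the table
print(df.dtypes)   # data type of each column

# Number of missing values per column (the empty Age cell counts as missing).
missing = df.isnull().sum()
print(missing)
```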

■ At last, let's try data analysis!

Once you've come this far, you can look at the data as you like.


Number of survivors and deaths

After importing the required packages, let's display the data item "Survived" (survived = 1, died = 0) stored in "df" as a bar graph.

import seaborn as sns
import matplotlib.pyplot as plt
# Pass the column as x=...; recent seaborn versions no longer accept it as a positional argument
sns.countplot(x="Survived",data=df,palette='rainbow')
plt.show()

↓ "plt.show()" execution result. It was convenient to be able to adjust the vertical and horizontal margins and save the image from this window!

[Image: Figure 1, bar graph of "Survived"]
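
The bar graph shows the heights only visually, so it may be useful to print the numbers behind it as well. This is a small sketch with pandas "value_counts" on made-up rows (the values are invented, not the real Titanic counts); with the real data, "df" would be the frame read from "train.csv".

```python
import io

import pandas as pd

# Made-up toy rows that only mimic the "Survived" column of train.csv.
toy_csv = io.StringIO("PassengerId,Survived\n1,0\n2,1\n3,1\n4,0\n5,0\n")
df = pd.read_csv(toy_csv)

# Number of rows per value of Survived (0 = died, 1 = survived).
counts = df["Survived"].value_counts().sort_index()
print(counts)

# The same as proportions of all rows.
rates = df["Survived"].value_counts(normalize=True).sort_index()
print(rates)
```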


Correlation coefficient matrix (heat map)

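# Use only the numeric columns: recent pandas versions raise an error if text columns such as "Name" are passed to corr()
sns.heatmap(df.select_dtypes(include="number").corr(),annot=True,cmap='RdYlGn',vmin=-1,vmax=1,fmt=".2f",square=True)
plt.show()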

[Image: heat map of the correlation coefficient matrix]
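
Each cell of the heat map is a correlation coefficient between two columns; by default, "corr()" in pandas computes the Pearson coefficient. To see what that number means, here is a plain-Python sketch of the calculation. The function name "pearson_r" and the sample lists are my own illustrations.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long number sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (covariance times n).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)


# y rises together with x: coefficient close to +1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
# y falls while x rises: coefficient close to -1.
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```

Values near +1 or -1 correspond to the strongly colored cells of the heat map, and values near 0 to the pale ones.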


Pair plot diagram

sns.pairplot(df)
plt.show()

[Image: pair plot]

■ In closing

Thank you for reading today. This was my first post, so I expect some parts were hard to follow. Please bear with me. Suggestions are welcome (though I don't know yet how to receive them ...). If I find an error myself, I will correct it each time. Please feel free to ask questions, and I will answer as many as I can (I don't know how to receive these either ...). See you again somewhere ~ (^^) ♪
