Somehow I tried using jupyter notebook



I use the Titanic passenger data, which is famous as a Kaggle tutorial dataset.

I set up the environment by following this article: http://qiita.com/mix_dvd/items/29dfb8d47a596b4df36d

Import the required libraries

import pandas as pd
from pandas import DataFrame,Series
import numpy as np

Read the CSV file into a DataFrame

titanic_df = pd.read_csv('train.csv')

Display the first 5 rows

titanic_df.head()
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
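The NaN entries in the Cabin column are empty fields in the CSV, which pandas reads as missing values. As a minimal sketch, here are a few rows copied from the head() output (only a subset of the columns, parsed from an inline string instead of train.csv) to show how to count missing values per column with `isnull().sum()`:

```python
import io

import pandas as pd

# Three rows copied from the head() output (subset of columns only).
# The empty Cabin fields are parsed as NaN.
csv_text = """PassengerId,Age,Cabin,Embarked
1,22.0,,S
2,38.0,C85,C
3,26.0,,S
"""

sample_df = pd.read_csv(io.StringIO(csv_text))

# Number of missing values in each column
missing = sample_df.isnull().sum()
print(missing)
```

On the full data, `titanic_df.isnull().sum()` gives the same per-column counts for every column of train.csv.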

Import the libraries needed for plotting

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

Count the passengers by sex

sns.countplot(x='Sex', data=titanic_df)

[Figure: countplot of the number of passengers by Sex]
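countplot draws one bar per category whose height is the number of rows in that category. As a minimal sketch, the same numbers can be printed as text with pandas `value_counts()`. Here it uses the Sex values of the five passengers in the head() output; on the full data the equivalent call would be `titanic_df['Sex'].value_counts()`.

```python
import pandas as pd

# Sex values of the five passengers shown in the head() output
sex = pd.Series(["male", "female", "female", "female", "male"], name="Sex")

# These counts are the bar heights that sns.countplot(x='Sex', ...) draws
counts = sex.value_counts()
print(counts)
```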

This function returns 'child' if the passenger is younger than 16, and otherwise returns the passenger's sex.

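def male_female_child(passenger):
    # passenger is one row holding (Age, Sex)
    age, sex = passenger
    # Under 16 -> 'child'; otherwise return the sex.
    # A missing age (NaN) is never < 16, so it falls through to sex.
    if age < 16:
        return 'child'
    else:
        return sex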
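Two details follow from the `<` comparison: a passenger who is exactly 16 is not counted as a child, and a missing age (NaN) compares as False, so that passenger falls back to their sex. A quick check with plain tuples, repeating the function so the snippet runs on its own:

```python
import math


def male_female_child(passenger):
    age, sex = passenger
    # Under 16 -> 'child'; otherwise the recorded sex
    if age < 16:
        return 'child'
    else:
        return sex


print(male_female_child((14.0, 'female')))    # a 14-year-old is a child
print(male_female_child((16.0, 'male')))      # 16 is not < 16, so the sex is returned
print(male_female_child((math.nan, 'male')))  # NaN < 16 is False, so the sex is returned
```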

Apply the function to every row and store the result in a new person column

titanic_df['person'] = titanic_df[['Age', 'Sex']].apply(male_female_child, axis=1)
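`apply` with `axis=1` calls `male_female_child` once per row in Python, which is fine for a dataset of this size but gets slow on large frames. A hedged alternative (not what this post uses) is a vectorized `np.where`, sketched here on the Age/Sex values of a few passengers from the data:

```python
import numpy as np
import pandas as pd

# Age/Sex of five passengers from train.csv (NaN = missing age)
df = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 2.0, 14.0],
    "Sex": ["male", "female", "male", "male", "female"],
})

# Vectorized equivalent of male_female_child: 'child' where Age < 16,
# otherwise the value from Sex. NaN < 16 is False, as in the row-wise version.
df["person"] = np.where(df["Age"] < 16, "child", df["Sex"])
print(df)
```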

Check that the person column has been added

titanic_df.head(10)
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | person |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S | male |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | female |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S | female |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S | female |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S | male |
| 5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q | male |
| 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S | male |
| 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S | child |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NaN | S | female |
| 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NaN | C | child |

Plot the number of passengers in each Pclass (ticket class: 1st, 2nd, 3rd), with the bars split by person

sns.countplot(x='Pclass', data=titanic_df, hue='person')

[Figure: countplot of the number of passengers by Pclass, split by person]
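The grouped bars show one count per (Pclass, person) pair. As a minimal sketch, the same table of counts can be printed with `pd.crosstab`; here it is built from the Pclass and person values of the first 10 passengers in the data, and on the full data the call would be `pd.crosstab(titanic_df['Pclass'], titanic_df['person'])`.

```python
import pandas as pd

# Pclass and person values of the first 10 passengers
df = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3, 3, 1, 3, 3, 2],
    "person": ["male", "female", "female", "female", "male",
               "male", "male", "child", "female", "child"],
})

# Rows: Pclass, columns: person, values: number of passengers.
# These are the bar heights of the hue='person' countplot.
table = pd.crosstab(df["Pclass"], df["person"])
print(table)
```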

That was a quick first try with Jupyter, and it is convenient: it's nice to be able to keep the code and its results together.

Next time, I'll do a survival analysis of Kaggle's Titanic passengers.
