Predict Kaggle's Titanic with Keras (kaggle ⑦)

Introduction

This is a story about participating in a Kaggle competition. Up to the previous post, I trained on the Titanic data with scikit-learn models. This time, I would like to train with Keras.

Table of contents

  1. The Keras model
  2. Perceptron
  3. Grid search with Keras
  4. Learn with Keras
  5. Result
  6. Summary
  References
  History

1. The Keras model

According to the book "Detailed Deep Learning", Keras models are roughly divided into the following three types:

  - Perceptron
  - Convolutional neural network
  - Recurrent neural network

The perceptron is one of the most basic neural networks. Convolutional neural networks (CNNs) are mainly used for image processing, and recurrent neural networks (RNNs) for time-series data. This problem involves neither images nor time series, so let's train a perceptron.

2. Perceptron

A perceptron is one of the most basic neural networks; it is the layered model you often see in diagrams like the one linked below.

Image source: [Image materials such as neural networks and Deep Learning [WTFPL] for presentations and seminars](http://nkdkccmbr.hateblo.jp/entry/2016/10/06/222245)
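
In place of the image, here is a rough text sketch of the structure: input features feed into fully connected hidden layers, which feed a single output.

    x1 ─┐
    x2 ─┼──> (hidden units) ──> (hidden units) ──> ŷ
    x3 ─┘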

Building a perceptron in Keras is relatively easy: add layers to a Sequential model as shown below.

##############################
# Build the model: a 5-layer perceptron
##############################
def create_model_5dim_layer_perceptron(input_dim, activation="relu", optimizer="adam", out_dim=100, dropout=0.5):

    model = Sequential()

    # Input layer -> hidden layer 1
    model.add(Dense(input_dim=input_dim, units=out_dim))
    model.add(BatchNormalization())
    model.add(Activation(activation))
    model.add(Dropout(dropout))

    # Hidden layer 1 -> hidden layer 2
    model.add(Dense(units=out_dim))
    model.add(BatchNormalization())
    model.add(Activation(activation))
    model.add(Dropout(dropout))

    # Hidden layer 2 -> hidden layer 3
    model.add(Dense(units=out_dim))
    model.add(BatchNormalization())
    model.add(Activation(activation))
    model.add(Dropout(dropout))

    # Hidden layer 3 -> output layer
    model.add(Dense(units=1))
    model.add(Activation("sigmoid"))

    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])

    return model

Each layer is defined with Dense. The first Dense sets the input dimension with input_dim, which must match the number of columns of the input data. units specifies the number of units (output dimensions) of that layer. Since the output this time is one-dimensional (survived: 1 or 0), the final layer must have units=1.

The activation function is set with Activation. The hidden layers can use any activation function, but because this is binary classification, the output layer uses "sigmoid".

In addition, Dropout suppresses overfitting, and BatchNormalization normalizes each mini-batch.
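
As a quick sanity check, you can build the model and inspect the shape of each layer with model.summary(). A minimal sketch (the input dimension 10 is only a placeholder; in practice it would be the number of feature columns, len(x_train.columns)):

# Build with a placeholder input dimension and print the layer shapes
model = create_model_5dim_layer_perceptron(input_dim=10)
model.summary()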

3. Grid search with Keras

The 5-layer perceptron function above receives activation, optimizer, and so on as arguments. As this suggests, even a Keras perceptron has parameters that need to be set. In an earlier article in this series I tuned parameters with scikit-learn's grid search, and you can grid search a Keras model the same way by using KerasClassifier.

from keras.wrappers.scikit_learn import KerasClassifier

# Pass the build function itself (not a built model); keyword arguments
# such as input_dim are forwarded to it when the model is constructed
model = KerasClassifier(build_fn=create_model_5dim_layer_perceptron,
                        input_dim=len(x_train.columns), verbose=0)

With KerasClassifier, the Keras model can be treated just like a scikit-learn model, so GridSearchCV can be used exactly as with scikit-learn. The code looks like this:

from sklearn.model_selection import GridSearchCV

# Candidate values for each parameter
activation = ["tanh", "relu"]
optimizer = ["adam", "adagrad"]
out_dim = [234, 468, 702]
nb_epoch = [25, 50]
batch_size = [8, 16]
dropout = [0.2, 0.4, 0.5]

param_grid = dict(activation=activation, 
                  optimizer=optimizer, 
                  out_dim=out_dim, 
                  nb_epoch=nb_epoch, 
                  batch_size=batch_size,
                  dropout=dropout)
grid = GridSearchCV(estimator=model, param_grid=param_grid)

# Run grid search
grid_result = grid.fit(x_train, y_train)

print(grid_result.best_score_)
print(grid_result.best_params_)

Printing best_score_ and best_params_ gives the following output.

0.7814285721097673
{'activation': 'relu', 'batch_size': 16, 'dropout': 0.5, 'nb_epoch': 25, 'optimizer': 'adam', 'out_dim': 702}
acc:0.72
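
The best combination can then be fed straight back into the model builder for the final training run. A minimal sketch (section 4 below simply hard-codes these values):

# Rebuild and train the model with the parameters found by the grid search
best = grid_result.best_params_
model = create_model_5dim_layer_perceptron(input_dim=len(x_train.columns),
                                           activation=best['activation'],
                                           optimizer=best['optimizer'],
                                           out_dim=best['out_dim'],
                                           dropout=best['dropout'])
model.fit(x_train, y_train, epochs=best['nb_epoch'],
          batch_size=best['batch_size'], verbose=2)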

4. Learn with Keras

Once the parameters are decided, train with Keras and predict the result. The full code is below. The data preparation is the same as last time.

import numpy
import pandas

from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler

from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.layers.core import Dropout
from keras.layers.normalization import BatchNormalization

############################################################
# One-hot encode SibSp
############################################################
def get_dummies_sibSp(df_all, df, df_test) :
    
    # Categories come from the combined train+test data; a sorted list
    # gives the dummy columns a deterministic order
    categories = sorted(df_all['SibSp'].unique())
    df['SibSp'] = pandas.Categorical(df['SibSp'], categories=categories)
    df_test['SibSp'] = pandas.Categorical(df_test['SibSp'], categories=categories)
    
    df = pandas.get_dummies(df, columns=['SibSp'])
    df_test = pandas.get_dummies(df_test, columns=['SibSp'])

    return df, df_test

############################################################
# One-hot encode Parch
############################################################
def get_dummies_parch(df_all, df, df_test) :
    
    # Categories come from the combined train+test data; a sorted list
    # gives the dummy columns a deterministic order
    categories = sorted(df_all['Parch'].unique())
    df['Parch'] = pandas.Categorical(df['Parch'], categories=categories)
    df_test['Parch'] = pandas.Categorical(df_test['Parch'], categories=categories)
    
    df = pandas.get_dummies(df, columns=['Parch'])
    df_test = pandas.get_dummies(df_test, columns=['Parch'])
    
    return df, df_test

############################################################
# One-hot encode Ticket
############################################################
def get_dummies_ticket(df_all, df, df_test) :

    ticket_values = df_all['Ticket'].value_counts()
    ticket_values = ticket_values[ticket_values > 1]
    # Only tickets shared by more than one passenger become categories
    categories = sorted(ticket_values.index.tolist())
    df['Ticket'] = pandas.Categorical(df['Ticket'], categories=categories)
    df_test['Ticket'] = pandas.Categorical(df_test['Ticket'], categories=categories)
    
    df = pandas.get_dummies(df, columns=['Ticket'])
    df_test = pandas.get_dummies(df_test, columns=['Ticket'])

    return df, df_test

############################################################
# Standardize Pclass and Fare
############################################################
def standardization(df, df_test) :

    # Fit the scaler on the training data only, then reuse it for the test data
    standard = StandardScaler()
    df_std = pandas.DataFrame(standard.fit_transform(df[['Pclass', 'Fare']].values), columns=['Pclass', 'Fare'])
    df.loc[:,'Pclass'] = df_std['Pclass']
    df.loc[:,'Fare'] = df_std['Fare']

    df_test_std = pandas.DataFrame(standard.transform(df_test[['Pclass', 'Fare']].values), columns=['Pclass', 'Fare'])
    df_test.loc[:,'Pclass'] = df_test_std['Pclass']
    df_test.loc[:,'Fare'] = df_test_std['Fare']

    return df, df_test

############################################################
# Prepare the data
############################################################
def prepareData() :

    ##############################
    # Data preprocessing:
    # load the data and extract the necessary columns
    ##############################
    df = pandas.read_csv('/kaggle/input/titanic/train.csv')
    df_test = pandas.read_csv('/kaggle/input/titanic/test.csv')
    
    df_all = pandas.concat([df, df_test], sort=False)
    
    df_test_index = df_test[['PassengerId']]

    df = df[['Survived', 'Pclass', 'Sex', 'SibSp', 'Parch', 'Ticket', 'Fare']]
    df_test = df_test[['Pclass', 'Sex', 'SibSp', 'Parch', 'Ticket', 'Fare']]
    
    ##############################
    # Data preprocessing:
    # remove rows whose Fare is 0 or 5
    ##############################
    df = df[df['Fare'] != 5].reset_index(drop=True)
    df = df[df['Fare'] != 0].reset_index(drop=True)

    ##############################
    # Data preprocessing:
    # encode string labels as numbers
    ##############################
    ##############################
    # Sex
    ##############################
    encoder_sex = LabelEncoder()
    df['Sex'] = encoder_sex.fit_transform(df['Sex'].values)
    df_test['Sex'] = encoder_sex.transform(df_test['Sex'].values)
    
    ##############################
    # Data preprocessing:
    # one-hot encoding
    ##############################
    ##############################
    # SibSp
    ##############################
    df, df_test = get_dummies_sibSp(df_all, df, df_test)
    
    ##############################
    # Parch
    ##############################
    df, df_test = get_dummies_parch(df_all, df, df_test)
    
    ##############################
    # Ticket
    ##############################
    df, df_test = get_dummies_ticket(df_all, df, df_test)
 
    ##############################
    # Data preprocessing:
    # standardize numeric columns
    ##############################
    df, df_test = standardization(df, df_test)

    ##############################
    # Data preprocessing:
    # fill remaining missing Fare values with 0
    ##############################
    df.fillna({'Fare':0}, inplace=True)
    df_test.fillna({'Fare':0}, inplace=True)
        
    ##############################
    # Split features and labels
    ##############################
    x = df.drop(columns='Survived')
    y = df[['Survived']]

    return x, y, df_test, df_test_index

##############################
# Build the model: a 5-layer perceptron
##############################
def create_model_5dim_layer_perceptron(input_dim,
                                       activation="relu",
                                       optimizer="adam",
                                       out_dim=100,
                                       dropout=0.5):
    
    model = Sequential()

    # Input layer -> hidden layer 1
    model.add(Dense(input_dim=input_dim, units=out_dim))
    model.add(BatchNormalization())
    model.add(Activation(activation))
    model.add(Dropout(dropout))

    # Hidden layer 1 -> hidden layer 2
    model.add(Dense(units=out_dim))
    model.add(BatchNormalization())
    model.add(Activation(activation))
    model.add(Dropout(dropout))

    # Hidden layer 2 -> hidden layer 3
    model.add(Dense(units=out_dim))
    model.add(BatchNormalization())
    model.add(Activation(activation))
    model.add(Dropout(dropout))

    # Hidden layer 3 -> output layer
    model.add(Dense(units=1))
    model.add(Activation("sigmoid"))

    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    
    return model
    

if __name__ == "__main__":
    
    # Prepare the data
    x_train, y_train, x_test, y_test_index = prepareData()
    
    # Build the model with the best parameters from the grid search
    model = create_model_5dim_layer_perceptron(len(x_train.columns),
                                               activation="relu",
                                               optimizer="adam",
                                               out_dim=702,
                                               dropout=0.5)

    # Train
    model.fit(x_train, y_train, epochs=25, batch_size=16, verbose=2)
    
    # Predict: round the predicted probabilities to 0/1
    y_test_proba = model.predict(x_test)
    y_test = numpy.round(y_test_proba).astype(int)
    
    # Combine PassengerId with the predicted result
    df_output = pandas.concat([y_test_index, pandas.DataFrame(y_test, columns=['Survived'])], axis=1)

    # Write result.csv to the current directory
    df_output.to_csv('result.csv', index=False)

5. Result

The submitted score was 0.79425.

6. Summary

Compared to scikit-learn, Keras gives the impression of allowing much finer control, such as the number of layers and the number of units per layer. On the other hand, beyond neural network architectures such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), it offers few model types, whereas scikit-learn covers many different algorithm families. It seems best to choose between scikit-learn and Keras depending on the situation.

Next time, I would like to try learning with the R language.

References

[Parameter optimization automation for Keras with GridSearchCV](https://qiita.com/cvusk/items/285e2b02b0950537b65e)

History

2020/02/15 First edition released
2020/02/22 Next link added
