optuna, keras and titanic

Introduction

To study machine learning and Bayesian optimization, I tried Kaggle's tutorial-style competition "Titanic: Machine Learning from Disaster" with a neural network. For hyperparameter optimization, I used Preferred Networks' [Optuna library](https://preferred.jp/ja/projects/optuna/).

result

Public Score : 0.7655

Note

Here is the link to the Kaggle notebook: kaggle notebook

code

Preprocessing

The first step is preprocessing: handling missing values and dropping features that seem unlikely to be related to survival. The feature selection was done by intuition.
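
The preprocessing below assumes that the train DataFrame has already been read from the competition CSV; a minimal sketch of that loading step (the file path is an assumption):

import pandas as pd

# Assumed location of the Kaggle Titanic training data
train = pd.read_csv('../input/titanic/train.csv')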

# Fill missing Age values with the mean age
train = train.fillna({'Age': train['Age'].mean()})
# Drop columns that seem unlikely to be related to survival
X_df = train.drop(columns=['PassengerId', 'Survived', 'Name', 'Ticket', 'Cabin', 'Embarked'])
y_df = train['Survived']

Next, convert the categorical Sex column into a numeric dummy variable.

# Encode Sex as a number: male -> 0, female -> 1
X_df = X_df.replace('male', 0)
X_df = X_df.replace('female', 1)
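
The same encoding can also be written with pandas' map, which is a common alternative; a small sketch:

# Equivalent mapping of the Sex column: male -> 0, female -> 1
X_df['Sex'] = X_df['Sex'].map({'male': 0, 'female': 1})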

Split the data into training and validation sets.

from sklearn.model_selection import train_test_split
X_train, X_val, y_train, y_val = train_test_split(X_df.values, y_df.values, test_size=0.25, shuffle=True, random_state=0)

Let's take a look at the contents of X_train. The column names are Pclass, Sex, Age, SibSp, Parch, Fare.

[[ 3.          0.         28.          0.          0.          7.8958    ]
 [ 3.          1.         17.          4.          2.          7.925     ]
 [ 3.          0.         30.          1.          0.         16.1       ]
 ...
 [ 3.          0.         29.69911765  0.          0.          7.7333    ]
 [ 3.          1.         36.          1.          0.         17.4       ]
 [ 2.          0.         60.          1.          1.         39.        ]]

neural network

We will build a neural network model using only fully connected (Dense) layers. Optuna also optimizes the number of hidden layers and the number of units.

from keras.layers import Dense, Input
from keras.models import Model


def create_model(activation, num_hidden_layer, num_hidden_unit):
    inputs = Input(shape=(X_train.shape[1],))
    model = inputs
    # Stack the hidden Dense layers
    # (note: range(1, num_hidden_layer) adds num_hidden_layer - 1 of them)
    for i in range(1, num_hidden_layer):
        model = Dense(num_hidden_unit, activation=activation)(model)

    # Sigmoid output for the binary survival target
    model = Dense(1, activation='sigmoid')(model)
    model = Model(inputs, model)

    return model
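
As a quick sanity check, the architecture produced for a given set of hyperparameters can be inspected with Keras' summary (the values below are chosen only for illustration):

model = create_model('relu', 3, 50)
model.summary()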

Define the ranges of the hyperparameters to optimize with Optuna. Optuna minimizes or maximizes the return value of the objective function; the default is to minimize. If you want to maximize instead, pass direction='maximize' to create_study, which appears later.

import numpy as np
from keras import backend as K
from keras.callbacks import EarlyStopping
from keras.optimizers import SGD, Adagrad, Adam, RMSprop


def objective(trial):
    K.clear_session()

    # Categorical hyperparameters: activation function and optimizer
    activation = trial.suggest_categorical('activation', ['relu', 'tanh', 'linear'])
    optimizer = trial.suggest_categorical('optimizer', ['adam', 'rmsprop', 'adagrad', 'sgd'])

    # Integer hyperparameters: number of hidden layers (1-5) and units per layer (10-100, step 10)
    num_hidden_layer = trial.suggest_int('num_hidden_layer', 1, 5, 1)
    num_hidden_unit = trial.suggest_int('num_hidden_unit', 10, 100, 10)

    # Log-uniform search over the learning rate
    learning_rate = trial.suggest_loguniform('learning_rate', 0.00001, 0.1)
    if optimizer == 'adam':
        optimizer = Adam(learning_rate=learning_rate)
    elif optimizer == 'adagrad':
        optimizer = Adagrad(learning_rate=learning_rate)
    elif optimizer == 'rmsprop':
        optimizer = RMSprop(learning_rate=learning_rate)
    elif optimizer == 'sgd':
        optimizer = SGD(learning_rate=learning_rate)

    # Build, compile and train the model, keeping it for later retraining
    model = create_model(activation, num_hidden_layer, num_hidden_unit)
    model_list.append(model)
    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['acc', 'mape'])

    es = EarlyStopping(monitor='val_acc', patience=50)
    history = model.fit(X_train, y_train, validation_data=(X_val, y_val), verbose=0, epochs=200, batch_size=20, callbacks=[es])
    history_list.append(history)

    # Optuna minimizes the return value, so return 1 - validation accuracy
    val_acc = np.array(history.history['val_acc'])

    return 1 - val_acc[-1]
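
As noted above, the same search could be phrased as a maximization problem by returning the validation accuracy itself and creating the study with direction='maximize'; a minimal sketch of that alternative (not what is used below):

# Alternative formulation: maximize validation accuracy directly
# (the objective would then end with `return val_acc[-1]` instead of `return 1 - val_acc[-1]`)
study = optuna.create_study(direction='maximize')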

Run the training and optimization. Each trial's model is appended to a list so that the best one can easily be retrained afterwards. The 50 trials took about 6 minutes and 12 seconds.

import optuna

model_list = []
history_list = []

# Persist the study to SQLite so it can be resumed with load_if_exists=True
study_name = 'titanic_study'
study = optuna.create_study(study_name=study_name, storage='sqlite:///../titanic_study.db', load_if_exists=True)
study.optimize(objective, n_trials=50)
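
Because the study is persisted to SQLite, it can be reopened in a later session and its results inspected without rerunning the trials; a small sketch:

# Reopen the persisted study in a new session (same name and storage as above)
study = optuna.load_study(study_name='titanic_study', storage='sqlite:///../titanic_study.db')
print(len(study.trials))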

See the result of the optimization.

print(study.best_params)
print('')
print(study.best_value)

The result of the optimization (apologies that it is a bit rough). The first block shows the best hyperparameters, and the value below is the best objective value, i.e. 1 minus the validation accuracy.

{'activation': 'relu', 'learning_rate': 0.004568302718922509, 'num_hidden_layer': 5, 'num_hidden_unit': 50, 'optimizer': 'rmsprop'}

0.17937219142913818
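
Besides best_params and best_value, Optuna can also export the full trial history for inspection; a minimal sketch using trials_dataframe:

# All trials (parameters, objective values, states) as a pandas DataFrame
trials_df = study.trials_dataframe()
print(trials_df.sort_values('value').head())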

Now predict on the test data. Before that, retrain the best model for a sufficient number of epochs with the best parameters. The preprocessing of the test data is almost the same as for the training data, except that PassengerId is kept in a separate DataFrame for the score submission.
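
The test-set preprocessing itself is not shown here; it would look roughly like the sketch below, mirroring the training preprocessing (the file path and the names test, test_df_index, X_test are assumptions chosen to match how they are used afterwards):

# Assumed location of the Kaggle Titanic test data
test = pd.read_csv('../input/titanic/test.csv')

# Keep PassengerId separately for the submission file
test_df_index = test[['PassengerId']]

# Same preprocessing as for the training data
# (the test set also has one missing Fare, filled here with the training mean)
test = test.fillna({'Age': train['Age'].mean(), 'Fare': train['Fare'].mean()})
X_test = test.drop(columns=['PassengerId', 'Name', 'Ticket', 'Cabin', 'Embarked'])
X_test = X_test.replace('male', 0)
X_test = X_test.replace('female', 1)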

# Retrain the model from the best trial with its best optimizer
# (passing the optimizer name as a string uses Keras' default learning rate)
model_list[study.best_trial._number-1].compile(optimizer=study.best_trial.params['optimizer'], loss='binary_crossentropy', metrics=['acc', 'mape'])
es = EarlyStopping(monitor='val_acc', patience=100)
history = model_list[study.best_trial._number-1].fit(X_train, y_train, validation_data=(X_val, y_val), verbose=1, epochs=400, batch_size=20, callbacks=[es])

# Predict survival probabilities and round them to 0/1
predicted = model_list[study.best_trial._number-1].predict(X_test.values)
predicted_survived = np.round(predicted).astype(int)

Prediction result

Finally, link each PassengerId with its survival prediction and write the result to a CSV file.

df = pd.concat([test_df_index,pd.DataFrame(predicted_survived, columns=['Survived'])], axis=1)
df.to_csv('gender_submission.csv', index=False)
df
     PassengerId  Survived
0            892         0
1            893         0
2            894         0
3            895         0
4            896         0
...          ...       ...
413         1305         0
414         1306         1
415         1307         0
416         1308         0
417         1309         0

418 rows × 2 columns

Public Score : 0.7655

Impressions

The score was mediocre, but the process was very easy. I'm a bit worried that I'll keep relying on Optuna from now on and that my own tuning skills won't improve. Is it really okay to automate all of the optimization?

Reference site

It was very easy to understand and helpful: [Introduction to Optuna](https://qiita.com/studio_haneya/items/2dc3ba9d7cafa36ddffa)
