Last time (Day 67 [Introduction to Kaggle] Have you tried using Random Forest?), I made predictions with a random forest using gender as the only feature. It predicted that all the men died and all the women survived, which was a great result.
I tried various experiments.
First, I added Pclass (passenger class) to the gender-only model I created last time.
21.py
(Omitted)
# Create the DataFrames
# Sex and passenger class
train_df = train_df.loc[:,['PassengerId','Survived','Sex','Pclass']]
test_df = test_df.loc[:,['PassengerId','Sex','Pclass']]
(Rest is the same as before)
As a result, the Public Score was 0.75598, which is lower than last time. This is strange, because according to the Japanese Wikipedia article [Titanic (passenger ship)](https://ja.wikipedia.org/wiki/%E3%82%BF%E3%82%A4%E3%82%BF%E3%83%8B%E3%83%83%E3%82%AF_(%E5%AE%A2%E8%88%B9)), the mortality rate varied greatly by class.
Most third-class passengers died. The third-class cabins were at the bottom of the ship and were divided into a front section and a rear section. When the ship sank, passengers in the front section had two escape routes: go straight up, or move aft through the hull and then go up. However, one theory holds that the doors on the direct route were locked because first-class cabins were directly above them. That left the route aft as the only way out, and this increased the death toll.
(Memo) If third-class passengers could be split into front-cabin and rear-cabin groups, the prediction might improve.
22.py
(Omitted)
# Create the DataFrames
# Passenger class only
train_df = train_df.loc[:,['PassengerId','Survived','Pclass']]
test_df = test_df.loc[:,['PassengerId','Pclass']]
(Rest is the same as before)
Public Score: 0.65550
It went down even further. Perhaps, as with the gender-only model, the class-only model assigns everyone in the same class a single value, 0 or 1. Let me check.
23.py
print(train_df.groupby(['Pclass','Survived']).count())
                 PassengerId
Pclass Survived
1      0                  80
       1                 136
2      0                  97
       1                  87
3      0                 372
       1                 119
24.py
# Add the passenger class to the predictions (submission) for checking
submission['Pclass'] = test_df['Pclass']
print(submission.groupby(['Pclass','Survived']).count())
                 PassengerId
Pclass Survived
1      1                 107
2      0                  93
3      0                 218
In the training data, every class contains both 0s and 1s. In the test data (the prediction results), each class is collapsed to a single value, 0 or 1.
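If the forest really reduces to one label per class, that label should be the majority outcome for the class in the training data. As a quick check, here is a pure-Python sketch that computes each class's survival rate from the 23.py counts and picks the majority label. It is an illustration, not the author's code, and the helper name `majority_by_class` is hypothetical.

```python
# Training-set counts copied from the 23.py output: (Pclass, Survived) -> passengers
train_counts = {
    (1, 0): 80, (1, 1): 136,
    (2, 0): 97, (2, 1): 87,
    (3, 0): 372, (3, 1): 119,
}


def majority_by_class(counts):
    """Return {Pclass: (survival_rate, majority_label)} from (Pclass, Survived) counts."""
    result = {}
    for pclass in sorted({key[0] for key in counts}):
        died = counts.get((pclass, 0), 0)
        survived = counts.get((pclass, 1), 0)
        rate = survived / (died + survived)
        # Predict 1 only when survivors are the strict majority of the class
        result[pclass] = (rate, 1 if survived > died else 0)
    return result


for pclass, (rate, label) in majority_by_class(train_counts).items():
    print(f"Pclass {pclass}: survival rate {rate:.3f} -> predict {label}")
# Pclass 1: survival rate 0.630 -> predict 1
# Pclass 2: survival rate 0.473 -> predict 0
# Pclass 3: survival rate 0.242 -> predict 0
```

The labels (1, 0, 0) match the 24.py predictions. Class 2 is predicted 0 even though its survival rate is close to one half.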
25.py
print(train_df.groupby(['Sex','Pclass','Survived']).count())
                     PassengerId
Sex Pclass Survived
0   1      0                  77
           1                  45
    2      0                  91
           1                  17
    3      0                 300
           1                  47
1   1      0                   3
           1                  91
    2      0                   6
           1                  70
    3      0                  72
           1                  72
26.py
# Add sex and passenger class to the predictions (submission) for checking
submission['Sex'] = test_df['Sex']
submission['Pclass'] = test_df['Pclass']
print(submission.groupby(['Sex','Pclass','Survived']).count())
                     PassengerId
Sex Pclass Survived
0   1      0                  57
    2      0                  63
    3      0                 146
1   1      1                  50
    2      1                  30
    3      0                  72
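The mix of outcomes within each group in the training data is collapsed to a single value per group in the test predictions. The random forest seems to assign each group whichever outcome was more common in the training data. For female third-class passengers, who were split 72 to 72, it predicted 0.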