This article explains how to improve the F1 score of a binary classifier, which labels each input as 0 or 1, by introducing a two-stage learning model.
Suppose you want to build a binary classifier that labels each input as 0 or 1. The classifier is trained on labeled training data.
Here, classifier 1 is first trained on the training data shown below. The output of classifier 1 is then added to the training data as an extra feature, and classifier 2 is trained on the result. Because classifier 2 produces fewer false positives than classifier 1, the F1 score improves.
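To see why removing false positives raises the F1 score, the following minimal sketch computes precision, recall, and F1 from confusion-matrix counts. The counts are made-up numbers chosen only for illustration (they are not results from this article), and the helper name `f1_from_counts` is hypothetical.

```python
def f1_from_counts(tp, fp, fn):
    """Return (precision, recall, f1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical stage 1: tuned for recall, so it lets many false positives through.
stage1 = f1_from_counts(tp=90, fp=60, fn=10)
# Hypothetical stage 2: rejects most false positives and loses a few true positives.
stage2 = f1_from_counts(tp=85, fp=10, fn=15)

print("stage 1: precision=%.3f recall=%.3f f1=%.3f" % stage1)
print("stage 2: precision=%.3f recall=%.3f f1=%.3f" % stage2)
```

With these illustrative counts, recall drops slightly but precision rises sharply, so F1, the harmonic mean of the two, goes up.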
sample.py
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train / 255.0  # Normalize pixel values by dividing by 255.
x_test = x_test / 255.0  # Normalize pixel values by dividing by 255.
x_train holds handwritten digit images of 28x28 pixels, with each pixel value scaled to the range 0 to 1. y_train holds the digit that each image represents.
Execution result
x_train shape -> (60000, 28, 28): 60,000 handwritten digit images of 28x28 pixels, with each pixel value scaled to the range 0 to 1
y_train -> the digit that each image represents (size 60000): [5 0 4 ... 5 6 8]
sample.py
# Change these params if you want to change the numbers selected
num1 = 3
num2 = 5
# Subset on only two numbers: from the training data, keep the samples whose y_train is 3 or 5.
x_sub_train = x_train[(y_train == num1) | (y_train == num2)]
y_sub_train = y_train[(y_train == num1) | (y_train == num2)]
# Subset on only two numbers: from the test data, keep the samples whose y_test is 3 or 5.
x_sub_test = x_test[(y_test == num1) | (y_test == num2)]
y_sub_test = y_test[(y_test == num1) | (y_test == num2)]
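The subsetting above relies on NumPy boolean-mask indexing. The following small sketch shows the same pattern on toy arrays; the values in `labels` and `images` are hypothetical stand-ins for y_train and x_train, not MNIST data.

```python
import numpy as np

# Toy stand-ins for y_train and x_train (hypothetical values, not MNIST).
labels = np.array([5, 0, 3, 4, 3, 5, 8])
images = np.arange(len(labels) * 4).reshape(len(labels), 2, 2)

# Each comparison yields a boolean array; | combines them element-wise.
# The parentheses are required because | binds more tightly than ==.
mask = (labels == 3) | (labels == 5)

sub_images = images[mask]  # keeps only the samples where mask is True
sub_labels = labels[mask]

print(mask)
print(sub_labels)
print(sub_images.shape)
```

Applying the same mask to both arrays keeps images and labels aligned.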
sample.py
from keras.utils import to_categorical
from sklearn.model_selection import train_test_split

# Convert the 3D training data (11552, 28, 28) to 2D data (11552, 28*28).
x_train_flat = x_sub_train.reshape(x_sub_train.shape[0], 28*28)
# Convert the 3D test data (1902, 28, 28) to 2D data (1902, 28*28).
x_test_flat = x_sub_test.reshape(x_sub_test.shape[0], 28*28)
# One hot encode target variables
# An element of y_sub_train equal to 3 becomes 1, which to_categorical converts to [0, 1].
# An element of y_sub_train equal to 5 becomes 0, which to_categorical converts to [1, 0].
y_sub_train_encoded = to_categorical([1 if value == num1 else 0 for value in y_sub_train])
# Split the data into training data and validation data.
X_train, X_val, Y_train, Y_val = train_test_split(x_train_flat, y_sub_train_encoded, test_size = 0.1, random_state=42)
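The label mapping can be checked on a few toy values. This is a sketch only: it uses `np.eye` as a stand-in for Keras's `to_categorical` so that it runs without Keras, and the `digits` array is hypothetical.

```python
import numpy as np

num1 = 3
digits = np.array([3, 5, 5, 3])        # toy labels (hypothetical)

binary = (digits == num1).astype(int)  # 3 -> 1, 5 -> 0
one_hot = np.eye(2)[binary]            # class 0 -> [1, 0], class 1 -> [0, 1]

print(binary)
print(one_hot)

# Column 1 of the one-hot array is 1 exactly where the digit is num1 (3),
# which is why the later code reads index 1 to recover the positive class.
recovered = one_hot[:, 1] == 1
print(recovered)
```

In other words, index 1 of both the encoded labels and the softmax output refers to the digit 3, and index 0 refers to the digit 5.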
Build the first learning model. The model is a neural network built with the Keras library.
sample.py
from keras.models import Sequential
from keras.layers import Dense

# Build primary model
model = Sequential()
model.add(Dense(units=2, activation='softmax'))
# units: number of outputs
# activation: activation function (https://keras.io/ja/activations/#relu)
# Specify the loss function; here, categorical_crossentropy.
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
# The batch size is deliberately large so that the model is poorly fit; otherwise it is easy to reach 99% accuracy.
model.fit(x=X_train, y=Y_train, validation_data=(X_val, Y_val), epochs=3, batch_size=320)
# epochs: the number of passes made over all of the X_train data.
# batch_size: the number of samples in each piece when X_train is subdivided. Each piece is called a "mini-batch", and the weights are updated once per mini-batch.
(Reference information) http://marupeke296.com/IKDADV_DL_No2_Keras.html
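As a worked example of what epochs and batch_size imply, the sketch below counts the weight updates performed by the fit call above. It assumes Keras's usual behavior of keeping the final, smaller batch, and takes the sample count 10396 from the length printed for X_train in this article.

```python
import math

n_samples = 10396  # rows in X_train after the 90/10 split
batch_size = 320
epochs = 3

# One weight update per mini-batch; the last mini-batch holds the leftover samples.
steps_per_epoch = math.ceil(n_samples / batch_size)
last_batch = n_samples - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch, total_updates)

# For comparison, the batch size of 32 used later for the second model.
print(math.ceil(n_samples / 32))
```

So few updates leave the first model underfit, which is the intent: it keeps room for the second stage to improve on it.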
Score the training data with the trained neural network and draw a ROC curve.
sample.py
import numpy as np
import matplotlib.pyplot as plt
from sklearn import metrics as sk_metrics

# plot_roc must be defined before it is called.
def plot_roc(actual, prediction):
    # Calculate ROC / AUC
    fpr, tpr, thresholds = sk_metrics.roc_curve(actual, prediction, pos_label=1)
    roc_auc = sk_metrics.auc(fpr, tpr)
    # Plot
    plt.plot(fpr, tpr, color='darkorange',
             lw=2, label='ROC curve (area = %0.2f)' % roc_auc)
    plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver Operating Characteristic Example')
    plt.legend(loc="lower right")
    plt.show()

# Plot ROC
print('X_train', '\n', X_train, len(X_train))  # length: 10396
prediction = model.predict(X_train)  # prediction: neural network output
print('prediction', '\n', prediction, len(prediction))  # length: 10396, each row is [probability of 5, probability of 3]
prediction = np.array([i[1] for i in prediction])  # Keep the probability of class 1 (the digit 3).
print('prediction', '\n', prediction, len(prediction))  # length: 10396
print('Y_train', '\n', Y_train)  # [0,1] or [1,0]
actual = np.array([i[1] for i in Y_train]) == 1  # True where the digit is 3
plot_roc(actual, prediction)
Since the ROC curve lies well above the diagonal, which corresponds to random guessing, the model is a good binary classifier.
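To make the ROC curve and its area concrete, here is a minimal pure-Python sketch that builds the curve point by point and integrates it with the trapezoid rule. The labels and scores are toy values, and `roc_points` and `auc_trapezoid` are hypothetical helper names; this is not how scikit-learn implements `roc_curve`.

```python
def roc_points(actual, scores):
    """Return (fpr, tpr) lists with one point per distinct threshold, high to low."""
    pos = sum(actual)
    neg = len(actual) - pos
    fpr, tpr = [0.0], [0.0]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if s >= t and a)
        fp = sum(1 for a, s in zip(actual, scores) if s >= t and not a)
        tpr.append(tp / pos)
        fpr.append(fp / neg)
    return fpr, tpr


def auc_trapezoid(fpr, tpr):
    """Area under the piecewise-linear curve through the (fpr, tpr) points."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
               for i in range(1, len(fpr)))


actual = [1, 1, 0, 1, 0, 0]              # toy labels (hypothetical)
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # toy model scores (hypothetical)

fpr, tpr = roc_points(actual, scores)
auc = auc_trapezoid(fpr, tpr)
print(round(auc, 3))
```

An area of 0.5 corresponds to the diagonal (random guessing) and 1.0 to perfect separation; the toy scores above misorder one positive-negative pair out of nine.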
sample.py
# Create a model with high recall, change the threshold until a good recall level is reached
threshold = .30
print(prediction)  # Probability of class 1 (the digit 3)
prediction_int = np.array(prediction) > threshold  # prediction_int -> [False, True, ...]
print("prediction_int",prediction_int)
# Classification report
print(sk_metrics.classification_report(actual, prediction_int))
# Confusion matrix
cm = sk_metrics.confusion_matrix(actual, prediction_int)
print('Confusion Matrix')
print(cm)
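The effect of lowering the threshold can be seen on a toy example. In this sketch the labels and scores are hypothetical, and `precision_recall` is an illustrative helper rather than a scikit-learn function.

```python
def precision_recall(actual, scores, threshold):
    """Precision and recall when scores above the threshold count as positive."""
    predicted = [s > threshold for s in scores]
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (hypothetical): four positives followed by four negatives.
actual = [True, True, True, True, False, False, False, False]
scores = [0.95, 0.80, 0.45, 0.35, 0.60, 0.40, 0.33, 0.05]

high = precision_recall(actual, scores, 0.50)
low = precision_recall(actual, scores, 0.30)
print("threshold 0.50: precision=%.3f recall=%.3f" % high)
print("threshold 0.30: precision=%.3f recall=%.3f" % low)
```

Lowering the threshold catches every positive in this toy data, at the cost of more false positives. Those extra false positives are what the second model is meant to remove.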
・ Output of the 1st model + X_train → input training data for building the 2nd model
・ Output of the 1st model & Y_train → target training data for building the 2nd model
The F1 score is raised by filtering out false positives after the primary model has already identified most of the positive cases. In other words, the role of the secondary machine learning algorithm is to decide whether a positive judgment by the primary model is true or false.
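The logic of the meta labels can be illustrated with one example of each outcome. This is a sketch on hypothetical arrays; `secondary_pred` stands for an idealized second model that has learned to reject the false positive, which a real model may only do approximately.

```python
import numpy as np

# One sample for each case: true positive, false negative, false positive, true negative.
actual       = np.array([True, True,  False, False])
primary_pred = np.array([True, False, True,  False])

# The meta label is True only when the primary model's positive is correct.
meta_labels = primary_pred & actual
print(meta_labels)

# Hypothetical output of an idealized secondary model.
secondary_pred = np.array([True, False, False, False])

# The final prediction is positive only if both stages agree.
final_pred = secondary_pred & primary_pred
print(final_pred)
```

Because the final prediction is the AND of both stages, the second model can only remove positives, never add them. It can raise precision but cannot recover positives that the first model missed, which is why the first model is tuned for high recall.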
sample.py
# Get meta labels
meta_labels = prediction_int & actual  # True only where the primary model's positive is correct
print("prediction_int", prediction_int)  # [False True True ...]
print("meta_labels", meta_labels)  # [False True True ...]
meta_labels_encoded = to_categorical(meta_labels)  # [1,0] [0,1] [0,1], ...
print(meta_labels_encoded)
# Reshape data
prediction_int = prediction_int.reshape((-1, 1))  # Reshape into a column vector: [[False], [True], [True], ...]
print("prediction_int", prediction_int)  # [False],[True],[True],....
print("X_train", X_train)  # 28*28 values per row: [0,0,....0]
# np.concatenate joins arrays
# MNIST data + prediction_int
new_features = np.concatenate((prediction_int, X_train), axis=1)
print("new_features", new_features)  # [1. 0. 0. ... 0. 0. 0.],....
# Train a new model
# Build model
meta_model = Sequential()
meta_model.add(Dense(units=2, activation='softmax'))
meta_model.compile(loss='categorical_crossentropy',
                   optimizer='adam',
                   metrics=['accuracy'])
# new_features = MNIST data + prediction_int -> [1. 0. 0. ... 0. 0. 0.], [1. 0. 0. ... 0. 0. 0.], ...
# meta_labels_encoded = [1,0] [0,1] [0,1], ...
# new_features and meta_labels_encoded are NumPy arrays, just like in the scikit-learn API.
meta_model.fit(x=new_features, y=meta_labels_encoded, epochs=4, batch_size=32)
X_train was fed to the first learning model (neural network) and then to the second learning model (neural network) to obtain predictions. These were compared with Y_train and a classification report was printed for each. The report showed that the accuracy of the second learning model (neural network) is improved compared to that of the first learning model (neural network).
sample.py
# test_meta_label must be defined before it is called.
def test_meta_label(primary_model, secondary_model, x, y, threshold):
    """
    :param primary_model: model object (First, we build a model that achieves high recall, even if the precision is not particularly high)
    :param secondary_model: model object (the role of the secondary ML algorithm is to determine whether a positive from the primary (exogenous) model
                is true or false. It is not its purpose to come up with a betting opportunity. Its purpose is to determine whether
                we should act or pass on the opportunity that has been presented.)
    :param x: Explanatory variables
    :param y: Target variable (One hot encoded)
    :param threshold: The confidence threshold applied to the primary model's output
    :return: None. Prints the classification report for both the base model and the meta model.
    """
    # Get the actual labels (y) from the encoded y labels
    actual = np.array([i[1] for i in y]) == 1

    # Use primary model to score the data x
    primary_prediction = primary_model.predict(x)
    primary_prediction = np.array([i[1] for i in primary_prediction]).reshape((-1, 1))
    primary_prediction_int = primary_prediction > threshold  # binary labels

    # Print output for base model
    # Note: this report uses a fixed 0.50 cutoff, while the confusion matrix and accuracy below use `threshold`.
    print('Base Model Metrics:')
    print(sk_metrics.classification_report(actual, primary_prediction > 0.50))
    print('Confusion Matrix')
    print(sk_metrics.confusion_matrix(actual, primary_prediction_int))
    accuracy = (actual == primary_prediction_int.flatten()).sum() / actual.shape[0]
    print('Accuracy: ', round(accuracy, 4))
    print('')

    # Secondary model
    new_features = np.concatenate((primary_prediction_int, x), axis=1)

    # Use secondary model to score the new features
    meta_prediction = secondary_model.predict(new_features)
    meta_prediction = np.array([i[1] for i in meta_prediction])
    meta_prediction_int = meta_prediction > 0.5  # binary labels

    # Now combine primary and secondary model in a final prediction
    final_prediction = (meta_prediction_int & primary_prediction_int.flatten())

    # Print output for meta model
    print('Meta Label Metrics: ')
    print(sk_metrics.classification_report(actual, final_prediction))
    print('Confusion Matrix')
    print(sk_metrics.confusion_matrix(actual, final_prediction))
    accuracy = (actual == final_prediction).sum() / actual.shape[0]
    print('Accuracy: ', round(accuracy, 4))

test_meta_label(primary_model=model, secondary_model=meta_model, x=X_train, y=Y_train, threshold=threshold)
The accuracy of the second neural network was also found to improve when the held-out validation data (X_val, Y_val), which was not used for training, was used instead of the training data.
sample.py
test_meta_label(primary_model=model, secondary_model=meta_model, x=X_val, y=Y_val, threshold=threshold)