I've been playing around with back-end and front-end development, but I had never tried machine learning. This is my first attempt, so I'm recording it here for posterity. I'm using Python, NumPy, and tf.keras.
My background
To study the theory of machine learning systematically, I read "Deep Learning from Scratch: Theory and Implementation of Deep Learning Learned with Python". It was a very good book.
My development environment is PyCharm Community 2019.3. I use PyCharm with the necessary libraries installed directly, without Anaconda.
The goal is to have the model learn the following logic: the label is 1 when the first feature is larger than the second, and 0 otherwise.
I wrote fairly typical code for a binary classification problem while looking at a few web articles. I found it quite compact and intuitive. Keras is amazing.
#!/usr/bin/env python3
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
# Dataset preparation
ds_features = np.random.rand(10000, 2)  # feature data
NOISE_RATE = 0
ds_noise = (np.random.rand(10000) > NOISE_RATE).astype(int) * 2 - 1  # no noise: +1, noise: -1 (flips the label)
ds_labels = (np.sign(ds_features[:, 0] - ds_features[:, 1]) * ds_noise + 1) / 2  # correct labels
# Split the dataset into training and validation sets
SPLIT_RATE = 0.8  # split ratio
training_features, validation_features = np.split(ds_features, [int(len(ds_features) * SPLIT_RATE)])
training_labels, validation_labels = np.split(ds_labels, [int(len(ds_labels) * SPLIT_RATE)])
# Model preparation
INPUT_FEATURES = ds_features.shape[1]  # feature dimension
LAYER1_NEURONS = int(INPUT_FEATURES * 1.2 + 1)  # slightly wider than the input dimension
LAYER2_NEURONS = LAYER1_NEURONS
LAYER3_NEURONS = LAYER1_NEURONS  # three hidden layers
OUTPUT_RESULTS = 1  # one-dimensional output
ACTIVATION = 'tanh'
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(input_shape=(INPUT_FEATURES,), units=LAYER1_NEURONS, activation=ACTIVATION),
    tf.keras.layers.Dense(units=LAYER2_NEURONS, activation=ACTIVATION),
    tf.keras.layers.Dense(units=LAYER3_NEURONS, activation=ACTIVATION),
    tf.keras.layers.Dense(units=OUTPUT_RESULTS, activation='sigmoid'),
])
LOSS = 'binary_crossentropy'
OPTIMIZER = tf.keras.optimizers.Adam  # a typical optimizer
LEARNING_RATE = 0.03  # a common initial value for the learning rate
model.compile(optimizer=OPTIMIZER(learning_rate=LEARNING_RATE), loss=LOSS, metrics=['binary_accuracy'])
# Training
BATCH_SIZE = 30
EPOCHS = 100
result = model.fit(x=training_features, y=training_labels,
                   validation_data=(validation_features, validation_labels),
                   batch_size=BATCH_SIZE, epochs=EPOCHS, verbose=1)
# Display the learning curves
plt.plot(range(1, EPOCHS+1), result.history['binary_accuracy'], label="training")
plt.plot(range(1, EPOCHS+1), result.history['val_binary_accuracy'], label="validation")
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.ylim(0.5, 1)
plt.legend()
plt.show()
Here is the learning result. It quickly reached an accuracy of about 99%, and it does not appear to have overfitted.
Next I tried setting NOISE_RATE = 0.2. The accuracy drops by roughly the amount of noise, but the result is reasonable.
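For reference, this experiment only changes the noise constant in the dataset block above; with the threshold at 0.2, roughly 20% of the samples get ds_noise = -1 and therefore a flipped label (a sketch of the change, not a separate script):

NOISE_RATE = 0.2  # about 20% of samples have their label flipped
ds_noise = (np.random.rand(10000) > NOISE_RATE).astype(int) * 2 - 1
ds_labels = (np.sign(ds_features[:, 0] - ds_features[:, 1]) * ds_noise + 1) / 2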
Now let's set the noise back to 0 and increase the number of features to 5. The correct label is computed with the same logic using only 2 of the 5 features; in other words, the remaining 3 features are dummies that have nothing to do with the correct answer.
Here is the result: the accuracy fluctuates a little more, but the model learns without being misled by the dummy features.
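The post doesn't show the modified dataset code, but one minimal way to set this up (my reconstruction, assuming the first two columns remain the informative ones) is:

ds_features = np.random.rand(10000, 5)  # 5 features; columns 2-4 are unused dummies
ds_labels = (np.sign(ds_features[:, 0] - ds_features[:, 1]) * ds_noise + 1) / 2  # label still depends only on columns 0 and 1

Since INPUT_FEATURES is read from ds_features.shape[1], the input layer widens to 5 automatically.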
I set the features back to 2 types, but this time multiplied the random values (0 or more and less than 1) by 1000. The result is that learning no longer converges smoothly, and the accuracy near the final epochs is lower.
I increased the number of epochs and checked again. Learning still seems unstable.
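A sketch of the scaling change (my reconstruction; only the feature matrix changes, and the labeling logic is unaffected because both features are scaled equally):

ds_features = np.random.rand(10000, 2) * 1000  # features now range over [0, 1000)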
Next, instead of scaling, I shifted the mean of the features by adding 1000 to the random values (0 or more and less than 1). The result: the accuracy stays at about 0.5, that is, the model does not learn the binary classification at all.
Overall, this shows how important feature normalization is.
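Again a sketch of the change as I understand it; the features keep their original spread, but their mean moves far from zero:

ds_features = np.random.rand(10000, 2) + 1000  # features now range over [1000, 1001)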
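The post stops here, but as a minimal sketch of the usual fix (not part of the original code; the variable names are mine): standardize each feature to zero mean and unit variance, using statistics from the training split only, before feeding it to the network.

# Standardize features with training-set statistics, applied to both splits
feature_mean = training_features.mean(axis=0)
feature_std = training_features.std(axis=0)
training_features = (training_features - feature_mean) / feature_std
validation_features = (validation_features - feature_mean) / feature_std

With this in place, the *1000 and +1000 variants should train much like the original [0, 1) case.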