Previous post in the Keras-from-nothing series: [http://qiita.com/Ishotihadus/items/6ecf5684c2cbaaa6a5ef](http://qiita.com/Ishotihadus/items/6ecf5684c2cbaaa6a5ef)
Last time, I made a rough dataset and trained on it roughly. I had been using a simple Sequential model, but switched to the Functional API, which gives a more flexible, layer-by-layer feel.

The input was 5-dimensional, with every element at least 0 and less than 1. The output was one-dimensional: -1 if the sum of the input elements is 2.5 or less, 1 if it is greater. In other words, so-called two-class classification.

The input layer was 5-dimensional, the hidden layer was 20-dimensional with tanh as its activation function, and the output layer was one-dimensional, also with tanh.

Hinge loss was used as the loss function for training.
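As a reminder, that labeling rule is a one-liner in NumPy (a sketch; `data` is my name for the 250-sample training matrix):

```python
import numpy as np

data = np.random.rand(250, 5)                  # 5-dimensional inputs in [0, 1)
labels = (np.sum(data, axis=1) > 2.5) * 2 - 1  # -1 if the sum is 2.5 or less, else 1
```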
Until now, I trained by feel and predicted by feel. (!)

That said, implementing the training procedure by hand is a bit of a pain. Strictly speaking, we ought to implement the derivative computations and iterate them ourselves, but since we have the god named Keras (with the god TensorFlow or Theano actually inside), let's ignore that part for now.

The goal this time is to at least be able to compute predictions by hand from the parameters.
As I have said many times, each layer performs three operations: weighting the input, adding a bias, and applying an activation function (if there is one). Let's carry out these three operations by hand and check that the result comes out right.
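In NumPy terms, a single layer amounts to nothing more than this (a minimal sketch, not Keras code; `layer_forward`, `W`, and `b` are names of my choosing, matching the formulas used later):

```python
import numpy as np

def layer_forward(x, W, b, activation=np.tanh):
    # 1. weight the input, 2. add the bias, 3. apply the activation.
    # W is stored as (input_dim, output_dim), the way Keras keeps it,
    # so for a column-vector-style x we multiply by the transpose.
    return activation(np.dot(W.T, x) + b)
```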
Hand calculation gets painful if the model is too large, so let's shrink it. The input is two-dimensional (each element at least 0 and less than 1), and the output is -1 if the sum of the inputs is 0.5 or less, 1 if it is greater. The hidden layer is five-dimensional; the other conditions are the same as before.

Since we want to inspect values this time, it is best to work in an interactive environment.
Unlike last time, we keep the layer itself rather than the tensor (the result of feeding an input to the layer). Calling the layer returns its tensor, which is what we hand to the model.
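Concretely, the difference is whether we hold on to the `Dense` object itself or only the tensor it produces (a small illustration; the "last time" line is my reconstruction of that style):

```python
# tensor style: the layer object is anonymous, only the tensor h remains
h = Dense(5, activation='tanh')(input)

# this time: keep the layer object, then call it to obtain the tensor
hidden = Dense(5, activation='tanh')
h = hidden(input)  # hidden can still be asked for its weights later
```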
```python
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

# 250 two-dimensional samples; label is -1 if the sum is 0.5 or less, 1 otherwise
data = np.random.rand(250, 2)
labels = (np.sum(data, axis=1) > 0.5) * 2 - 1

# keep the layer objects themselves so we can query their weights later
input = Input(shape=(2,))
hidden = Dense(5, activation='tanh')
output = Dense(1, activation='tanh')

model = Model(input=input, output=output(hidden(input)))
model.compile('adam', 'hinge', metrics=['accuracy'])
model.fit(data, labels, nb_epoch=150, validation_split=0.2)
```
Now let's look at the weights. The weights of a layer can be obtained with `get_weights()`.
```python
hidden.get_weights()
```
The output looks like this. The first array is the weight matrix, the second is the bias vector.
```python
[array([[-1.08239257,  0.32482854,  0.95010394,  0.00501535, -0.47380614],
        [-0.56682748,  1.15749049,  0.91618514,  0.37518814, -0.67639047]], dtype=float32),
 array([-0.18290569,  0.21453567,  0.01353107,  0.27740911,  0.09604219], dtype=float32)]
```
The weights of `output` are obtained in the same way:
```python
[array([[-0.8775745 ],
        [ 1.09351909],
        [ 0.21981503],
        [ 1.31380796],
        [-0.10301871]], dtype=float32),
 array([ 0.27410847], dtype=float32)]
```
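To keep the shapes straight, here is a quick check (run in the same session as the code above):

```python
W1, b1 = hidden.get_weights()
W2, b2 = output.get_weights()
print(W1.shape, b1.shape)  # (2, 5) (5,)  -- (input dim, hidden dim) and hidden bias
print(W2.shape, b2.shape)  # (5, 1) (1,)  -- (hidden dim, output dim) and output bias
```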
Now, let's actually calculate.
Let the input be $\boldsymbol{x}$ (a two-dimensional column vector). Then, from the output above, the hidden layer output $\boldsymbol{h}$ is as follows ($\tanh$ is applied elementwise, and ${}^\top$ denotes the transpose).
```math
\boldsymbol{h} = \tanh\left(
\begin{bmatrix}
-1.08239257 & 0.32482854 & 0.95010394 & 0.00501535 & -0.47380614 \\
-0.56682748 & 1.15749049 & 0.91618514 & 0.37518814 & -0.67639047
\end{bmatrix}^\top\boldsymbol{x} + \begin{bmatrix}
-0.18290569 \\ 0.21453567 \\ 0.01353107 \\ 0.27740911 \\ 0.09604219
\end{bmatrix}\right)
```
The output $\boldsymbol{y}$ of the output layer can then be computed as follows.
```math
\boldsymbol{y} = \tanh\left(
\begin{bmatrix}
-0.8775745 \\ 1.09351909 \\ 0.21981503 \\ 1.31380796 \\ -0.10301871
\end{bmatrix}^\top\boldsymbol{h} + 0.27410847\right)
```
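These two formulas translate directly into NumPy, which is handy for double-checking the hand calculation below (a sketch; `hidden` and `output` are the trained layers from above, and `forward` is a name of my choosing):

```python
W1, b1 = hidden.get_weights()
W2, b2 = output.get_weights()

def forward(x):
    h = np.tanh(np.dot(W1.T, x) + b1)     # hidden layer: weight, bias, tanh
    return np.tanh(np.dot(W2.T, h) + b2)  # output layer: weight, bias, tanh
```

The result of `forward` should match `model.predict` for the same input.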
Let's do the hand calculation based on this. Here we take

```math
\boldsymbol{x} = \begin{bmatrix}0.3 \\ 0.1\end{bmatrix}
```

as the input. From here on, the values are rounded to four decimal places.
```math
\begin{array}{rl}
\boldsymbol{h} &= \tanh\left(
\begin{bmatrix}
-1.0824 & 0.3248 & 0.9501 & 0.0050 & -0.4738 \\
-0.5668 & 1.1575 & 0.9162 & 0.3752 & -0.6764
\end{bmatrix}^\top\begin{bmatrix}0.3 \\ 0.1\end{bmatrix} + \begin{bmatrix}
-0.1829 \\ 0.2145 \\ 0.0135 \\ 0.2774 \\ 0.0960
\end{bmatrix}\right) \\
&= \tanh\left(
\begin{bmatrix}-0.3814 \\ 0.2132 \\ 0.3766 \\ 0.0390 \\-0.2098\end{bmatrix} + \begin{bmatrix}
-0.1829 \\ 0.2145 \\ 0.0135 \\ 0.2774 \\ 0.0960
\end{bmatrix}
\right) \\
&= \begin{bmatrix}-0.5112 \\ 0.4034 \\ 0.3715 \\ 0.3063 \\ -0.1133\end{bmatrix}
\\\\
\boldsymbol{y} &= \tanh\left(
\begin{bmatrix}
-0.8776 \\ 1.0935 \\ 0.2198 \\ 1.3138 \\ -0.1030
\end{bmatrix}^\top
\begin{bmatrix}-0.5112 \\ 0.4034 \\ 0.3715 \\ 0.3063 \\ -0.1133\end{bmatrix}
+ 0.2741\right) \\
&= \tanh\left(1.3855 + 0.2741\right) \\
&= 0.9302
\end{array}
```
Hmm? 0.3 + 0.1 = 0.4 is 0.5 or less, so the answer should be -1, yet the output is positive...
```python
model.predict(np.array([[0.3, 0.1]]))
```

The result is

```python
array([[ 0.93015909]], dtype=float32)
```
So hand calculation reproduces exactly the estimate the model makes (even though that estimate is wrong). The model simply failed to learn properly: the input dimension is small and the model was thrown together carelessly, so its accuracy is only about 85%. Still, the calculation itself works out, so let's call that a success.
If we can write the weights ourselves, we can build a model that always answers correctly, so let's build it. We just need to add the two inputs with weights of 1 and a bias of -0.5, which we can set with `set_weights()`.
```python
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

data = np.random.rand(250, 2)
labels = (np.sum(data, axis=1) > 0.5) * 2 - 1

input = Input(shape=(2,))
output = Dense(1, activation='tanh')

model = Model(input=input, output=output(input))
model.compile('adam', 'hinge', metrics=['accuracy'])

# no training at all: set the weights by hand instead
# weight 1 on each input, bias -0.5
output.set_weights([np.array([[1.0], [1.0]]), np.array([-0.5])])
```
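Why this is guaranteed to be correct: $\tanh$ is monotonically increasing with $\tanh(0) = 0$, so the sign of the output equals the sign of the pre-activation, which is exactly the labeling rule:

```math
\mathrm{sign}(y) = \mathrm{sign}\left(\tanh(x_1 + x_2 - 0.5)\right) = \mathrm{sign}(x_1 + x_2 - 0.5)
```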
A dirty trick. Let's run it.
```python
# evaluate on 200 fresh samples
test = np.random.rand(200, 2)
predict = np.sign(model.predict(test).flatten())
real = (np.sum(test, axis=1) > 0.5) * 2 - 1
print(sum(predict == real) / 200.0)
```
Yes: 100%.
Next time we'll be dealing with a slightly larger dataset.