This is basically a memo of Deep Learning content from O'Reilly's "Deep Learning from Scratch". The post has a Python tag, but there isn't much Python code.
To put it simply, a perceptron works like a logic circuit. An AND gate, for example, is a gate that behaves as follows:
| $x_1$ | $x_2$ | $y$ |
|:---:|:---:|:---:|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |
Choosing suitable weights $(w_1, w_2)$ and bias $b$ in $b + w_1x_1 + w_2x_2$ realizes the gate above. The same goes for the other gates.
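As a concrete illustration, here is a minimal sketch of an AND gate built this way; the particular weights and bias (0.5, 0.5, -0.7) are just one choice that happens to work.

```python
import numpy as np

def AND(x1, x2):
    x = np.array([x1, x2])
    w = np.array([0.5, 0.5])  # weights (one possible choice)
    b = -0.7                  # bias (one possible choice)
    # Output 1 if the weighted sum plus bias exceeds 0, otherwise 0
    return 1 if np.sum(w * x) + b > 0 else 0

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, AND(x1, x2))  # reproduces the truth table above
```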
An XOR gate, on the other hand, cannot be reproduced with a single-layer perceptron:
| $x_1$ | $x_2$ | $y$ |
|:---:|:---:|:---:|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
I won't go into detail, but this cannot be achieved with a single perceptron because the XOR outputs are not linearly separable. It can be achieved by stacking layers (a multilayer perceptron), as sketched below. This is the basis of neural networks.
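A common way to realize XOR is to combine single-layer gates. The sketch below assumes the `AND` function from the earlier snippet and defines NAND and OR in the same style; the specific weight and bias values are again just one working choice.

```python
def NAND(x1, x2):
    x = np.array([x1, x2])
    w = np.array([-0.5, -0.5])  # flipping the signs turns AND into NAND
    b = 0.7
    return 1 if np.sum(w * x) + b > 0 else 0

def OR(x1, x2):
    x = np.array([x1, x2])
    w = np.array([0.5, 0.5])
    b = -0.2
    return 1 if np.sum(w * x) + b > 0 else 0

def XOR(x1, x2):
    # Two layers: NAND and OR feed into AND
    s1 = NAND(x1, x2)
    s2 = OR(x1, x2)
    return AND(s1, s2)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, XOR(x1, x2))  # matches the XOR truth table
```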
The basic idea is the same as the perceptron.
The perceptron outputs 1 if the weighted sum mentioned earlier exceeds 0, and outputs 0 otherwise. In other words, the output of one layer is passed to the next layer through a step function. This kind of function is called the **activation function**.
In neural networks, this activation function is replaced with, for example, the sigmoid function:
h(x)=\frac{1}{1+exp(-x)}
A function simply returns some output when given an input; the sigmoid function is just such a function.
Put very simply, the difference from the step function is that the sigmoid has a smooth shape.
I haven't done it yet
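For reference, here is a minimal sketch comparing the step function and the sigmoid as activation functions (NumPy assumed):

```python
import numpy as np

def step_function(x):
    # 1 where the input exceeds 0, else 0
    return (x > 0).astype(int)

def sigmoid(x):
    # h(x) = 1 / (1 + exp(-x)), a smooth version of the step function
    return 1 / (1 + np.exp(-x))

x = np.array([-1.0, 0.0, 1.0])
print(step_function(x))  # [0 0 1]
print(sigmoid(x))        # [0.269... 0.5 0.731...]
```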
If the input is $X$, the weights of the first layer are $W$, and the biases of the first layer are $B$, then the weighted sum of the first layer can be written as $A = XW + B$. This $A$ is passed through the sigmoid function, and its output becomes the input to the next layer. The output layer uses a different function instead of the sigmoid (described below).
```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def identity_function(x):
    # Output layer: return the input as it is
    return x

def init_network():
    # Weights and biases for a 3-layer network (example values)
    network = {}
    network['W1'] = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]])
    network['B1'] = np.array([0.1, 0.2, 0.3])
    network['W2'] = np.array([[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]])
    network['B2'] = np.array([0.1, 0.2])
    network['W3'] = np.array([[0.1, 0.3], [0.2, 0.4]])
    network['B3'] = np.array([0.1, 0.2])
    return network

def forward(network, x):
    W1, W2, W3 = network['W1'], network['W2'], network['W3']
    B1, B2, B3 = network['B1'], network['B2'], network['B3']
    # Layer 1: weighted sum plus bias, then the activation function
    a1 = np.dot(x, W1) + B1
    z1 = sigmoid(a1)
    # Layer 2
    a2 = np.dot(z1, W2) + B2
    z2 = sigmoid(a2)
    # Output layer: identity function instead of sigmoid
    a3 = np.dot(z2, W3) + B3
    y = identity_function(a3)
    return y

# I think it will be object-oriented in the future
network = init_network()
x = np.array([1.0, 2.0])
y = forward(network, x)
print(y)
```
The output layer uses either the identity function or the softmax function. The identity function outputs its input as it is. The softmax function is given by the following formula:
y_k = \frac{exp(a_k)}{\sum_{i=1}^{n}exp(a_i)}
However, when implementing the softmax function in code, you have to be careful about overflow, since the exponentials can become very large. Subtracting the maximum value of the vector from each element of $a$ before exponentiating avoids this without changing the result.
The softmax converts the output of the neural network into probabilities of belonging to each class. However, since it does not change the ordering of the outputs, it is generally omitted in the inference phase; the softmax function is mainly used in the learning phase.
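A minimal sketch of the overflow-safe softmax described above (NumPy assumed):

```python
import numpy as np

def softmax(a):
    # Subtract the max for numerical stability; softmax(a) == softmax(a - c)
    c = np.max(a)
    exp_a = np.exp(a - c)
    return exp_a / np.sum(exp_a)

a = np.array([1010, 1000, 990])  # naive exp() would overflow here
print(softmax(a))                # roughly [1.0, 4.5e-05, 2.1e-09]
print(np.sum(softmax(a)))        # 1.0
```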