What is the activation function?

Introduction

I will go through the activation functions that appear in neural networks and explain what each one does.

What is the activation function?

An activation function converts the weighted sum of the input signals into an output signal. It determines how that sum is activated, that is, whether and how strongly the neuron fires. As an equation, it looks like this: $ y = h(\sum_{i=1}^{n}x_iw_i + b) $, where $ h() $ is the activation function, $ \sum_{i=1}^{n}x_iw_i + b $ is the weighted sum of the input signals plus the bias, and $ y $ is the output signal.

As a figure, for the case of two inputs, it looks like this.

2020-01-30 (2).png
a = x_1w_1 + x_2w_2 + b \\
y = h(a)
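
As a rough sketch of the two equations above, the snippet below computes a and y for a neuron with two inputs. The input values, weights, bias, and the placeholder activation h are all made up for illustration; they are not taken from the figure.

```python
import numpy as np

def h(a):
    # Placeholder activation for this sketch: 1 if a > 0, otherwise 0
    return int(a > 0)

x = np.array([1.0, 0.5])   # input signals x_1, x_2 (made-up values)
w = np.array([0.4, -0.2])  # weights w_1, w_2 (made-up values)
b = -0.1                   # bias (made-up value)

a = np.sum(x * w) + b      # a = x_1 w_1 + x_2 w_2 + b
y = h(a)                   # y = h(a)
print(a, y)
```

Any of the activation functions described in the following sections could be substituted for h here.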

Step function

It is a function whose output switches at a threshold value. It is also called a "staircase function".

Since a perceptron outputs one of two values, firing (1) or not firing (0), it can be said that "a perceptron uses a step function as its activation function". Neural networks normally use some other function, not a step function, as the activation function.

def step_function(x):
    if x > 0:
        return 1
    else:
        return 0

It returns 1 if the input is greater than 0, and 0 otherwise (that is, when the input is 0 or less). This version only accepts a single number, though, and neural network code usually works with NumPy arrays, so I will rewrite it to accept NumPy arrays.

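import numpy as np

def step_function(x):
    y = x > 0             # boolean array: True where the element is greater than 0
    return y.astype(int)  # True -> 1, False -> 0 (np.int was removed in NumPy 1.24)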

Here is how the code works. Applying a comparison operator to a NumPy array produces a boolean array.

>>> import numpy as np
>>> x = np.array([1.0, -1.0, 2.0])
>>> y = x > 0
>>> y
array([ True, False,  True])

The boolean array is then converted to an int type, so True becomes 1 and False becomes 0.

>>> y.astype(int)
array([1, 0, 1])

The graph looks like this. (Figure: graph of the step function)
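
To check the behavior at the threshold itself, here is a minimal sketch that applies the array version to inputs just below, exactly at, and just above 0. The input values are chosen only for illustration.

```python
import numpy as np

def step_function(x):
    # 1 where the element is strictly greater than 0, otherwise 0
    return (x > 0).astype(int)

x = np.array([-0.5, 0.0, 0.5])
result = step_function(x)
print(result)  # [0 0 1]
```

An input of exactly 0 does not fire, because the comparison is strict.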

Sigmoid function

h(x) = \frac{1}{1+\exp(-x)}
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

When a numerical operation is applied to a NumPy array and a scalar value, the operation is carried out between each element of the array and the scalar (broadcasting), and the result is returned as a NumPy array. This is why sigmoid works on arrays as written.
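
The element-wise behavior described above can be checked with a small example. This is only a sketch with made-up inputs; the sigmoid definition is repeated so that the snippet runs on its own.

```python
import numpy as np

def sigmoid(x):
    # -x, np.exp, 1 + ... and 1 / ... are each applied element by element
    return 1 / (1 + np.exp(-x))

x = np.array([-1.0, 0.0, 1.0])
y = sigmoid(x)
print(y)  # approximately [0.269, 0.5, 0.731]
```

The result has the same shape as the input array.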

The graph looks like this. (Figure: graph of the sigmoid function)

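I think the sigmoid is best understood as a smooth version of the step function: its output changes continuously between 0 and 1 instead of jumping at the threshold. That smoothness is what makes it convenient, since the function has a well-defined, nonzero slope everywhere.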
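
To make the difference concrete, the following sketch evaluates both functions at two made-up inputs just either side of the threshold. Both definitions are repeated so that the snippet runs on its own.

```python
import numpy as np

def step_function(x):
    return (x > 0).astype(int)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([-0.1, 0.1])    # two inputs just either side of 0
step_out = step_function(x)  # jumps from 0 to 1
sig_out = sigmoid(x)         # changes only slightly around 0.5
print(step_out)  # [0 1]
print(sig_out)   # approximately [0.475, 0.525]
```

A small change in the input gives a small change in the sigmoid output, while the step function jumps from 0 to 1.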


ReLU function

h(x) = \left\{
\begin{array}{ll}
x & (x \gt 0) \\
0 & (x \leq 0)
\end{array} \right.

It is a function that outputs the input value unchanged if the input is greater than 0, and outputs 0 if the input is 0 or less. ReLU stands for "Rectified Linear Unit"; the function is also known as the ramp function.


def relu(x):
    return np.maximum(0, x)

np.maximum(0, x): compares each element of x with 0 and returns the larger of the two, element by element.

The graph looks like this. (Figure: graph of the ReLU function)
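
As a quick check of the definition, here is a minimal sketch that applies relu to a negative, a zero, and a positive input. The input values are made up for illustration.

```python
import numpy as np

def relu(x):
    # Element-wise maximum of 0 and x
    return np.maximum(0, x)

x = np.array([-2.0, 0.0, 3.5])
y = relu(x)
print(y)  # elements: 0.0, 0.0, 3.5
```

Negative inputs are clipped to 0 and positive inputs pass through unchanged.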

Softmax function

y_k = \frac{\exp(a_k)}{\sum_{i=1}^{n}\exp(a_i)}

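It is often used as the activation function of the output layer. Each output is one term divided by the sum of all the terms, so every output lies between 0 and 1 and the outputs add up to 1. They can therefore be interpreted as probabilities. In a multi-class classification problem, this shows which class is the most plausible.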

def softmax(a):
    exp_a = np.exp(a)          # exponential of each input
    sum_exp_a = np.sum(exp_a)  # sum of all the exponentials
    y = exp_a / sum_exp_a
    return y

Be careful here! The exponential function grows explosively, like this. (Figure: graph of the exponential function) For large inputs, np.exp overflows to inf, and the division then yields nan.

What to do?

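Subtract the maximum value of the input signals from every input! The softmax function has the property that its result does not change when the same constant is added to or subtracted from every input: shifting each $ a_k $ by $ c $ multiplies the numerator and the denominator by the same factor $ \exp(-c) $, which cancels.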

def softmax(a):
    c = np.max(a) #Maximum value in the input signal
    exp_a = np.exp(a - c)
    sum_exp_a = np.sum(exp_a)
    y = exp_a / sum_exp_a
    return y

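(Figure: the same graph after subtracting the maximum value) Compare the y-axis with the previous graph. After the subtraction every exponent is 0 or less, so each exponential is at most 1.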
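
The overflow and its fix can be reproduced with a small experiment. This is a sketch with made-up inputs: softmax_naive is a name introduced here for the first version, and np.errstate is used only to silence the overflow warnings.

```python
import numpy as np

def softmax_naive(a):
    # First version: exponentiates the raw inputs
    exp_a = np.exp(a)
    return exp_a / np.sum(exp_a)

def softmax(a):
    # Improved version: subtracts the maximum before exponentiating
    c = np.max(a)
    exp_a = np.exp(a - c)
    return exp_a / np.sum(exp_a)

a = np.array([1010.0, 1000.0, 990.0])  # made-up large inputs

with np.errstate(over="ignore", invalid="ignore"):
    naive = softmax_naive(a)   # exp overflows to inf, and inf / inf gives nan

stable = softmax(a)            # finite values that sum to 1
shifted = softmax(a - 1000.0)  # same result after shifting every input

print(naive)
print(stable, np.sum(stable))
print(np.argmax(stable))       # index of the most plausible class
```

The naive version returns only nan for these inputs, while the improved version returns the same probabilities whether or not the inputs are shifted by a constant.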

Identity function

It is often used as the activation function of the output layer in regression problems. It is a function that outputs its input unchanged.

(Figure: graph of the identity function)
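
For completeness, a minimal sketch of the identity function in the same style as the other functions. The name identity_function and the input values are chosen here for illustration.

```python
import numpy as np

def identity_function(x):
    # Returns the input as it is
    return x

a = np.array([0.3, -1.2, 2.0])  # made-up output-layer inputs
y = identity_function(a)
print(y)
```

Because nothing is changed, the output layer can produce any real value, which is what a regression problem needs.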

In conclusion

Neural networks can be used for both regression and classification problems, but the activation function of the output layer depends on the problem: the identity function is typical for regression and the softmax function for classification. As a result, the output layer and the intermediate layers may use different activation functions.
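
As an illustration of this point, the sketch below runs the same small network with two different output-layer activations. The forward function, the layer sizes, and all parameter values are hypothetical and chosen only for this example; the intermediate layer uses sigmoid here, but that is an assumption, not a requirement.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def identity_function(x):
    return x

def softmax(a):
    exp_a = np.exp(a - np.max(a))
    return exp_a / np.sum(exp_a)

def forward(x, W1, b1, W2, b2, output_activation):
    # Intermediate layer: sigmoid in this sketch
    z1 = sigmoid(x @ W1 + b1)
    # Output layer: whichever activation suits the problem
    return output_activation(z1 @ W2 + b2)

# Made-up parameters: 2 inputs -> 3 intermediate units -> 2 outputs
x = np.array([1.0, 0.5])
W1 = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]])
b1 = np.array([0.1, 0.2, 0.3])
W2 = np.array([[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]])
b2 = np.array([0.1, 0.2])

y_regression = forward(x, W1, b1, W2, b2, identity_function)
y_classification = forward(x, W1, b1, W2, b2, softmax)
print(y_regression)
print(y_classification, np.sum(y_classification))
```

Only the last step differs between the two calls: the regression output is left as raw values, while the classification output is normalized so that it sums to 1.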
