Four arithmetic operations by machine learning 6 [Quotient]

商.png

# coding=utf-8
import numpy as np
import matplotlib.pyplot as plt

#initial value
#Number of training iterations
N = 1000
#layer
layer = [2, 2, 1]
#bias
#bias = [0.0, 0.0]
#Learning rate
η = [0.001, 0.001]
#η = [0.000001, 0.000001]
#Number of intermediate layers
H = len(η) - 1
#Teacher value
t = [None for _ in range(N)]
#Function output value
f_out = [[None for _ in range(H + 1)] for _ in range(N)]
#Function input value
f_in = [[None for _ in range(H + 1)] for _ in range(N)]
#weight
w = [[None for _ in range(H + 1)] for _ in range(N + 1)]
for h in range(H + 1):   
    w[0][h] = np.random.uniform(-1.0, 1.0, (layer[h + 1], layer[h]))

for h in range(H + 1):
    print(w[0][h])
    
#Square error
dE = [None for _ in range(N)]
#∂E/∂IN
δ = [[None for _ in range(H + 1)] for _ in range(N)]

#Learning
for n in range(N):

    #Input value
    f_out[n][0] = np.random.uniform(-10.0, 10.0, (layer[0]))
    
    #Teacher value
    t[n] = f_out[n][0][0] / f_out[n][0][1]
    
    #Forward propagation
    f_in[n][0] = np.dot(w[n][0], f_out[n][0])
    f_out[n][1] = np.log(f_in[n][0]*f_in[n][0])
    f_in[n][1] = np.dot(w[n][1], f_out[n][1])

    #Output value
    div = np.exp(f_in[n][1])

    #Square error
    dE[n] = div - t[n]  #Derivative of the squared error (only the differentiated value is kept, to save computation)

    #δ
    δ[n][1] = div * dE[n]
    δ[n][0] = (2.0 / f_in[n][0]) * np.dot(w[n][1].T, δ[n][1])
    
    #Backpropagation
    for h in range(H + 1):
        w[n + 1][h] = w[n][h] - η[h] * np.real(δ[n][h].reshape(len(δ[n][h]), 1) * f_out[n][h])
        

#output
#value
for h in range(H + 1):
    print(w[N][h])
#Figure
#Number of subplot rows
py = np.amax(layer)
#Number of subplot columns
px = (H + 1) * 2
#Figure size
plt.figure(figsize = (16, 9))
#Horizontal axis in the figure
x = np.arange(0, N + 1, 1) #0 to N in steps of 1
#drawing
for h in range(H + 1):
    for l in range(layer[h + 1]):
        #Area coordinates
        plt.subplot(py, px, px * l + h * 2 + 1)
        for m in range(layer[h]):                       
            #line
            plt.plot(x, np.array([w[n][h][l, m] for n in range(N + 1)]), label = "w[" + str(h) + "][" + str(l) + "," + str(m) + "]")        
        #Grid lines
        plt.grid(True)
        #Legend
        plt.legend(bbox_to_anchor = (1, 1), loc = 'upper left', borderaxespad = 0, fontsize = 10)

#Save
plt.savefig('graph_div.png') 
#Show the figure
plt.show()
I thought about a circuit for division in the form of deep learning.\\
 \\
Weight\\
w[0]=
\begin{pmatrix}
△ & □\\
▲ & ■
\end{pmatrix},
w[1]=
\begin{pmatrix}
〇 & ●
\end{pmatrix}\\
\\
And\\
 \\
Product of w[0] and the input values\\
\begin{pmatrix}
△ & □\\
▲ & ■
\end{pmatrix}
\begin{pmatrix}
a\\
b
\end{pmatrix}\\
=
\begin{pmatrix}
△a+□b\\
▲a+■b
\end{pmatrix}\\
 \\
First-layer output\\
\begin{pmatrix}
log(△a+□b)^2\\
log(▲a+■b)^2
\end{pmatrix}\\
 \\
The argument of the logarithm is squared so that negative values can also be handled.\\
 \\
Product of w[1] and the first-layer output\\
\begin{align}
\begin{pmatrix}
〇 & ●
\end{pmatrix}
\begin{pmatrix}
log(△a+□b)^2\\
log(▲a+■b)^2
\end{pmatrix}
=&〇log(△a+□b)^2+●log(▲a+■b)^2\\
=&log(△a+□b)^{2〇}-log(▲a+■b)^{-2●}\\
=&log\frac{(△a+□b)^{2〇}}{(▲a+■b)^{-2●}}\\
\end{align}\\
 \\
Exponentiating the output-layer input gives the output value\\
e^{log\frac{(△a+□b)^{2〇}}{(▲a+■b)^{-2●}}}=\frac{(△a+□b)^{2〇}}{(▲a+■b)^{-2●}}\\
 \\
\left\{
\begin{array}{l}
△=1,□=0,〇=0.5 \\
▲=0,■=1,●=-0.5
\end{array}
\right.\\
 \\
\frac{a}{b}\\
 \\
In the simplest case, if the above conditions are met, the network can output the quotient a/b.\\
In reality, however, it is more complicated, because the generalized binomial theorem comes into play.
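To make this concrete, here is a minimal sketch (my own check, not part of the original script) that evaluates the forward pass with the ideal weights △=1, □=0, 〇=0.5, ▲=0, ■=1, ●=-0.5 and confirms that it reproduces a/b for positive inputs:

# coding=utf-8
# Minimal check of the ideal division circuit (assumed weights, not learned ones)
import numpy as np

w0 = np.array([[1.0, 0.0],
               [0.0, 1.0]])    # △=1, □=0, ▲=0, ■=1
w1 = np.array([[0.5, -0.5]])   # 〇=0.5, ●=-0.5

a, b = 6.0, 3.0
x = np.array([a, b])

hidden = np.log(np.dot(w0, x) ** 2)   # log((△a+□b)^2), log((▲a+■b)^2)
out = np.exp(np.dot(w1, hidden))      # exp(0.5*log(a^2) - 0.5*log(b^2))

print(out)   # ≈ [2.], which equals a / b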

[Reference] Generalized binomial theorem and approximation of roots, etc.


After setting the initial weights to random values in (-1.0, 1.0), I checked whether repeated training would make them converge to the target values.\\
 \\
Target value\\
w[0]=
\begin{pmatrix}
△ & □\\
▲ & ■
\end{pmatrix}
,w[1]=
\begin{pmatrix}
○ & ●
\end{pmatrix}\\
\left\{
\begin{array}{l}
△=1,□=0,〇=0.5 \\
▲=0,■=1,●=-0.5
\end{array}
\right.\\
 \\
initial value\\
w[0]=
\begin{pmatrix}
-0.18845444 & -0.56031414\\
-0.48188658 & 0.6470921
\end{pmatrix}
,w[1]=
\begin{pmatrix}
0.80395641 & 0.80365676
\end{pmatrix}\\
\left\{
\begin{array}{l}
△=-0.18845444,□=-0.56031414,〇=0.80395641 \\
▲=-0.48188658,■=0.6470921,●=0.80365676
\end{array}
\right.\\
 \\
Calculated value\\
w[0]=
\begin{pmatrix}
14601870.60282903 & -14866110.02378938\\
13556781.27758209 & -13802110.45958244
\end{pmatrix}
,w[1]=
\begin{pmatrix}
-1522732.53915774 & -6080851.59710287
\end{pmatrix}\\
\left\{
\begin{array}{l}
△=14601870.60282903,□=-14866110.02378938,〇=-1522732.53915774 \\
▲=13556781.27758209,■=-13802110.45958244,●=-6080851.59710287
\end{array}
\right.\\

graph_div.png


It failed. No matter how many times I ran it, the weights diverged to absurd values.\\
I looked for the cause.\\
From the chain rule used in backpropagation,\\
(log(x^2))'=\frac{2}{x}\\
\lim_{x \to ±∞} \frac{2}{x}=0\\
 \\
(e^x)'=e^x\\
\lim_{x \to -∞} e^x=0\\
It turned out that the gradients vanish when the inputs take extremely large values like these.\\
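For reference, writing the chain rule out for this network (my notation, assuming the squared error E = \frac{1}{2}(div-t)^2, with u_0 = w[0]x, z = log(u_0^2), u_1 = w[1]z, div = e^{u_1}), the δ values computed in the script are\\
\begin{align}
δ[1] &= \frac{\partial E}{\partial u_1} = (div-t)\,e^{u_1} = dE \cdot div\\
δ[0] &= \frac{\partial E}{\partial u_0} = \frac{2}{u_0}\,w[1]^T δ[1]\\
\frac{\partial E}{\partial w[1]} &= δ[1]\,z^T, \qquad \frac{\partial E}{\partial w[0]} = δ[0]\,x^T
\end{align}\\
so for very large |u_0| the factor \frac{2}{u_0} drives δ[0] toward zero, while e^{u_1} either vanishes or overflows at the extremes.\\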
 \\
I reconsidered.

商ver2.png

# coding=utf-8
import numpy as np
import matplotlib.pyplot as plt

#initial value
#Number of training iterations
N = 200000
#layer
layer = [2, 2, 1]
#bias
#bias = [0.0, 0.0]
#Learning rate
η = [0.1, 0.1]
#η = [0.000001, 0.000001]
#Clipping value
#clip = 709
clip = 700
#Number of intermediate layers
H = len(η) - 1
#Teacher value
t = [None for _ in range(N)]
#Function output value
f_out = [[None for _ in range(H + 1)] for _ in range(N)]
#Function input value
f_in = [[None for _ in range(H + 1)] for _ in range(N)]
#weight
w = [[None for _ in range(H + 1)] for _ in range(N + 1)]
for h in range(H):   
    w[0][h] = np.random.uniform(-1.0, 1.0, (layer[h + 1], layer[h]))
w[0][H] = np.zeros((layer[H + 1], layer[H]))

for h in range(H + 1):
    print(w[0][h])
    
#Square error
dE = [None for _ in range(N)]
#∂E/∂IN
δ = [[None for _ in range(H + 1)] for _ in range(N)]

#Learning
for n in range(N):

    #Input value
    t[n] = clip
    while np.abs(t[n]) > np.log(np.log(clip)):#Vanishing gradient problem countermeasures
        f_out[n][0] = np.random.uniform(0.0, 10.0, (layer[0]))
        f_out[n][0] = np.array(f_out[n][0], dtype=complex)  #complex dtype so that np.log of negative values stays defined
    
        #Teacher value
        t[n] = f_out[n][0][0] / f_out[n][0][1]
    
    #Forward propagation
    f_in[n][0] = np.dot(w[n][0], f_out[n][0])    
    f_out[n][1] = np.log(f_in[n][0])    
    f_in[n][1] = np.dot(w[n][1], f_out[n][1])
    
    #Output value
    div = np.exp(f_in[n][1])
    
    #Square error
    dE[n] = np.real(div - t[n])  #Derivative of the squared error (only the differentiated value is kept, to save computation)
    dE[n] = np.clip(dE[n], -clip, clip)
    dE[n] = np.nan_to_num(dE[n])

    #δ
    δ[n][1] = np.real(div * dE[n])
    δ[n][1] = np.clip(δ[n][1], -clip, clip)
    δ[n][1] = np.nan_to_num(δ[n][1])
    
    
    δ[n][0] = np.real((1.0 / f_in[n][0]) * np.dot(w[n][1].T, δ[n][1]))
    δ[n][0] = np.clip(δ[n][0], -clip, clip)  
    δ[n][0] = np.nan_to_num(δ[n][0]) 
    
    #Backpropagation
    for h in range(H + 1):
        #Vanishing gradient problem countermeasures
        # Keep only the mantissa a of a*10^b
        w10_u = np.real(δ[n][h].reshape(len(δ[n][h]), 1) * f_out[n][h])
        w10_u = np.clip(w10_u, -clip, clip)  
        w10_u = np.nan_to_num(w10_u)        
        w10_d = np.where(
            w10_u != 0.0,
            np.modf(np.log10(np.abs(w10_u)))[1],
            0.0
        )
        #Negative exponents are clamped to zero (values with magnitude < 1 are not rescaled)
        w10_d = np.clip(w10_d, 0.0, clip)
        
        w[n + 1][h] = w[n][h] - η[h] * (w10_u / np.power(10.0, w10_d))

#output
#value
for h in range(H + 1):
    print(w[N][h])
#Figure
#Number of subplot rows
py = np.amax(layer)
#Number of subplot columns
px = (H + 1) * 2
#Figure size
plt.figure(figsize = (16, 9))
#Horizontal axis in the figure
x = np.arange(0, N + 1, 1) #0 to N in steps of 1
#drawing
for h in range(H + 1):
    for l in range(layer[h + 1]):
        #Area coordinates
        plt.subplot(py, px, px * l + h * 2 + 1)
        for m in range(layer[h]):                       
            #line
            plt.plot(x, np.array([w[n][h][l, m] for n in range(N + 1)]), label = "w[" + str(h) + "][" + str(l) + "," + str(m) + "]")        
        #Grid lines
        plt.grid(True)
        #Legend
        plt.legend(bbox_to_anchor = (1, 1), loc = 'upper left', borderaxespad = 0, fontsize = 10)

#Save
plt.savefig('graph_div.png') 
#Show the figure
plt.show()

As countermeasures:
- Make the input values complex numbers.
- Use only training samples whose teacher value will not easily overflow.
- Do not let δ grow beyond a fixed magnitude (clip it).
- So that the weights do not diverge, use only the mantissa a of the gradient a * 10^b (only when the exponent b is positive); a small sketch follows below.
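As an illustration of the last point, here is a minimal, self-contained sketch (the function name is mine, hypothetical) of the same mantissa-only rescaling used in the weight update above:

# coding=utf-8
# Sketch of the mantissa-only gradient rescaling: g = a * 10^b -> a (only when b > 0)
import numpy as np

def rescale_to_mantissa(grad, clip=700.0):
    g = np.nan_to_num(np.clip(grad, -clip, clip))
    with np.errstate(divide='ignore'):
        # Integer part of log10|g| is the exponent b of g = a * 10^b
        exp10 = np.where(g != 0.0, np.modf(np.log10(np.abs(g)))[1], 0.0)
    # Negative exponents are clamped to zero, so values with |g| < 1 stay unchanged
    exp10 = np.clip(exp10, 0.0, clip)
    return g / np.power(10.0, exp10)

print(rescale_to_mantissa(np.array([3.2e5, -7.0, 0.004, 0.0])))
# -> roughly [3.2, -7.0, 0.004, 0.0]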

graph_div.png


Target value\\
w[0]=
\begin{pmatrix}
△ & □\\
▲ & ■
\end{pmatrix}
,w[1]=
\begin{pmatrix}
○ & ●
\end{pmatrix}\\
\left\{
\begin{array}{l}
△=1,□=0,〇=1 \\
▲=0,■=1,●=-1
\end{array}
\right.\\
 \\
initial value\\
w[0]=
\begin{pmatrix}
-0.12716087 & 0.34977234\\
0.85436489 & 0.65970844
\end{pmatrix}
,w[1]=
\begin{pmatrix}
0.0 & 0.0
\end{pmatrix}\\
\left\{
\begin{array}{l}
△=-0.12716087,□=0.34977234,〇=0.0 \\
▲=0.85436489,■=0.65970844,●=0.0
\end{array}
\right.\\
 \\
Calculated value\\
w[0]=
\begin{pmatrix}
-1.71228449e-08 & 1.00525062e+00\\
1.00525061e+00 & -4.72288257e-09
\end{pmatrix}
,w[1]=
\begin{pmatrix}
-0.99999998 & 0.99999998
\end{pmatrix}\\
\left\{
\begin{array}{l}
△=-1.71228449e-08,□=1.00525062e+00,〇=-0.99999998\\
▲=1.00525061e+00,■=-4.72288257e-09,●=0.99999998
\end{array}
\right.\\
 \\
It succeeded. The values of △, □ and ▲, ■ are swapped relative to the target.\\
I'm not entirely satisfied with an approach where I already know the correct answer and merely get close to it.\\
Even so, just to teach division I ended up needing log, exp, and complex numbers,\\
so I had to reach all the way into high-school mathematics, which was a struggle.\\
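As a quick sanity check (my own, not in the original post), plugging the calculated weights into the second network's forward pass does return a/b for positive inputs, which confirms that the row-swapped solution is equivalent:

# coding=utf-8
# Check that the learned (row-swapped) weights still compute a / b for positive inputs
import numpy as np

w0 = np.array([[-1.71228449e-08, 1.00525062e+00],
               [ 1.00525061e+00, -4.72288257e-09]])
w1 = np.array([[-0.99999998, 0.99999998]])

a, b = 8.0, 5.0
x = np.array([a, b])

out = np.exp(np.dot(w1, np.log(np.dot(w0, x))))
print(out, a / b)   # both ≈ 1.6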
