4 [/] Four Arithmetic by Machine Learning

english_la/b.png

I considered a perceptron for division, built in the form of deep learning.

# coding=utf-8
import numpy as np
import matplotlib.pyplot as plt

#Initial values
#number of training iterations
N = 1000
#layer
layer = [2, 2, 1]
#bias
#bias = [0.0, 0.0]
#learning rate
η = [0.001, 0.001]
#η = [0.000001, 0.000001]
#number of middle layers
H = len(η) - 1
#teacher value
t = [None for _ in range(N)]
#function output value
f_out = [[None for _ in range(H + 1)] for _ in range(N)]
#function input value
f_in = [[None for _ in range(H + 1)] for _ in range(N)]
#weight
w = [[None for _ in range(H + 1)] for _ in range(N + 1)]
for h in range(H + 1):   
    w[0][h] = np.random.uniform(-1.0, 1.0, (layer[h + 1], layer[h]))

for h in range(H + 1):
    print(w[0][h])
    
#derivative of the squared error
dE = [None for _ in range(N)]
#∂E/∂IN
δ = [[None for _ in range(H + 1)] for _ in range(N)]

#Learning
for n in range(N):

    #input value
    f_out[n][0] = np.random.uniform(-10.0, 10.0, (layer[0]))
    
    #teacher value
    t[n] = f_out[n][0][0] / f_out[n][0][1]
    
    #forward propagation
    f_in[n][0] = np.dot(w[n][0], f_out[n][0])
    f_out[n][1] = np.log(f_in[n][0]*f_in[n][0])
    f_in[n][1] = np.dot(w[n][1], f_out[n][1])

    #output value
    div = np.exp(f_in[n][1])

    #squared error
    dE[n] = div - t[n]#derivative of the squared error (the constant factor is omitted)

    #δ
    δ[n][1] = div * dE[n]
    δ[n][0] = (2.0 / f_in[n][0]) * np.dot(w[n][1].T, δ[n][1])
    
    #back propagation
    for h in range(H + 1):
        w[n + 1][h] = w[n][h] - η[h] * np.real(δ[n][h].reshape(len(δ[n][h]), 1) * f_out[n][h])
        

#Output
#Weight
for h in range(H + 1):
    print(w[N][h])
#figure
#area height
py = np.amax(layer)
#area width
px = (H + 1) * 2
#area size
plt.figure(figsize = (16, 9))
#horizontal axis
x = np.arange(0, N + 1, 1)
#drawing
for h in range(H + 1):
    for l in range(layer[h + 1]):
        #area matrix
        plt.subplot(py, px, px * l + h * 2 + 1)
        for m in range(layer[h]):                       
            #line
            plt.plot(x, np.array([w[n][h][l, m] for n in range(N + 1)]), label = "w[" + str(h) + "][" + str(l) + "," + str(m) + "]")        
        #grid line
        plt.grid(True)
        #legend
        plt.legend(bbox_to_anchor = (1, 1), loc = 'upper left', borderaxespad = 0, fontsize = 10)

#save
plt.savefig('graph_div.png') 
#show
plt.show()

The concept of the weights is shown below.\\
w[0]=
\begin{pmatrix}
△ & □\\
▲ & ■
\end{pmatrix},
w[1]=
\begin{pmatrix}
〇 & ●
\end{pmatrix}\\
 \\
Product of the input values and w[0]\\
\begin{pmatrix}
△ & □\\
▲ & ■
\end{pmatrix}
\begin{pmatrix}
a\\
b
\end{pmatrix}\\
=
\begin{pmatrix}
△a+□b\\
▲a+■b
\end{pmatrix}\\
 \\
Input to the first layer\\
\begin{pmatrix}
log(△a+□b)^2\\
log(▲a+■b)^2
\end{pmatrix}\\
 \\
The antilogarithm is squared so that negative values can also be handled.\\
 \\
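As a small added clarification (not in the original): squaring keeps the argument of the logarithm positive for any nonzero pre-activation, since\\
log(x^2)=2log|x|\quad(x\neq0)\\
This is also where the factor 2 in the derivative 2/x, used later in backpropagation, comes from.\\
 \\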
Product of the first-layer output and w[1]\\
\begin{align}
\begin{pmatrix}
〇 & ●
\end{pmatrix}
\begin{pmatrix}
log(△a+□b)^2\\
log(▲a+■b)^2
\end{pmatrix}
=&〇log(△a+□b)^2+●log(▲a+■b)^2\\
=&log(△a+□b)^{2〇}-log(▲a+■b)^{-2●}\\
=&log\frac{(△a+□b)^{2〇}}{(▲a+■b)^{-2●}}\\
\end{align}\\
 \\
Input to the output layer\\
e^{log\frac{(△a+□b)^{2〇}}{(▲a+■b)^{-2●}}}=\frac{(△a+□b)^{2〇}}{(▲a+■b)^{-2●}}\\
 \\
\left\{
\begin{array}{l}
△=1,□=0,〇=0.5 \\
▲=0,■=1,●=-0.5
\end{array}
\right.\\
 \\
\frac{a}{b}\\
 \\
In the simplest case, when the above conditions hold, the quotient a/b can be output.\\
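As a quick check (my own sketch, not part of the original article), the forward pass with these ideal weights reproduces the quotient for positive inputs:

import numpy as np

#ideal weights from the derivation above: w[0] = ((1, 0), (0, 1)), w[1] = (0.5, -0.5)
w0 = np.array([[1.0, 0.0],
               [0.0, 1.0]])
w1 = np.array([[0.5, -0.5]])

def forward(a, b):
    #forward pass of the division network: exp(w1 . log((w0 . x)^2))
    x = np.array([a, b])
    hidden = np.log(np.dot(w0, x) ** 2)
    return np.exp(np.dot(w1, hidden))[0]

print(forward(6.0, 3.0))   #2.0
print(forward(1.0, 4.0))   #0.25

Note that, because of the squaring, the output is really |a/b|, so the sign of the quotient is lost for negative inputs.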


After initializing the weights with random values in (-1.0, 1.0),\\
I checked whether repeated learning would make them converge to the target values.\\
 \\
Target values\\
w[0]=
\begin{pmatrix}
△ & □\\
▲ & ■
\end{pmatrix}
,w[1]=
\begin{pmatrix}
○ & ●
\end{pmatrix}\\
\left\{
\begin{array}{l}
△=1,□=0,〇=0.5 \\
▲=0,■=1,●=-0.5
\end{array}
\right.\\
 \\
Initial values\\
w[0]=
\begin{pmatrix}
-0.18845444 & -0.56031414\\
-0.48188658 & 0.6470921
\end{pmatrix}
,w[1]=
\begin{pmatrix}
0.80395641 & 0.80365676
\end{pmatrix}\\
\left\{
\begin{array}{l}
△=-0.18845444,□=-0.56031414,〇=0.80395641 \\
▲=-0.48188658,■=0.6470921,●=0.80365676
\end{array}
\right.\\
 \\
Calculated values\\
w[0]=
\begin{pmatrix}
14601870.60282903 & -14866110.02378938\\
13556781.27758209 & -13802110.45958244
\end{pmatrix}
,w[1]=
\begin{pmatrix}
-1522732.53915774 & -6080851.59710287
\end{pmatrix}\\
\left\{
\begin{array}{l}
△=14601870.60282903,□=-14866110.02378938,〇=-1522732.53915774 \\
▲=13556781.27758209,■=-13802110.45958244,●=-6080851.59710287
\end{array}
\right.\\

graph_div.png


It is a failure. No matter how many times I run it, the weights diverge to absurd values.\\
I investigated the cause.\\
 \\
In the chain rule of backpropagation,\\
(log(x^2))'=\frac{2}{x}\\
\lim_{x \to ±∞} \frac{2}{x}=0\\
 \\
(e^x)'=e^x\\
\lim_{x \to -∞} e^x=0\\
It turned out that when the inputs to these functions take such extreme values, the gradient vanishes.\\
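A small numerical illustration (my own addition, not from the original) of how both local gradients collapse at extreme inputs:

import numpy as np

#d/dx log(x^2) = 2/x vanishes as |x| grows
for x in [10.0, 1e4, 1e8]:
    print(f"2/x at x = {x:g}: {2.0 / x:.2e}")

#d/dx e^x = e^x vanishes as x becomes very negative
for x in [-10.0, -100.0, -700.0]:
    print(f"e^x at x = {x:g}: {np.exp(x):.2e}")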
 \\
I reconsidered.\\

商ver2.png

# coding=utf-8
import numpy as np
import matplotlib.pyplot as plt

#Initial values
#number of training iterations
N = 200000
#layer
layer = [2, 2, 1]
#bias
#bias = [0.0, 0.0]
#learning rate
η = [0.1, 0.1]
#η = [0.000001, 0.000001]
#clip value
#clip = 709
clip = 700
#number of middle layers
H = len(η) - 1
#teacher value
t = [None for _ in range(N)]
#function output value
f_out = [[None for _ in range(H + 1)] for _ in range(N)]
#function input value
f_in = [[None for _ in range(H + 1)] for _ in range(N)]
#weight
w = [[None for _ in range(H + 1)] for _ in range(N + 1)]
for h in range(H):   
    w[0][h] = np.random.uniform(-1.0, 1.0, (layer[h + 1], layer[h]))
w[0][H] = np.zeros((layer[H + 1], layer[H]))

for h in range(H + 1):
    print(w[0][h])
    
#derivative of the squared error
dE = [None for _ in range(N)]
#∂E/∂IN
δ = [[None for _ in range(H + 1)] for _ in range(N)]

#Learning
for n in range(N):

    #input value
    t[n] = clip
    while np.abs(t[n]) > np.log(np.log(clip)):#vanishing-gradient countermeasure: resample until the teacher value is small enough
        f_out[n][0] = np.random.uniform(0.0, 10.0, (layer[0]))
        f_out[n][0] = np.array(f_out[n][0], dtype=complex)
    
        #teacher value
        t[n] = f_out[n][0][0] / f_out[n][0][1]
    
    #forward propagation
    f_in[n][0] = np.dot(w[n][0], f_out[n][0])    
    f_out[n][1] = np.log(f_in[n][0])    
    f_in[n][1] = np.dot(w[n][1], f_out[n][1])
    
    #output value
    div = np.exp(f_in[n][1])
    
    #squared error
    dE[n] = np.real(div - t[n])#derivative of the squared error (the constant factor is omitted)
    dE[n] = np.clip(dE[n], -clip, clip)
    dE[n] = np.nan_to_num(dE[n])

    #δ
    δ[n][1] = np.real(div * dE[n])
    δ[n][1] = np.clip(δ[n][1], -clip, clip)
    δ[n][1] = np.nan_to_num(δ[n][1])
    
    
    δ[n][0] = np.real((1.0 / f_in[n][0]) * np.dot(w[n][1].T, δ[n][1]))
    δ[n][0] = np.clip(δ[n][0], -clip, clip)  
    δ[n][0] = np.nan_to_num(δ[n][0]) 
    
    #back propagation
    for h in range(H + 1):
        #countermeasure against weight divergence:
        #keep only the mantissa a of the gradient written as a * 10^b
        w10_u = np.real(δ[n][h].reshape(len(δ[n][h]), 1) * f_out[n][h])
        w10_u = np.clip(w10_u, -clip, clip)  
        w10_u = np.nan_to_num(w10_u)        
        w10_d = np.where(
            w10_u != 0.0,
            np.modf(np.log10(np.abs(w10_u)))[1],
            0.0
        )
        #negative exponents are not scaled: gradients smaller than 1 pass through unchanged
        w10_d = np.clip(w10_d, 0.0, clip)
        
        w[n + 1][h] = w[n][h] - η[h] * (w10_u / np.power(10.0, w10_d))

#Output
#Weight
for h in range(H + 1):
    print(w[N][h])
#figure
#area height
py = np.amax(layer)
#area width
px = (H + 1) * 2
#area size
plt.figure(figsize = (16, 9))
#horizontal axis
x = np.arange(0, N + 1, 1) #0 to N in increments of 1
#drawing
for h in range(H + 1):
    for l in range(layer[h + 1]):
        #area matrix
        plt.subplot(py, px, px * l + h * 2 + 1)
        for m in range(layer[h]):                       
            #line
            plt.plot(x, np.array([w[n][h][l, m] for n in range(N + 1)]), label = "w[" + str(h) + "][" + str(l) + "," + str(m) + "]")        
        #grid line
        plt.grid(True)
        #legend
        plt.legend(bbox_to_anchor = (1, 1), loc = 'upper left', borderaxespad = 0, fontsize = 10)

#save
plt.savefig('graph_div.png') 
#show
plt.show()

As countermeasures:
・Make the input values complex numbers.
・Use only data whose teacher value is unlikely to overflow.
・Do not let δ grow beyond a fixed value.
・Apply only the mantissa a of the gradient written as a * 10^b, so that the weights do not diverge (only when the exponent b is positive); see the sketch after this list.
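To illustrate the last two countermeasures in isolation (a minimal sketch based on my reading of the code above; mantissa_only and the example gradient values are hypothetical, not from the original):

import numpy as np

def mantissa_only(grad, clip=700):
    #clip first, as in the training code above
    g = np.nan_to_num(np.clip(grad, -clip, clip))
    #exponent b of g written as a * 10^b (zeros are left untouched)
    b = np.zeros_like(g)
    nonzero = g != 0.0
    b[nonzero] = np.modf(np.log10(np.abs(g[nonzero])))[1]
    #only scale down when the exponent is positive
    b = np.clip(b, 0.0, clip)
    return g / np.power(10.0, b)

grad = np.array([637.0, -52.0, 0.04, 0.0])
print(mantissa_only(grad))   #[ 6.37 -5.2   0.04  0.  ]

#the complex-input countermeasure: with a complex dtype, the log of a negative
#number is defined (log|x| + iπ) instead of producing NaN
print(np.log(np.complex128(-2.0)))   #(0.693...+3.142...j)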

graph_div.png


Target values\\
w[0]=
\begin{pmatrix}
△ & □\\
▲ & ■
\end{pmatrix}
,w[1]=
\begin{pmatrix}
○ & ●
\end{pmatrix}\\
\left\{
\begin{array}{l}
△=1,□=0,〇=1 \\
▲=0,■=1,●=-1
\end{array}
\right.\\
 \\
Initial values\\
w[0]=
\begin{pmatrix}
-0.12716087 & 0.34977234\\
0.85436489 & 0.65970844
\end{pmatrix}
,w[1]=
\begin{pmatrix}
0.0 & 0.0
\end{pmatrix}\\
\left\{
\begin{array}{l}
△=-0.12716087,□=0.34977234,〇=0.0 \\
▲=0.85436489,■=0.65970844,●=0.0
\end{array}
\right.\\
 \\
Calculated values\\
w[0]=
\begin{pmatrix}
-1.71228449e-08 & 1.00525062e+00\\
1.00525061e+00 & -4.72288257e-09
\end{pmatrix}
,w[1]=
\begin{pmatrix}
-0.99999998 & 0.99999998
\end{pmatrix}\\
\left\{
\begin{array}{l}
△=-1.71228449e-08,□=1.00525062e+00,〇=-0.99999998\\
▲=1.00525061e+00,■=-4.72288257e-09,●=0.99999998
\end{array}
\right.\\
 \\
It succeeded. The values of △□ and ▲■ are swapped.\\
I am not entirely happy with it, since I knew the correct answer in advance and the result only comes close to it.\\
Even so, merely to teach division I had to bring in log, exp, and complex numbers,\\
extending into high-school mathematics, which was more trouble than expected.\\
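As a final check (again my own sketch, not in the original), plugging the learned weights into the second network's forward pass shows that the row swap does not matter; the output is still approximately a/b:

import numpy as np

#learned weights reported above (the rows of w[0] are swapped relative to the target)
w0 = np.array([[-1.71228449e-08, 1.00525062e+00],
               [ 1.00525061e+00, -4.72288257e-09]])
w1 = np.array([[-0.99999998, 0.99999998]])

def forward(a, b):
    #forward pass of the second network: exp(w1 . log(w0 . x))
    x = np.array([a, b], dtype=complex)#complex dtype, as in the training code
    hidden = np.log(np.dot(w0, x))
    return np.real(np.exp(np.dot(w1, hidden)))[0]

print(forward(6.0, 3.0))   #approximately 2.0
print(forward(1.0, 4.0))   #approximately 0.25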
