I thought about the perceptron of division in the form of deep learning. I considered a division perceptron in the form of deep learning.
# coding=utf-8
import numpy as np
import matplotlib.pyplot as plt
#Initial value
#number of learning
N = 1000
#layer
layer = [2, 2, 1]
#bias
#bias = [0.0, 0.0]
#learning rate
η = [0.001, 0.001]
#η = [0.000001, 0.000001]
#number of middle layers
H = len(η) - 1
#teacher value
t = [None for _ in range(N)]
#function output value
f_out = [[None for _ in range(H + 1)] for _ in range(N)]
#function input value
f_in = [[None for _ in range(H + 1)] for _ in range(N)]
#weight
w = [[None for _ in range(H + 1)] for _ in range(N + 1)]
for h in range(H + 1):
w[0][h] = np.random.uniform(-1.0, 1.0, (layer[h + 1], layer[h]))
for h in range(H + 1):
print(w[0][h])
#squared error
dE = [None for _ in range(N)]
#∂E/∂IN
δ = [[None for _ in range(H + 1)] for _ in range(N)]
#Learning
for n in range(N):
#input value
f_out[n][0] = np.random.uniform(-10.0, 10.0, (layer[0]))
#teacher value
t[n] = f_out[n][0][0] / f_out[n][0][1]
#order propagation
f_in[n][0] = np.dot(w[n][0], f_out[n][0])
f_out[n][1] = np.log(f_in[n][0]*f_in[n][0])
f_in[n][1] = np.dot(w[n][1], f_out[n][1])
#output value
div = np.exp(f_in[n][1])
#squared error
dE[n] = div - t[n]#value after squared error differentiation due to omission of calculation
#δ
δ[n][1] = div * dE[n]
δ[n][0] = (2.0 / f_in[n][0]) * np.dot(w[n][1].T, δ[n][1])
#back propagation
for h in range(H + 1):
w[n + 1][h] = w[n][h] - η[h] * np.real(δ[n][h].reshape(len(δ[n][h]), 1) * f_out[n][h])
#Output
#Weight
for h in range(H + 1):
print(w[N][h])
#figure
#area height
py = np.amax(layer)
#area width
px = (H + 1) * 2
#area size
plt.figure(figsize = (16, 9))
#horizontal axis
x = np.arange(0, N + 1, 1)
#drawing
for h in range(H + 1):
for l in range(layer[h + 1]):
#area matrix
plt.subplot(py, px, px * l + h * 2 + 1)
for m in range(layer[h]):
#line
plt.plot(x, np.array([w[n][h][l, m] for n in range(N + 1)]), label = "w[" + str(h) + "][" + str(l) + "," + str(m) + "]")
#grid line
plt.grid(True)
#legend
plt.legend(bbox_to_anchor = (1, 1), loc = 'upper left', borderaxespad = 0, fontsize = 10)
#save
plt.savefig('graph_div.png')
#show
plt.show()
Shows the concept of weights.\\
I\ indicate\ the\ concept\ of\ weight.\\
w[0]=
\begin{pmatrix}
△ & □\\
▲ & ■
\end{pmatrix},
w[1]=
\begin{pmatrix}
〇 & ●
\end{pmatrix}\\
\\
Input value and w[0]Product of\\
multiplication\ of\ input\ value\ and\ w[0]\\
\begin{pmatrix}
△ & □\\
▲ & ■
\end{pmatrix}
\begin{pmatrix}
a\\
b
\end{pmatrix}\\
=
\begin{pmatrix}
△a+□b\\
▲a+■b
\end{pmatrix}\\
\\
1st layer input\\
enter\ in\ the\ first\ layer\\
\begin{pmatrix}
log(△a+□b)^2\\
log(▲a+■b)^2
\end{pmatrix}\\
\\
Antilogarithms are squared to accommodate negative numbers.\\
The\ exact\ number\ is\ squared\ to\ accommodate\ negative\ numbers.\\
\\
1st layer output and w[1]Product of\\
product\ of\ first\ layer\ output\ and\ w[1]\\
\begin{align}
\begin{pmatrix}
〇 & ●
\end{pmatrix}
\begin{pmatrix}
log(△a+□b)^2\\
log(▲a+■b)^2
\end{pmatrix}
=&〇log(△a+□b)^2+●log(▲a+■b)^2\\
=&log(△a+□b)^{2〇}-log(▲a+■b)^{-2●}\\
=&log\frac{(△a+□b)^{2〇}}{(▲a+■b)^{-2●}}\\
\end{align}\\
\\
Output layer input\\
enter\ in\ the\ output\ layer\\
e^{log\frac{(△a+□b)^{2〇}}{(▲a+■b)^{-2●}}}=\frac{(△a+□b)^{2〇}}{(▲a+■b)^{-2●}}\\
\\
\left\{
\begin{array}{l}
△=1,□=0,〇=0.5 \\
▲=0,■=1,●=-0.5
\end{array}
\right.\\
\\
\frac{a}{b}\\
\\
In the simplest case, if the above conditions are met, the quotient a/You can output b.\\
In\ the\ simplest\ case,\ if\ the\ above\ conditions\ are\ met,\ quotient\ a/b\ can\ be\ output.\\
The initial value is a random number(-1.0~1.0)After deciding in, I tried to see if it would converge to the target value if the learning was repeated.\\
After\ deciding\ the\ initial\ value\ between\ random\ numbers\ (-1.0~1.0),\\
I\ tried\ to\ repeat\ the\ learning\ to\ converge\ to\ the\ target\ value.\\
\\
Target value\\
Target\ value\\
w[0]=
\begin{pmatrix}
△ & □\\
▲ & ■
\end{pmatrix}
,w[1]=
\begin{pmatrix}
○ & ●
\end{pmatrix}\\
\left\{
\begin{array}{l}
△=1,□=0,〇=0.5 \\
▲=0,■=1,●=-0.5
\end{array}
\right.\\
\\
initial value\\
Initial\ value\\
w[0]=
\begin{pmatrix}
-0.18845444 & -0.56031414\\
-0.48188658 & 0.6470921
\end{pmatrix}
,w[1]=
\begin{pmatrix}
0.80395641 & 0.80365676
\end{pmatrix}\\
\left\{
\begin{array}{l}
△=-0.18845444,□=-0.56031414,〇=0.80395641 \\
▲=-0.48188658,■=0.6470921,●=0.80365676
\end{array}
\right.\\
\\
Calculated value\\
Calculated\ value\\
w[0]=
\begin{pmatrix}
14601870.60282903 & -14866110.02378938\\
13556781.27758209 & -13802110.45958244
\end{pmatrix}
,w[1]=
\begin{pmatrix}
-1522732.53915774 & -6080851.59710287
\end{pmatrix}\\
\left\{
\begin{array}{l}
△=14601870.60282903,□=-14866110.02378938,〇=-1522732.53915774 \\
▲=13556781.27758209,■=-13802110.45958244,●=-6080851.59710287
\end{array}
\right.\\
It's a failure. No matter how many times you do it, the weight will diverge to a ridiculous value.\\
I searched for the cause.\\
It\ is\ a\ failure.\\
No\ matter\ how\ many\ times\ I\ do,\ the\ weights\ will\ diverge\ to\ ridiculous\ values.\\
I\ investigated\ the\ cause.\\
\\
With the chain rule of error back propagation\\
In\ chain\ rule\ of\ the\ backpropagation\\
(log(x^2))'=\frac{2}{x}\\
\lim_{x \to ±∞} \frac{2}{x}=0\\
\\
(e^x)'=e^x\\
\lim_{x \to -∞} e^x=0\\
It turned out that taking such an extremely large value causes the gradient to disappear.\\
It\ was\ found\ that\ such\ an\ extremely\ large\ value\ would\ cause\ the\ gradient\ to\ disappear.\\
\\
I reconsidered.\\
I reconsidered.
# coding=utf-8
import numpy as np
import matplotlib.pyplot as plt
#Initial value
#number of learning
N = 200000
#layer
layer = [2, 2, 1]
#bias
#bias = [0.0, 0.0]
#learning rate
η = [0.1, 0.1]
#η = [0.000001, 0.000001]
#clip value
#clip = 709
clip = 700
#number of middle layers
H = len(η) - 1
#teacher value
t = [None for _ in range(N)]
#function output value
f_out = [[None for _ in range(H + 1)] for _ in range(N)]
#function input value
f_in = [[None for _ in range(H + 1)] for _ in range(N)]
#weight
w = [[None for _ in range(H + 1)] for _ in range(N + 1)]
for h in range(H):
w[0][h] = np.random.uniform(-1.0, 1.0, (layer[h + 1], layer[h]))
w[0][H] = np.zeros((layer[H + 1], layer[H]))
for h in range(H + 1):
print(w[0][h])
#squared error
dE = [None for _ in range(N)]
#∂E/∂IN
δ = [[None for _ in range(H + 1)] for _ in range(N)]
#Learning
for n in range(N):
#input value
t[n] = clip
while np.abs(t[n]) > np.log(np.log(clip)):#Gradient vanishing problem Measure
f_out[n][0] = np.random.uniform(0.0, 10.0, (layer[0]))
f_out[n][0] = np.array(f_out[n][0], dtype=np.complex)
#teacher value
t[n] = f_out[n][0][0] / f_out[n][0][1]
#order propagation
f_in[n][0] = np.dot(w[n][0], f_out[n][0])
f_out[n][1] = np.log(f_in[n][0])
f_in[n][1] = np.dot(w[n][1], f_out[n][1])
#output value
div = np.exp(f_in[n][1])
#squared error
dE[n] = np.real(div - t[n])#value after squared error differentiation due to omission of calculation
dE[n] = np.clip(dE[n], -clip, clip)
dE[n] = np.nan_to_num(dE[n])
#δ
δ[n][1] = np.real(div * dE[n])
δ[n][1] = np.clip(δ[n][1], -clip, clip)
δ[n][1] = np.nan_to_num(δ[n][1])
δ[n][0] = np.real((1.0 / f_in[n][0]) * np.dot(w[n][1].T, δ[n][1]))
δ[n][0] = np.clip(δ[n][0], -clip, clip)
δ[n][0] = np.nan_to_num(δ[n][0])
#back propagation
for h in range(H + 1):
#Gradient vanishing problem Measure
# a*10^b a part only
w10_u = np.real(δ[n][h].reshape(len(δ[n][h]), 1) * f_out[n][h])
w10_u = np.clip(w10_u, -clip, clip)
w10_u = np.nan_to_num(w10_u)
w10_d = np.where(
w10_u != 0.0,
np.modf(np.log10(np.abs(w10_u)))[1],
0.0
)
#Decimal not supported
w10_d = np.clip(w10_d, 0.0, clip)
w[n + 1][h] = w[n][h] - η[h] * (w10_u / np.power(10.0, w10_d))
#Output
#Weight
for h in range(H + 1):
print(w[N][h])
#figure
#area height
py = np.amax(layer)
#area width
px = (H + 1) * 2
#area size
plt.figure(figsize = (16, 9))
#horizontal axis
x = np.arange(0, N + 1, 1) #0 to N+Up to 1 in 1 increments
#drawing
for h in range(H + 1):
for l in range(layer[h + 1]):
#area matrix
plt.subplot(py, px, px * l + h * 2 + 1)
for m in range(layer[h]):
#line
plt.plot(x, np.array([w[n][h][l, m] for n in range(N + 1)]), label = "w[" + str(h) + "][" + str(l) + "," + str(m) + "]")
#grid line
plt.grid(True)
#legend
plt.legend(bbox_to_anchor = (1, 1), loc = 'upper left', borderaxespad = 0, fontsize = 10)
#save
plt.savefig('graph_div.png')
#show
plt.show()
As a countermeasure -Set the input value to a complex number. -Only data that does not easily overflow with the teacher value. ・ Do not set δ to a value larger than a certain value. ・ Set the gradient only to the a part of a * 10 ^ b so that the weight does not diverge. (Only when b is a positive number) As a countermeasure ・ Change the input value to a complex number. ・ Use only data that is difficult to overflow with the teacher value. ・ Do not make δ larger than a certain value. ・ Set the gradient to only a part of a * 10 ^ b so that the weight does not diverge. (Only when b is a positive number)
Target value\\
Target\ value\\
w[0]=
\begin{pmatrix}
△ & □\\
▲ & ■
\end{pmatrix}
,w[1]=
\begin{pmatrix}
○ & ●
\end{pmatrix}\\
\left\{
\begin{array}{l}
△=1,□=0,〇=1 \\
▲=0,■=1,●=-1
\end{array}
\right.\\
\\
initial value\\
Initial value\\
w[0]=
\begin{pmatrix}
-0.12716087 & 0.34977234\\
0.85436489 & 0.65970844
\end{pmatrix}
,w[1]=
\begin{pmatrix}
0.0 & 0.0
\end{pmatrix}\\
\left\{
\begin{array}{l}
△=-0.12716087,□=0.34977234,〇=0.0 \\
▲=0.85436489,■=0.65970844,●=0.0
\end{array}
\right.\\
\\
Calculated value\\
Calculated\ value\\
w[0]=
\begin{pmatrix}
-1.71228449e-08 & 1.00525062e+00\\
1.00525061e+00 & -4.72288257e-09
\end{pmatrix}
,w[1]=
\begin{pmatrix}
-0.99999998 & 0.99999998
\end{pmatrix}\\
\left\{
\begin{array}{l}
△=-1.71228449e-08,□=1.00525062e+00,〇=-0.99999998\\
▲=1.00525061e+00,■=-4.72288257e-09,●=0.99999998
\end{array}
\right.\\
\\
Succeeded. The values of △ □ and ▲ ■ are reversed.\\
I don't like it in a way that has the correct answer and is close to it.\\
Even so, log at most trying to teach division,exp,With complex numbers\\
I was in trouble because I had to extend to high school mathematics.\\
\\
Succeeded.\ The\ values\ of\ △□\ and\ ▲■\ are\ reversed.\\
I\ don't\ like\ it\ in\ the\ way\ that\ I\ get\ it\ right.\\
Even\ so,\ at\ the\ very\ least\ trying\ to\ teach\ division\\
log,\ exp,\ complex\ numbers\\
I\ had\ trouble\ expanding\ to\ high\ school\ mathematics.\\
Recommended Posts