This article is about ["Deep Learning from scratch"](https://www.amazon.co.jp/%E3%82%BC%E3%83%AD%E3%81%8B%E3%82%89%E4%BD%9C%E3%82%8BDeep-Learning-%E2%80%95Python%E3%81%A7%E5%AD%A6%E3%81%B6%E3%83%87%E3%82%A3%E3%83%BC%E3%83%97%E3%83%A9%E3%83%BC%E3%83%8B%E3%83%B3%E3%82%B0%E3%81%AE%E7%90%86%E8%AB%96%E3%81%A8%E5%AE%9F%E8%A3%85-%E6%96%8E%E8%97%A4-%E5%BA%B7%E6%AF%85/dp/4873117585), section 4.4.2 "Gradients for a neural network" (p. 110). I had a question about this section; now that it is resolved, I am writing up the answer.
What puzzled me was the code at the bottom of p. 111:

```python
>>> def f(W):
...     return net.loss(x, t)
...
>>> dW = numerical_gradient(f, net.W)
>>> print(dW)
[[ 0.21924763  0.14356247 -0.36281009]
 [ 0.32887144  0.2153437  -0.54421514]]
```
Here, the function f is defined and passed to the numerical_gradient function defined a little earlier in the book. When I changed the second argument of numerical_gradient, the value of dW changed:

```python
# Pass net.W as the second argument
# (for net.W, see the explanation starting on p. 110 of the book).
>>> dW = numerical_gradient(f, net.W)
>>> print(dW)
[[ 0.06281915  0.46086202 -0.52368118]
 [ 0.09422873  0.69129304 -0.78552177]]

# Store a NumPy array in a and pass it as the second argument.
>>> a = np.array([[0.2, 0.1, -0.3],
...               [0.12, -0.17, 0.088]])
>>> dW = numerical_gradient(f, a)
>>> print(dW)
[[0. 0. 0.]
 [0. 0. 0.]]
```
I did not understand why the value of dW changed, and this article is the answer to that question.

First, let me explain why the change in dW puzzled me.

Point 1: the return value of f has nothing to do with the value of its argument W, because W never appears in the function body. So no matter what value you pass as W, the return value does not change at all:
```python
# Pass 3 as the argument of f.
>>> f(3)
2.0620146712373737
# Pass net.W to f.
>>> f(net.W)
2.0620146712373737
# Define a NumPy array a and compare f(a) with f(3).
>>> a = np.array([[0.2, 0.1, -0.3],
...               [0.12, -0.17, 0.088]])
>>> f(a) == f(3)
True
```
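The same behavior can be reproduced without the book's net object. Below is a minimal self-contained sketch, where `W_global` and this version of `f` are my own illustrative stand-ins: a function that ignores its argument and only reads outer state returns the same value no matter what you pass in.

```python
import numpy as np

# Hypothetical stand-in for the situation in the book: the "loss"
# reads this outer array and ignores the function argument entirely.
W_global = np.array([[0.2, 0.1, -0.3],
                     [0.12, -0.17, 0.088]])

def f(W):
    # W never appears below: only W_global determines the result.
    return float(np.sum(W_global ** 2))

print(f(3) == f(np.zeros((2, 3))))  # True: the argument is irrelevant
```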
Point 2: here is the numerical_gradient function itself:
```python
def numerical_gradient(f, x):
    h = 1e-4  # 0.0001
    grad = np.zeros_like(x)

    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x)  # f(x+h)

        x[idx] = tmp_val - h
        fxh2 = f(x)  # f(x-h)
        grad[idx] = (fxh1 - fxh2) / (2*h)

        x[idx] = tmp_val  # restore the value
        it.iternext()

    return grad
```
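As a sanity check, numerical_gradient works exactly as expected when the function actually uses its argument. Here is a small self-contained example (the function `g` and the test array are my own illustrative choices, not from the book): for g(x) = Σx², the gradient is 2x.

```python
import numpy as np

def numerical_gradient(f, x):
    h = 1e-4
    grad = np.zeros_like(x)

    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x)  # f(x+h)
        x[idx] = tmp_val - h
        fxh2 = f(x)  # f(x-h)
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp_val  # restore the value
        it.iternext()

    return grad

def g(x):
    # g(x) = sum of x_i^2, so dg/dx_i = 2 * x_i
    return np.sum(x ** 2)

x = np.array([[1.0, 2.0], [3.0, -4.0]])
print(numerical_gradient(g, x))  # approximately [[ 2.  4.] [ 6. -8.]], i.e. 2*x
```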
This function returns the array grad defined inside it. Tracing backwards through the code to see how grad is computed, you find the line `grad[idx] = (fxh1 - fxh2) / (2*h)`. So what are fxh1 and fxh2? They come from the lines `fxh1 = f(x)` and `fxh2 = f(x)`.

From point 2, the return value grad of numerical_gradient is determined by the value of f(x). But from point 1, f returns a constant value regardless of its argument. Putting the two together, it seemed strange to me that the return value of numerical_gradient should change at all, no matter what value is passed as its second argument x.
To resolve this, let's take a closer look first at the numerical_gradient function, and then at the function f.

### A closer look at the numerical_gradient function

Let's tweak the numerical_gradient code a little. Specifically, add `print(fxh1)` and `print(fxh2)` just below the lines `fxh1 = f(x)` and `fxh2 = f(x)`, respectively:
```python
def numerical_gradient(f, x):
    h = 1e-4
    grad = np.zeros_like(x)

    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x)  # f(x+h)
        print('fxh1:', fxh1)  # added

        x[idx] = tmp_val - h
        fxh2 = f(x)  # f(x-h)
        print('fxh2:', fxh2)  # added

        grad[idx] = (fxh1 - fxh2) / (2*h)
        x[idx] = tmp_val
        it.iternext()

    return grad
```
Now let's run the code with different second arguments.

Passing net.W as the second argument:

```python
>>> dW = numerical_gradient(f, net.W)
fxh1: 2.062020953321506
fxh2: 2.0620083894906935
fxh1: 2.062060757760379
fxh2: 2.061968585355599
fxh1: 2.061962303319411
fxh2: 2.062067039554999
fxh1: 2.062024094490122
fxh2: 2.062005248743893
fxh1: 2.062083801262337
fxh2: 2.0619455426551796
fxh1: 2.061936119510309
fxh2: 2.06209322386368
```
Passing our own NumPy array `a` as the second argument:

```python
>>> a = np.array([[0.2, 0.1, -0.3],
...               [0.12, -0.17, 0.088]])
>>> dW = numerical_gradient(f, a)
fxh1: 2.0620146712373737
fxh2: 2.0620146712373737
fxh1: 2.0620146712373737
fxh2: 2.0620146712373737
fxh1: 2.0620146712373737
fxh2: 2.0620146712373737
fxh1: 2.0620146712373737
fxh2: 2.0620146712373737
fxh1: 2.0620146712373737
fxh2: 2.0620146712373737
fxh1: 2.0620146712373737
fxh2: 2.0620146712373737
```
When net.W is passed as the second argument, the values of fxh1 and fxh2 differ slightly at every step. When our own array `a` is passed, fxh1 and fxh2 are always identical, so every central difference, and hence every entry of the gradient, is zero. Why? From here on, I will work through the case where net.W is passed as the second argument.
Let's look more closely at the numerical_gradient function. The following code appears in the middle of it. It advances the index idx, reads out x[idx], adds a small h to it, and then calls f:

```python
it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
while not it.finished:
    idx = it.multi_index    # advance the index idx
    tmp_val = x[idx]        # read out x at that index
    x[idx] = tmp_val + h    # write x[idx] back with a small h added
    fxh1 = f(x)             # f(x+h): call f
```
Did the return value of f change because a small h was added to x? But as point 1 above showed, the return value of f does not depend on its argument at all.

Still, something really did change when h was added to x. The crucial observation is that the x here is net.W itself, the array passed as the second argument of numerical_gradient. NumPy arrays are not copied when passed to a function, so x inside numerical_gradient and net.W are the very same object. The assignment `x[idx] = tmp_val + h` therefore modifies net.W in place, and only after that is f called.
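The fact that the x inside numerical_gradient and net.W are the same object, so that in-place writes to x change net.W, can be verified on its own. A minimal sketch, with `W` as an illustrative stand-in for net.W:

```python
import numpy as np

W = np.zeros((2, 3))   # stand-in for net.W
x = W                  # "passing W as an argument": no copy is made
print(x is W)          # True: x and W are the same object

x[0, 0] = 1e-4         # in-place element assignment, as in numerical_gradient
print(W[0, 0])         # 0.0001: W itself changed
```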
Here is the relevant part of numerical_gradient again:

```python
x[idx] = tmp_val + h  # add a small h to x[idx], i.e. to net.W
fxh1 = f(x)           # f is called afterwards
```

The important point is the order: f is called only after net.W has already changed. So how does the change in net.W affect f?
### A closer look at the function f

Let's look at f in a little more detail to see how the change in net.W affects it. The function f is shown below:

```python
def f(W):
    return net.loss(x, t)
```
The loss method that appears in f is defined in the simpleNet class on p. 110 of the book. The simpleNet class is shown below:
```python
import sys, os
sys.path.append(os.pardir)
import numpy as np
from common.functions import softmax, cross_entropy_error
from common.gradient import numerical_gradient


class simpleNet:
    def __init__(self):
        self.W = np.random.randn(2, 3)

    def predict(self, x):
        return np.dot(x, self.W)

    def loss(self, x, t):
        z = self.predict(x)
        y = softmax(z)
        loss = cross_entropy_error(y, t)
        return loss
```
At the bottom of simpleNet is the loss method, and inside loss is a call to predict, defined just above it. Looking closely at predict, you can see that it uses the weight parameter W (self.W).

This is the answer to the question raised at the end of the previous section: what effect does the change in net.W have on f? Changing net.W changes the weight parameter W used by predict, which is called by loss inside f. Naturally, the return value of loss then changes as well.
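This can be checked directly: modifying net.W in place changes the value of net.loss(x, t). Below is a minimal reproduction. Note that softmax and cross_entropy_error here are my own simplified single-sample stand-ins for the book's common.functions, the random seed is added only for reproducibility, and x and t follow the book's example values.

```python
import numpy as np

# Simplified single-sample stand-ins for the book's common.functions.
def softmax(z):
    e = np.exp(z - np.max(z))  # shift for numerical stability
    return e / np.sum(e)

def cross_entropy_error(y, t):
    return -np.sum(t * np.log(y + 1e-7))

class simpleNet:
    def __init__(self):
        self.W = np.random.randn(2, 3)

    def predict(self, x):
        return np.dot(x, self.W)

    def loss(self, x, t):
        return cross_entropy_error(softmax(self.predict(x)), t)

np.random.seed(0)  # for reproducibility only
net = simpleNet()
x = np.array([0.6, 0.9])
t = np.array([0, 0, 1])

before = net.loss(x, t)
net.W[0, 0] += 1e-4       # modify net.W in place, as numerical_gradient does
after = net.loss(x, t)
print(before != after)    # True: the loss tracks net.W
```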
The explanation is almost complete. Let's get back to the numerical_gradient function, shown once more below:
```python
def numerical_gradient(f, x):
    h = 1e-4
    grad = np.zeros_like(x)

    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x)  # f(x+h)
        print('fxh1:', fxh1)

        x[idx] = tmp_val - h
        fxh2 = f(x)  # f(x-h)
        print('fxh2:', fxh2)

        grad[idx] = (fxh1 - fxh2) / (2*h)
        x[idx] = tmp_val
        it.iternext()

    return grad
```
As we have just seen, the return value of loss inside f changes whenever net.W changes. In this code, adding a small h to x, that is, to net.W, changes what f returns, so fxh1 takes a slightly different value; the same applies to fxh2 afterwards. Those values then flow into the rest of the code, and numerical_gradient returns a nonzero grad.
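Putting everything together, the zero gradient for the unrelated array a can be reproduced in a few lines. This is a minimal self-contained sketch where `f` and `W` are illustrative stand-ins: f ignores its argument and reads the module-level W, just as the book's f reads net.W.

```python
import numpy as np

W = np.array([[0.5, -0.2], [0.1, 0.3]])  # stand-in for net.W

def f(_):
    return float(np.sum(W ** 2))  # ignores its argument, reads W

def numerical_gradient(f, x):
    h = 1e-4
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x)
        x[idx] = tmp_val - h
        fxh2 = f(x)
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp_val  # restore the value
        it.iternext()
    return grad

a = np.ones((2, 2))
print(numerical_gradient(f, a))  # all zeros: mutating a never touches W
print(numerical_gradient(f, W))  # approximately 2*W: mutating x mutates W itself
```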
This resolves the question I posed at the beginning. The important points are:

- The second argument x of numerical_gradient is net.W itself, the very same object, not a copy.
- The return value of f changes because net.W changes.
Let's continue reading "Deep Learning from scratch"!