There is a book called "Deep Learning from Scratch: The Theory and Implementation of Deep Learning Learned with Python". I have read it twice, and I feel like I understand it, yet I don't. Part of the problem is that it is implemented in Python, so as a Java developer I feel somewhat cheated. Because of the dynamic typing, the argument of the same method is sometimes a number and sometimes an array, depending on what the caller passes in ... too tricky ... ~~I should probably just be a good student and learn Deeplearning4j~~ "Fine, let's implement it in Java." This post covers only the implementation, so please refer to the book for the explanations.
Can differentiation and gradients even be implemented in Java in the first place (P97 4.3 Numerical Differentiation / P103 4.4 Gradient)? If not, there is little chance I can do anything else, so I experimented with that first (Java 8 or later).
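For reference, what the book implements (and what the Java code below computes) is the central difference approximation

f'(x) ≈ (f(x + h) − f(x − h)) / (2h)

with h = 1e-4 as the "very small number".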
ArrayUtil.java
import java.util.function.DoubleUnaryOperator;

private static double h = 1e-4; // very small number

// numerical differentiation by central difference: (f(x+h) - f(x-h)) / (2h)
public double numericalDiff(DoubleUnaryOperator func, double x) {
    return (func.applyAsDouble(x + h) - func.applyAsDouble(x - h)) / (2 * h);
}
The test reproduces P103. The results come out exactly as in the book, so I'll call it good.
ArrayUtilTest.java
@Test
public void numericalDiff1() {
    // analytically: d/dx(x^2 + 4^2) = 2x = 6 at x = 3, and d/dx(3^2 + x^2) = 2x = 8 at x = 4
    assertThat(target.numericalDiff(p -> p * p + 4 * 4, 3.0), is(6.00000000000378));
    assertThat(target.numericalDiff(p -> 3 * 3 + p * p, 4.0), is(7.999999999999119));
}
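As an extra sanity check beyond the book's values: the book's function_1 example, f(x) = 0.01x² + 0.1x, has the analytic derivative 0.02x + 0.1, so the numerical result at x = 5 should land very close to 0.2. A minimal sketch, assuming ArrayUtil can simply be instantiated like the target above:

ArrayUtil util = new ArrayUtil();
// f(x) = 0.01x^2 + 0.1x, true derivative f'(x) = 0.02x + 0.1
double d = util.numericalDiff(p -> 0.01 * p * p + 0.1 * p, 5.0);
System.out.println(d); // prints a value very close to 0.2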
Next, I implemented P104 of the book. ~~In the book's (Python) implementation, the original value is saved into tmp_val and written back after the calculation. If you do that in Java, however, the original data still ends up changed, because the reference points to the same array. So I hold on to the original data with a deep copy instead.~~ → I received a comment pointing out that there is no problem as long as you compute immediately after the assignment and then restore the value. Fair enough (a sketch of that in-place variant follows after the listing below).
ArrayUtil.java
import java.util.function.ToDoubleFunction;

private static double h = 1e-4; // very small number

public double[][] numericalGradient(ToDoubleFunction<double[][]> func, double[][] x) {
    int cntRow = x.length;
    int cntCol = x[0].length;
    double[][] result = new double[cntRow][cntCol];
    for (int i = 0; i < cntRow; i++) {
        for (int j = 0; j < cntCol; j++) {
            // perturb only element (i, j), leaving the caller's array untouched
            double[][] xPlus = deepCopy(x);
            xPlus[i][j] = xPlus[i][j] + h;
            double[][] xMinus = deepCopy(x);
            xMinus[i][j] = xMinus[i][j] - h;
            // partial derivative with respect to x[i][j], again by central difference
            result[i][j] = (func.applyAsDouble(xPlus) - func.applyAsDouble(xMinus)) / (2 * h);
        }
    }
    return result;
}
public double[][] deepCopy(double[][] x) {
    double[][] copy = new double[x.length][];
    for (int i = 0; i < copy.length; i++) {
        copy[i] = new double[x[i].length];
        System.arraycopy(x[i], 0, copy[i], 0, x[i].length);
    }
    return copy;
}
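As mentioned above, here is a minimal sketch of the in-place variant suggested in the comment: save the element, perturb it, evaluate immediately, and restore it, so no deep copies are needed. The method name numericalGradientInPlace is mine, not from the original code:

public double[][] numericalGradientInPlace(ToDoubleFunction<double[][]> func, double[][] x) {
    double[][] result = new double[x.length][x[0].length];
    for (int i = 0; i < x.length; i++) {
        for (int j = 0; j < x[i].length; j++) {
            double tmp = x[i][j]; // save the original value (like tmp_val in the book)
            x[i][j] = tmp + h;
            double plus = func.applyAsDouble(x);
            x[i][j] = tmp - h;
            double minus = func.applyAsDouble(x);
            x[i][j] = tmp; // restore immediately, so the caller's array is unchanged
            result[i][j] = (plus - minus) / (2 * h);
        }
    }
    return result;
}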
The test follows P104. Again, the results match the book, so I'll call it good.
ArrayUtilTest.java
@Test
public void numericalGradient() {
    // f(x0, x1) = x0^2 + x1^2, so the analytic gradient is (2*x0, 2*x1)
    ToDoubleFunction<double[][]> function = p -> p[0][0] * p[0][0] + p[0][1] * p[0][1];
    double[][] x = {{3, 4}};
    double[][] result = target.numericalGradient(function, x);
    assertThat(result[0][0], is(6.00000000000378));
    assertThat(result[0][1], is(7.999999999999119));
    result = target.numericalGradient(function, new double[][]{{0, 2}});
    assertThat(result[0][0], is(closeTo(0.0, 0.000001)));
    assertThat(result[0][1], is(closeTo(4.0, 0.000001)));
}
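One natural consumer of numericalGradient is the gradient descent that the book builds on top of it: repeatedly step against the gradient. A minimal sketch, where gradientDescent is my own hypothetical helper (not part of the post's ArrayUtil); starting from (-3.0, 4.0) with learning rate 0.1 for 100 steps on f(x0, x1) = x0² + x1², the result should approach (0, 0):

public double[][] gradientDescent(ToDoubleFunction<double[][]> func, double[][] init,
                                  double lr, int steps) {
    double[][] x = deepCopy(init); // keep the caller's starting point intact
    for (int s = 0; s < steps; s++) {
        double[][] grad = numericalGradient(func, x);
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < x[i].length; j++) {
                x[i][j] -= lr * grad[i][j]; // step against the gradient
            }
        }
    }
    return x;
}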
Differentiation and partial differentiation seem to work. Incidentally, I went ahead and implemented everything else as well. The problem is that my PC is slow, so I cannot yet verify whether the final output is actually correct orz