When differentiating the eigenvalues and eigenvectors of a matrix, the derivatives diverge if the eigenvalues are degenerate. Let's deal with this appropriately.
Previous article
https://qiita.com/sage-git/items/1afa0bb8b3a7ee36600d
Since eigenvalues could be differentiated and optimized, I figured eigenvectors could be handled the same way, giving a matrix with the desired eigenvectors. I did get one, but a small problem came up along the way, so this article deals with it.
Refer to this forum thread for the derivation: https://mathoverflow.net/questions/229425/derivative-of-eigenvectors-of-a-matrix-with-respect-to-its-components
For a matrix $A$, consider an eigenvalue $\lambda_i$ and the corresponding eigenvector $\vec{n}_i$, and differentiate them with respect to the matrix elements. For the eigenvalue,
\frac{\partial \lambda_i}{\partial A_{kl}} = \left(\frac{\partial A}{\partial A_{kl}}\vec{n}_i\right)\cdot\vec{n}_i
Here, $\frac{\partial A}{\partial A_{kl}}\vec{n}_i$ acts like a projection operator: it takes the $l$th component of $\vec{n}_i$ and places it in the $k$th component, with zeros everywhere else. The derivative is therefore just the product of the $k$th and $l$th components of $\vec{n}_i$. It's more straightforward than I expected.
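As a quick numerical sanity check of this formula (my own minimal sketch, not from the original article; it perturbs a single element of a small symmetric matrix):

```python
import numpy as np

A = np.array([[1.0, 0.2],
              [0.2, 2.0]])
lam, n = np.linalg.eigh(A)  # n[:, i] is the eigenvector for lam[i]

i, k, l = 0, 0, 1
eps = 1e-6
dA = np.zeros_like(A)
dA[k, l] = eps  # perturb the single element A_kl

# the perturbed matrix is no longer symmetric, so use eigvals instead of eigh
lam_p = np.sort(np.linalg.eigvals(A + dA).real)

print((lam_p[i] - lam[i]) / eps)  # finite-difference derivative
print(n[k, i] * n[l, i])          # theoretical value n_k * n_l
```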
For the $i$th eigenvector $\vec{n}_i$,
\frac{\partial \vec{n}_i}{\partial A_{kl}} = \sum_{j\neq i}\left[\frac{1}{\lambda_i - \lambda_j}\left(\frac{\partial A}{\partial A_{kl}}\vec{n}_i\right)\cdot\vec{n}_j\right]\vec{n}_j
can be written. In short, it amounts to summing the other eigenvectors with appropriate weights. The term to watch here is $\frac{1}{\lambda_i - \lambda_j}$: it makes this derivative diverge when two or more eigenvalues are identical (physically corresponding to degeneracy).
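This formula can be evaluated numerically in the same spirit (again my own sketch; `dn_dA` is a hypothetical helper name, and the matrix is the same one used below):

```python
import numpy as np

A = np.array([[1.0, 0.0, 0.12975],
              [0.0, 1.0, 0.0],
              [0.12975, 0.0, 1.1373545]])
lam, n = np.linalg.eigh(A)  # columns of n are the eigenvectors

def dn_dA(i, k, l):
    """Derivative of the i-th eigenvector with respect to A[k, l]."""
    d = np.zeros(A.shape[0])
    for j in range(A.shape[0]):
        if j == i:
            continue
        # (dA/dA_kl n_i) . n_j reduces to n_i[l] * n_j[k]
        d += n[l, i] * n[k, j] / (lam[i] - lam[j]) * n[:, j]
    return d

print(dn_dA(0, 0, 2))  # finite here, but diverges once lam[i] approaches another lam[j]
```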
Choose an appropriate matrix X and check its eigenvalues and eigenvectors.
```python
import tensorflow as tf

X = tf.Variable(initial_value=[[1.0, 0.0, 0.12975], [0.0, 1.0, 0.0], [0.12975, 0.0, 1.1373545]])
eigval, eigvec = tf.linalg.eigh(X)
print(eigval)
print(eigvec)
```
`eigval` (eigenvalues):

```
tf.Tensor([0.9218725 1.        1.2154819], shape=(3,), dtype=float32)
```

`eigvec` (eigenvectors):

```
tf.Tensor(
[[-0.8566836   0.          0.51584214]
 [-0.         -1.          0.        ]
 [ 0.51584214  0.          0.8566836 ]], shape=(3, 3), dtype=float32)
```
Try differentiating the smallest eigenvalue using GradientTape.
```python
with tf.GradientTape() as g:
    g.watch(X)
    eigval, eigvec = tf.linalg.eigh(X)
    Y = eigval[0]
dYdX = g.gradient(Y, X)
print(dYdX)
```
Derivative of eigenvalue 0:

```
tf.Tensor(
[[ 0.7339068   0.          0.        ]
 [ 0.          0.          0.        ]
 [-0.88382703  0.          0.2660931 ]], shape=(3, 3), dtype=float32)
```
This is a reasonable result: the diagonal entries equal $n_k n_l$, and the off-diagonal entry is $2\times n_k n_l$, apparently because the symmetric matrix is read from its lower half only.
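As a check (a quick sketch reusing `eigvec` and `dYdX` from above):

```python
n0 = eigvec[:, 0]                  # eigenvector of the smallest eigenvalue
outer = n0[:, None] * n0[None, :]  # theoretical derivative n_k * n_l
print(outer)
# the diagonal of dYdX matches outer; the strictly-lower-triangular entries
# of dYdX are 2x outer, since eigh reads only the lower half of X
```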
`eigvec[i, j]` is the $i$th component of the eigenvector for the $j$th eigenvalue.
```python
with tf.GradientTape() as g:
    g.watch(X)
    eigval, eigvec = tf.linalg.eigh(X)
    Y = eigvec[0, 1]
dYdX = g.gradient(Y, X)
print(dYdX)
```
Derivative of the first component of eigenvector 1:

```
tf.Tensor(
[[ 0.        0.        0.      ]
 [-8.158832  0.        0.      ]
 [ 0.        7.707127  0.      ]], shape=(3, 3), dtype=float32)
```
Checking this against the formula by hand is tedious, so I'll skip it. Up to this point, everything behaves normally.
If you change the value of X as follows, two of the eigenvalues become 1.

```python
X = tf.Variable(initial_value=[[1.1225665, 0.0, 0.12975], [0.0, 1.0, 0.0], [0.12975, 0.0, 1.1373545]])
```
I found this matrix using the code from the previous article.
`eigval`:

```
tf.Tensor([1.        1.        1.2599211], shape=(3,), dtype=float32)
```

`eigvec`:

```
tf.Tensor(
[[-0.7269436  0.         0.6866972]
 [-0.        -1.         0.       ]
 [ 0.6866972  0.         0.7269436]], shape=(3, 3), dtype=float32)
```
Differentiating this, `dYdX` for both the eigenvalue and the eigenvector comes out as

```
tf.Tensor(
[[nan  0.  0.]
 [nan nan  0.]
 [nan nan nan]], shape=(3, 3), dtype=float32)
```
I don't understand why the eigenvalue derivative is nan, but the eigenvector derivative being nan follows from the theoretical formula. Note that the eigenvector differentiated here is the third one, i.e., the one at index $i$ for which no $j$ with $\lambda_i - \lambda_j = 0$ exists, and its derivative is nan as well. In other words, in the TensorFlow implementation, every derivative is mercilessly nan once the matrix has a degeneracy.
There are several possible workarounds.
Here, I'll add a random perturbation. Since this derivative is used in a gradient method, a perturbation acts like a kind of annealing without affecting the final result. Of course, if the final result itself has degenerate eigenvalues, special treatment is needed.
In any case, we first want to detect identical eigenvalues before differentiating.
```python
eigval[1:] - eigval[:-1]
```
This gives an array of the differences between adjacent eigenvalues. Since the eigenvalues returned by `tf.linalg.eigh` are already sorted in ascending order, these differences are guaranteed to be $\geq 0$ without taking absolute values. If even one of them is almost 0, the matrix is degenerate.
```python
tf.math.reduce_any((eigval[1:] - eigval[:-1]) < 1e-20)
```
Then, using this as the condition, loop until it is no longer satisfied. Here `A` is an `N`-by-`N` matrix; note that perturbing only the diagonal keeps it symmetric, so `tf.linalg.eigh` still applies.
```python
eigval, eigvec = tf.linalg.eigh(A)
while tf.math.reduce_any((eigval[1:] - eigval[:-1]) < 1e-20):
    Ap = A + tf.linalg.diag(tf.random.uniform(shape=[N])*0.001)
    eigval, eigvec = tf.linalg.eigh(Ap)
```
The threshold `1e-20` and the perturbation magnitude `0.001` will likely need tuning depending on the problem.
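One possible refinement, not in the original: scale the degeneracy threshold to the spectral range instead of using an absolute value (the factor `1e-6` here is just a guess):

```python
tol = 1e-6 * (eigval[-1] - eigval[0])  # width of the whole spectrum
degenerate = tf.math.reduce_any((eigval[1:] - eigval[:-1]) < tol)
```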
For the time being, this was enough to solve my problem.
Let's compute a one-dimensional quantum well. For the physical background, see
https://qiita.com/sage-git/items/e5ced4c0f555e2b4d77b
and other pages.
Consider a potential $U(x)$ that is finite in the range $x \in \left[-\pi, \pi\right]$ and $+\infty$ outside it. The question is what $U(x)$ makes the ground-state wave function
\psi(x) = A\exp\left(-2x^2\right)
and we find such a $U(x)$ numerically. The method is to write the Hamiltonian $H$ as a matrix, obtain its eigenvalues and eigenvectors, and use the gradient method so that the eigenvector belonging to the smallest eigenvalue approaches this $\psi(x)$. Since a numerical calculation cannot handle a function as a function, the range $\left[-\pi, \pi\right]$ is discretized into $N$ points; if the coordinate of the $i$th point is $x_i$, the wave function can be written as the vector $\vec{\psi}$ whose $i$th component is $\psi_i = \psi(x_i)$.
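Concretely, with the standard three-point stencil for the second derivative (this is what the code below builds with `np.eye`), the discretized eigenvalue problem reads
(H\vec{\psi})_i = -\frac{\psi_{i+1} - 2\psi_i + \psi_{i-1}}{2m(\Delta x)^2} + U_i\psi_i = E\psi_i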
One thing to keep in mind is that the sign of an eigenvector is a free degree of freedom, so the two loss functions
L_+ = \sum_i^N(n_i - \psi(x_i))^2
L_- = \sum_i^N(n_i + \psi(x_i))^2
both have to be considered; the sign commonly flips from one iteration to the next. This time, taking the smaller of the two,
L = \min(L_+, L_-)
worked well.
Based on these, I wrote the following code.
```python
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np


def main():
    max_x = np.pi
    N = 150 + 1
    dx = max_x*2/N
    x = np.arange(-max_x, max_x, dx)

    # three-point stencil for the second derivative
    D = np.eye(N, k=1)
    D += -1 * np.eye(N)
    D += D.T
    D = D/(dx**2)
    m = 1.0
    D_tf = tf.constant(D/(2.0*m), dtype=tf.float32)

    # target ground state: normalized Gaussian exp(-2 x^2)
    V0_np = np.exp( - x**2/0.5)
    V0_np = V0_np/np.linalg.norm(V0_np)
    V0_target = tf.constant(V0_np, dtype=tf.float32)

    # the potential to be optimized
    U0 = np.zeros(shape=[N])
    U = tf.Variable(initial_value=U0, dtype=tf.float32)

    def calc_V(n):
        H = - D_tf + tf.linalg.diag(U)
        eigval, eigvec = tf.linalg.eigh(H)
        # perturb the diagonal until no neighboring eigenvalues coincide
        while tf.math.reduce_any((eigval[1:] - eigval[:-1]) < 1e-20):
            H = - D_tf + tf.linalg.diag(U) + tf.linalg.diag(tf.random.uniform(shape=[N])*0.001)
            eigval, eigvec = tf.linalg.eigh(H)
            print("found lambda_i+1 - lambda_i = 0")
        v_raw = eigvec[:, n]
        return v_raw

    def calc_loss():
        # compare against both signs of the eigenvector and take the smaller
        v0 = calc_V(0)
        dplus = tf.reduce_sum((v0 - V0_target)**2)
        dminus = tf.reduce_sum((v0 + V0_target)**2)
        return tf.minimum(dplus, dminus)

    opt = tf.keras.optimizers.Adam(learning_rate=0.001)
    L = calc_loss()
    v0_current = calc_V(0)
    print(L)
    while L > 1e-11:
        opt.minimize(calc_loss, var_list=[U])
        v0_current = calc_V(0)
        # sign-independent convergence measure based on the overlap
        L = (tf.abs(tf.reduce_sum(v0_current*V0_target)) - 1)**2
        print(L)
    plt.plot(x, U.numpy())
    plt.show()


if __name__ == "__main__":
    main()
```
After running this and leaving it on a machine with a Ryzen 5 and a GTX 1060 for some tens of minutes, I obtained the graph shown below.
In other words, if such a potential is set up, the ground-state wave function in the quantum well becomes a Gaussian wave packet.
Unfortunately, this is a numerical solution and I haven't confirmed it analytically. I'd like to fit it with some nice function and solve it; I don't know whether that is humanly possible.
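For what it's worth, one candidate check (my own suggestion): solving $H\psi = E\psi$ pointwise for the potential gives $U(x) = E + \frac{\psi''(x)}{2m\psi(x)}$, and for $\psi \propto \exp(-2x^2)$ with $m = 1$ this works out to $U(x) = 8x^2 + \mathrm{const}$, so a quadratic fit to the learned potential is a natural thing to try:

```python
coef = np.polyfit(x, U.numpy(), 2)  # quadratic fit to the learned U(x)
print(coef)  # if the analytic guess holds, coef[0] should come out near 8
```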