When differentiating the eigenvalues and eigenvectors of a matrix, the derivatives diverge if the eigenvalues are degenerate. Let's deal with this appropriately.
Previous article
https://qiita.com/sage-git/items/1afa0bb8b3a7ee36600d
Since eigenvalues could be differentiated and optimized, I figured eigenvectors could be handled the same way, giving a matrix with the desired eigenvectors. I did get one, but a small problem came up along the way, so this article deals with it.
Refer to this forum thread for the derivation: https://mathoverflow.net/questions/229425/derivative-of-eigenvectors-of-a-matrix-with-respect-to-its-components
For a matrix $A$, consider an eigenvalue $\lambda_i$ and the corresponding eigenvector $\vec{n}_i$, and differentiate them with respect to the matrix elements. For the eigenvalue,
\frac{\partial \lambda_i}{\partial A_{kl}} = \left(\frac{\partial A}{\partial A_{kl}}\vec{n}_i\right)\cdot\vec{n}_i
Here, $\frac{\partial A}{\partial A_{kl}}\vec{n}_i$ acts like a projection operator: it takes the $l$th component of $\vec{n}_i$ and places it in the $k$th component, with zeros everywhere else. The derivative is therefore just the product of the $k$th and $l$th components of $\vec{n}_i$. It's more straightforward than I expected.
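As a quick numerical sanity check of this formula (my own minimal sketch, not from the original article; it perturbs a single element of a small symmetric matrix):

```python
import numpy as np

A = np.array([[1.0, 0.2],
              [0.2, 2.0]])
lam, n = np.linalg.eigh(A)  # n[:, i] is the eigenvector for lam[i]

i, k, l = 0, 0, 1
eps = 1e-6
dA = np.zeros_like(A)
dA[k, l] = eps  # perturb the single element A_kl

# the perturbed matrix is no longer symmetric, so use eigvals instead of eigh
lam_p = np.sort(np.linalg.eigvals(A + dA).real)

print((lam_p[i] - lam[i]) / eps)  # finite-difference derivative
print(n[k, i] * n[l, i])          # theoretical value n_k * n_l
```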
For the $i$th eigenvector $\vec{n}_i$,
\frac{\partial \vec{n}_i}{\partial A_{kl}} = \sum_{j\neq i}\left[\frac{1}{\lambda_i - \lambda_j}\left(\frac{\partial A}{\partial A_{kl}}\vec{n}_i\right)\cdot\vec{n}_j\right]\vec{n}_j
can be written. In short, it amounts to summing the other eigenvectors with appropriate weights. The term to watch here is $\frac{1}{\lambda_i - \lambda_j}$: it makes this derivative diverge when two or more eigenvalues are identical (physically corresponding to degeneracy).
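This formula can be evaluated numerically in the same spirit (again my own sketch; `dn_dA` is a hypothetical helper name, and the matrix is the same one used below):

```python
import numpy as np

A = np.array([[1.0, 0.0, 0.12975],
              [0.0, 1.0, 0.0],
              [0.12975, 0.0, 1.1373545]])
lam, n = np.linalg.eigh(A)  # columns of n are the eigenvectors

def dn_dA(i, k, l):
    """Derivative of the i-th eigenvector with respect to A[k, l]."""
    d = np.zeros(A.shape[0])
    for j in range(A.shape[0]):
        if j == i:
            continue
        # (dA/dA_kl n_i) . n_j reduces to n_i[l] * n_j[k]
        d += n[l, i] * n[k, j] / (lam[i] - lam[j]) * n[:, j]
    return d

print(dn_dA(0, 0, 2))  # finite here, but diverges once lam[i] approaches another lam[j]
```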
Choose an appropriate matrix X and check its eigenvalues and eigenvectors.
```python
import tensorflow as tf

X = tf.Variable(initial_value=[[1.0, 0.0, 0.12975], [0.0, 1.0, 0.0], [0.12975, 0.0, 1.1373545]])
eigval, eigvec = tf.linalg.eigh(X)
print(eigval)
print(eigvec)
```
`eigval` (eigenvalues):

```
tf.Tensor([0.9218725 1.        1.2154819], shape=(3,), dtype=float32)
```

`eigvec` (eigenvectors):

```
tf.Tensor(
[[-0.8566836   0.          0.51584214]
 [-0.         -1.          0.        ]
 [ 0.51584214  0.          0.8566836 ]], shape=(3, 3), dtype=float32)
```
Try differentiating the smallest eigenvalue using GradientTape.
```python
with tf.GradientTape() as g:
    g.watch(X)
    eigval, eigvec = tf.linalg.eigh(X)
    Y = eigval[0]
dYdX = g.gradient(Y, X)
print(dYdX)
```
Derivative of eigenvalue 0:

```
tf.Tensor(
[[ 0.7339068   0.          0.        ]
 [ 0.          0.          0.        ]
 [-0.88382703  0.          0.2660931 ]], shape=(3, 3), dtype=float32)
```
This is a reasonable result: the diagonal entries equal $n_k n_l$, and the off-diagonal entry is $2\times n_k n_l$, apparently because the symmetric matrix is read from its lower half only.
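As a check (a quick sketch reusing `eigvec` and `dYdX` from above):

```python
n0 = eigvec[:, 0]                  # eigenvector of the smallest eigenvalue
outer = n0[:, None] * n0[None, :]  # theoretical derivative n_k * n_l
print(outer)
# the diagonal of dYdX matches outer; the strictly-lower-triangular entries
# of dYdX are 2x outer, since eigh reads only the lower half of X
```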
`eigvec[i, j]` is the $i$th component of the eigenvector for the $j$th eigenvalue.
```python
with tf.GradientTape() as g:
    g.watch(X)
    eigval, eigvec = tf.linalg.eigh(X)
    Y = eigvec[0, 1]
dYdX = g.gradient(Y, X)
print(dYdX)
```
Derivative of the first component of eigenvector 1:

```
tf.Tensor(
[[ 0.        0.        0.      ]
 [-8.158832  0.        0.      ]
 [ 0.        7.707127  0.      ]], shape=(3, 3), dtype=float32)
```
Checking this against the formula by hand is tedious, so I'll skip it. Up to this point, everything behaves normally.
If you change the value of X as follows, two of the eigenvalues become 1.

```python
X = tf.Variable(initial_value=[[1.1225665, 0.0, 0.12975], [0.0, 1.0, 0.0], [0.12975, 0.0, 1.1373545]])
```
I found this matrix using the code from the previous article.
`eigval`:

```
tf.Tensor([1.        1.        1.2599211], shape=(3,), dtype=float32)
```

`eigvec`:

```
tf.Tensor(
[[-0.7269436  0.         0.6866972]
 [-0.        -1.         0.       ]
 [ 0.6866972  0.         0.7269436]], shape=(3, 3), dtype=float32)
```
Differentiating this, `dYdX` for both the eigenvalue and the eigenvector comes out as

```
tf.Tensor(
[[nan  0.  0.]
 [nan nan  0.]
 [nan nan nan]], shape=(3, 3), dtype=float32)
```
I don't understand why the eigenvalue derivative is nan, but the eigenvector derivative being nan follows from the theoretical formula. Note that the eigenvector differentiated here is the third one, i.e., the one at index $i$ for which no $j$ with $\lambda_i - \lambda_j = 0$ exists, and its derivative is nan as well. In other words, in the TensorFlow implementation, every derivative is mercilessly nan once the matrix has a degeneracy.
There are several possible workarounds.
Here, I'll add a random perturbation. Since this derivative is used in a gradient method, a perturbation acts like a kind of annealing without affecting the final result. Of course, if the final result itself has degenerate eigenvalues, special treatment is needed.
In any case, we first want to detect identical eigenvalues before differentiating.
```python
eigval[1:] - eigval[:-1]
```
This gives an array of the differences between adjacent eigenvalues. Since the eigenvalues returned by `tf.linalg.eigh` are already sorted in ascending order, these differences are guaranteed to be $\geq 0$ without taking absolute values. If even one of them is almost 0, the matrix is degenerate.
```python
tf.math.reduce_any((eigval[1:] - eigval[:-1]) < 1e-20)
```
Then, using this as the condition, loop until it is no longer satisfied. Here `A` is an `N`-by-`N` matrix; note that perturbing only the diagonal keeps it symmetric, so `tf.linalg.eigh` still applies.
```python
eigval, eigvec = tf.linalg.eigh(A)
while tf.math.reduce_any((eigval[1:] - eigval[:-1]) < 1e-20):
    Ap = A + tf.linalg.diag(tf.random.uniform(shape=[N])*0.001)
    eigval, eigvec = tf.linalg.eigh(Ap)
```
The threshold `1e-20` and the perturbation magnitude `0.001` will likely need tuning depending on the problem.
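One possible refinement, not in the original: scale the degeneracy threshold to the spectral range instead of using an absolute value (the factor `1e-6` here is just a guess):

```python
tol = 1e-6 * (eigval[-1] - eigval[0])  # width of the whole spectrum
degenerate = tf.math.reduce_any((eigval[1:] - eigval[:-1]) < tol)
```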
For the time being, this was enough to solve my problem.
Let's compute a one-dimensional quantum well. For the physical background, see
https://qiita.com/sage-git/items/e5ced4c0f555e2b4d77b
and other pages.
Consider a potential $U(x)$ that is finite in the range $x \in \left[-\pi, \pi\right]$ and $+\infty$ outside it. The question is what $U(x)$ makes the ground-state wave function
\psi(x) = A\exp\left(-2x^2\right)
and we find such a $U(x)$ numerically. The method is to write the Hamiltonian $H$ as a matrix, obtain its eigenvalues and eigenvectors, and use the gradient method so that the eigenvector belonging to the smallest eigenvalue approaches this $\psi(x)$. Since a numerical calculation cannot handle a function as a function, the range $\left[-\pi, \pi\right]$ is discretized into $N$ points; if the coordinate of the $i$th point is $x_i$, the wave function can be written as the vector $\vec{\psi}$ whose $i$th component is $\psi_i = \psi(x_i)$.
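Concretely, with the standard three-point stencil for the second derivative (this is what the code below builds with `np.eye`), the discretized eigenvalue problem reads
(H\vec{\psi})_i = -\frac{\psi_{i+1} - 2\psi_i + \psi_{i-1}}{2m(\Delta x)^2} + U_i\psi_i = E\psi_i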
One thing to keep in mind is that the sign of an eigenvector is a free degree of freedom, so the two loss functions
L_+ = \sum_i^N(n_i - \psi(x_i))^2
L_- = \sum_i^N(n_i + \psi(x_i))^2
both have to be considered; the sign commonly flips from one iteration to the next. This time, taking the smaller of the two,
L = \min(L_+, L_-)
worked well.
Based on these, I wrote the following code.
```python
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np


def main():
    max_x = np.pi
    N = 150 + 1
    dx = max_x*2/N
    x = np.arange(-max_x, max_x, dx)

    # three-point stencil for the second derivative
    D = np.eye(N, k=1)
    D += -1 * np.eye(N)
    D += D.T
    D = D/(dx**2)
    m = 1.0
    D_tf = tf.constant(D/(2.0*m), dtype=tf.float32)

    # target ground state: normalized Gaussian exp(-2 x^2)
    V0_np = np.exp( - x**2/0.5)
    V0_np = V0_np/np.linalg.norm(V0_np)
    V0_target = tf.constant(V0_np, dtype=tf.float32)

    # the potential to be optimized
    U0 = np.zeros(shape=[N])
    U = tf.Variable(initial_value=U0, dtype=tf.float32)

    def calc_V(n):
        H = - D_tf + tf.linalg.diag(U)
        eigval, eigvec = tf.linalg.eigh(H)
        # perturb the diagonal until no neighboring eigenvalues coincide
        while tf.math.reduce_any((eigval[1:] - eigval[:-1]) < 1e-20):
            H = - D_tf + tf.linalg.diag(U) + tf.linalg.diag(tf.random.uniform(shape=[N])*0.001)
            eigval, eigvec = tf.linalg.eigh(H)
            print("found lambda_i+1 - lambda_i = 0")
        v_raw = eigvec[:, n]
        return v_raw

    def calc_loss():
        # compare against both signs of the eigenvector and take the smaller
        v0 = calc_V(0)
        dplus = tf.reduce_sum((v0 - V0_target)**2)
        dminus = tf.reduce_sum((v0 + V0_target)**2)
        return tf.minimum(dplus, dminus)

    opt = tf.keras.optimizers.Adam(learning_rate=0.001)
    L = calc_loss()
    v0_current = calc_V(0)
    print(L)
    while L > 1e-11:
        opt.minimize(calc_loss, var_list=[U])
        v0_current = calc_V(0)
        # sign-independent convergence measure based on the overlap
        L = (tf.abs(tf.reduce_sum(v0_current*V0_target)) - 1)**2
        print(L)
    plt.plot(x, U.numpy())
    plt.show()


if __name__ == "__main__":
    main()
```
After running this and leaving it on a machine with a Ryzen 5 and a GTX 1060 for some tens of minutes, I obtained the graph shown below.
In other words, if such a potential is set up, the ground-state wave function in the quantum well becomes a Gaussian wave packet.
Unfortunately, this is a numerical solution and I haven't confirmed it analytically. I'd like to fit it with some nice function and solve it; I don't know whether that is humanly possible.
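For what it's worth, one candidate check (my own suggestion): solving $H\psi = E\psi$ pointwise for the potential gives $U(x) = E + \frac{\psi''(x)}{2m\psi(x)}$, and for $\psi \propto \exp(-2x^2)$ with $m = 1$ this works out to $U(x) = 8x^2 + \mathrm{const}$, so a quadratic fit to the learned potential is a natural thing to try:

```python
coef = np.polyfit(x, U.numpy(), 2)  # quadratic fit to the learned U(x)
print(coef)  # if the analytic guess holds, coef[0] should come out near 8
```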