Be careful when differentiating the eigenvectors of a matrix

Summary

When differentiating the eigenvectors of a matrix, the derivative diverges if the eigenvalues are degenerate. Let's deal with this appropriately.

Motivation

Previous article

https://qiita.com/sage-git/items/1afa0bb8b3a7ee36600d

Since the eigenvalues can be differentiated and optimized, I figured the same could be done with the eigenvectors, so that I could obtain a matrix with the desired eigenvectors. I actually did get one, but I ran into a small problem along the way, so I dealt with it.

Math

For the derivation, refer to this MathOverflow thread: https://mathoverflow.net/questions/229425/derivative-of-eigenvectors-of-a-matrix-with-respect-to-its-components

For a matrix $A$, consider an eigenvalue $\lambda_i$ and the corresponding eigenvector $\vec{n}_i$, and differentiate them with respect to the components of $A$.

Differentiation of eigenvalues

\frac{\partial \lambda_i}{\partial A_{kl}} = \left(\frac{\partial A}{\partial A_{kl}}\vec{n}_i\right)\cdot\vec{n}_i

Here, $\frac{\partial A}{\partial A_{kl}}\vec{n}_i$ acts like a projection: it takes the $l$-th component of $\vec{n}_i$, places it in the $k$-th component, and sets everything else to 0. The derivative is therefore simply the product of the $k$-th and $l$-th components of $\vec{n}_i$. More straightforward than I expected.
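As a quick sanity check (my own sketch, not part of the derivation), the formula can be compared against a finite difference in plain NumPy. Perturbing a single off-diagonal entry makes the matrix momentarily non-symmetric, hence np.linalg.eigvals; the matrix values are arbitrary:

import numpy as np

# Sketch: check dlambda_i/dA_kl = (n_i)_k (n_i)_l by finite differences
A = np.array([[1.0, 0.0, 0.12975],
              [0.0, 1.0, 0.0],
              [0.12975, 0.0, 1.1373545]])
w, v = np.linalg.eigh(A)          # columns of v are the eigenvectors
i, k, l = 0, 2, 0
analytic = v[k, i] * v[l, i]      # product of the k-th and l-th components

eps = 1e-6
Ap = A.copy()
Ap[k, l] += eps                   # perturb only the single entry A_kl
numeric = (np.sort(np.linalg.eigvals(Ap).real)[i] - w[i]) / eps
print(analytic, numeric)          # the two should agree to ~1e-5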

Differentiation of eigenvectors

For the $i$-th eigenvector $\vec{n}_i$, we can write

\frac{\partial \vec{n}_i}{\partial A_{kl}} = \sum_{j\neq i}\left[\frac{1}{\lambda_i - \lambda_j}\left(\frac{\partial A}{\partial A_{kl}}\vec{n}_i\right)\cdot\vec{n}_j\right]\vec{n}_j

In short, it adds the other eigenvectors with appropriate weights. The term to watch out for is $\frac{1}{\lambda_i - \lambda_j}$: it makes this derivative diverge when two eigenvalues are identical (physically, degeneracy).
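For illustration, here is a direct NumPy implementation of this formula (my own sketch, not TensorFlow's code). The division by $\lambda_i - \lambda_j$ is exactly where degenerate eigenvalues blow up:

import numpy as np

def d_eigvec(A, i, k, l):
    """Sketch: d n_i / d A_kl from the sum formula above (A symmetric)."""
    w, v = np.linalg.eigh(A)
    dA_n = np.zeros(len(w))
    dA_n[k] = v[l, i]             # (dA/dA_kl) n_i: n_i's l-th component in slot k
    d = np.zeros(len(w))
    for j in range(len(w)):
        if j != i:
            # the 1/(lambda_i - lambda_j) factor: diverges as w[i] -> w[j]
            d += (dA_n @ v[:, j]) / (w[i] - w[j]) * v[:, j]
    return d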

Numerical check

Preparation

Pick an appropriate matrix X and check its eigenvalues and eigenvectors.

import tensorflow as tf

X = tf.Variable(initial_value=[[1.0, 0.0, 0.12975], [0.0, 1.0, 0.0], [0.12975, 0.0, 1.1373545]])
eigval, eigvec = tf.linalg.eigh(X)
print(eigval)
print(eigvec)

eigval (eigenvalue)


tf.Tensor([0.9218725 1.        1.2154819], shape=(3,), dtype=float32)

eigvec (eigenvector)


tf.Tensor(
[[-0.8566836   0.          0.51584214]
 [-0.         -1.          0.        ]
 [ 0.51584214  0.          0.8566836 ]], shape=(3, 3), dtype=float32)

Differentiate eigenvalues

Let's differentiate the smallest eigenvalue using GradientTape.

with tf.GradientTape() as g:
    g.watch(X)
    eigval, eigvec = tf.linalg.eigh(X)
    Y = eigval[0]
dYdX = g.gradient(Y, X)
print(dYdX)

Derivative of eigenvalue 0


tf.Tensor(
[[ 0.7339068   0.          0.        ]
 [ 0.          0.          0.        ]
 [-0.88382703  0.          0.2660931 ]], shape=(3, 3), dtype=float32)

This is a reasonable result. The off-diagonal entry appears $2\times$, apparently because only the lower half of the symmetric matrix is used, so the $A_{kl}$ and $A_{lk}$ contributions are folded together.
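To make that concrete: building the outer product $\vec{n}_0 \vec{n}_0^{\,T}$ and folding the symmetric off-diagonal contributions into the lower triangle reproduces the tensor above (my own sketch, assuming that folding convention):

import numpy as np

n0 = np.array([-0.8566836, 0.0, 0.51584214])   # eigenvector 0 from above
outer = np.outer(n0, n0)
# fold A_kl and A_lk into the lower triangle; keep the diagonal single
expected = np.tril(outer + outer.T) - np.diag(np.diag(outer))
print(expected)   # 0.7339 and 0.2661 on the diagonal, -0.8838 below, matching dYdX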

Differentiate the eigenvector

`eigvec[i, j]` is the $i$-th component of the eigenvector for the $j$-th eigenvalue.

with tf.GradientTape() as g:
    g.watch(X)
    eigval, eigvec = tf.linalg.eigh(X)
    Y = eigvec[0, 1]
dYdX = g.gradient(Y, X)
print(dYdX)

Derivative of component 0 of eigenvector 1


tf.Tensor(
[[ 0.        0.        0.      ]
 [-8.158832  0.        0.      ]
 [ 0.        7.707127  0.      ]], shape=(3, 3), dtype=float32)

Checking this by hand is tedious, so I will skip it here (a finite-difference sketch is below).
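For reference, a finite-difference spot check could look like the following sketch. Note that eigh only sees the symmetric part, so the symmetric pair of entries has to be perturbed together, and the sign of the returned eigenvectors is not guaranteed to be stable, so the check can fail spuriously:

import numpy as np

X0 = np.array([[1.0, 0.0, 0.12975],
               [0.0, 1.0, 0.0],
               [0.12975, 0.0, 1.1373545]])
k, l, eps = 1, 0, 1e-6
Xp = X0.copy()
Xp[k, l] += eps
Xp[l, k] += eps                       # perturb the symmetric pair together
_, v0 = np.linalg.eigh(X0)
_, vp = np.linalg.eigh(Xp)
print((vp[0, 1] - v0[0, 1]) / eps)    # should be close to the -8.158832 above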

Up to this point, everything behaves normally.

Degeneracy

If you change the value of X as follows, two of the eigenvalues become 1.

X = tf.Variable(initial_value=[[1.1225665, 0.0, 0.12975], [0.0, 1.0, 0.0], [0.12975, 0.0, 1.1373545]])

I found this value using the code from the previous article.

eigval


tf.Tensor([1.        1.        1.2599211], shape=(3,), dtype=float32)

eigvec


tf.Tensor(
[[-0.7269436  0.         0.6866972]
 [-0.        -1.         0.       ]
 [ 0.6866972  0.         0.7269436]], shape=(3, 3), dtype=float32)

Differentiating this, both the eigenvalue and the eigenvector gradients

dYdX


tf.Tensor(
[[nan  0.  0.]
 [nan nan  0.]
 [nan nan nan]], shape=(3, 3), dtype=float32)

become nan. The eigenvalue derivative becoming nan is not something I can explain, but the eigenvector derivative becoming nan agrees with the theoretical formula.

Note that the eigenvector differentiated here is the third one; that is, even the derivative of the eigenvector at an index $i$ for which no $j$ with $\lambda_i - \lambda_j = 0$ exists is nan. In other words, TensorFlow's implementation mercilessly returns nan for every derivative when the matrix has any degeneracy.

Countermeasures

There are several possible workarounds.

Here, I apply a random perturbation. Since this derivative is used in a gradient method, the perturbation acts like a kind of annealing and should not affect the final result. Of course, if the final result itself has degenerate eigenvalues, special treatment is needed.

Detecting degeneracy

In any case, we want to check whether identical eigenvalues exist before differentiating.

eigval[1:] - eigval[:-1]

This gives an array containing the differences between adjacent eigenvalues. Since the eigenvalues returned by tf.linalg.eigh are already sorted in ascending order, these differences are non-negative and no absolute value is needed. If even one of them is almost 0, the matrix is degenerate.

tf.math.reduce_any((eigval[1:] - eigval[:-1]) < 1e-20)

After that, use this as the loop condition and keep perturbing until it is no longer satisfied. Here `A` is an $N \times N$ matrix.

eigval, eigvec = tf.linalg.eigh(A)

while tf.math.reduce_any((eigval[1:] - eigval[:-1]) < 1e-20):
    Ap = A + tf.linalg.diag(tf.random.uniform(shape=[N])*0.001)
    eigval, eigvec = tf.linalg.eigh(Ap)

The criterion 1e-20 and the perturbation magnitude 0.001 will likely need adjusting for each problem. For now, this solved what I wanted to do.
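As a quick check (a sketch), applying this loop to the degenerate X from above inside a GradientTape gives finite gradients again, up to the small random perturbation:

N = 3
X = tf.Variable(initial_value=[[1.1225665, 0.0, 0.12975],
                               [0.0, 1.0, 0.0],
                               [0.12975, 0.0, 1.1373545]])
with tf.GradientTape() as g:
    g.watch(X)
    eigval, eigvec = tf.linalg.eigh(X)
    while tf.math.reduce_any((eigval[1:] - eigval[:-1]) < 1e-20):
        # the random diagonal is constant w.r.t. X, so the gradient still flows
        Xp = X + tf.linalg.diag(tf.random.uniform(shape=[N])*0.001)
        eigval, eigvec = tf.linalg.eigh(Xp)
    Y = eigval[0]
print(g.gradient(Y, X))   # no more nan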

Bonus

Let's compute a one-dimensional quantum well. I leave the physical explanation to

https://qiita.com/sage-git/items/e5ced4c0f555e2b4d77b

and similar pages, and will not repeat it here.

Consider a potential $U(x)$ that takes finite values in the range $x \in \left[-\pi, \pi\right]$ and is $+\infty$ elsewhere. We want to find, numerically, a $U(x)$ for which the ground-state wavefunction becomes

\psi(x) = A\exp\left(-2x^2\right)

The method is to write the Hamiltonian $H$ as a matrix, find its eigenvalues and eigenvectors, and use the gradient method so that the eigenvector of the smallest eigenvalue approaches this $\psi(x)$. However, a numerical calculation cannot treat a function as a function, so we divide the range $\left[-\pi, \pi\right]$ into $N$ points. Writing $x_i$ for the coordinate of the $i$-th point, the wavefunction becomes a vector $\vec{\psi}$ with components $\vec{\psi}_i = \psi(x_i)$.
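Concretely, with a second-order central difference for the kinetic term (the standard discretization, written out here by me; it is what the code below implements), the Hamiltonian acts on the discretized wavefunction as

\left(H\vec{\psi}\right)_i = -\frac{1}{2m}\frac{\psi_{i+1} - 2\psi_i + \psi_{i-1}}{\Delta x^2} + U(x_i)\,\psi_i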

One thing to keep in mind is the sign freedom of an eigenvector, so the two loss functions

L_+ = \sum_i^N(n_i - \psi(x_i))^2
L_- = \sum_i^N(n_i + \psi(x_i))^2

can both be considered. It is common for the sign to flip as the iterations proceed. This time I used the smaller of the two,

L = \min(L_+, L_-)

and it worked.

Based on these, I wrote the following code.

import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np

def main():
    max_x = np.pi
    N = 150 + 1
    dx = max_x*2/N
    x = np.arange(-max_x, max_x, dx)
    # second-order finite-difference Laplacian: tridiagonal (1, -2, 1)/dx^2
    D = np.eye(N, k=1)
    D += -1 * np.eye(N)
    D += D.T
    D = D/(dx**2)

    m = 1.0
    # kinetic term D/(2m); the Hamiltonian below is H = -D_tf + diag(U)
    D_tf = tf.constant(D/(2.0*m), dtype=tf.float32)

    # target ground state psi(x) = A exp(-2x^2), normalized as a vector
    V0_np = np.exp( - x**2/0.5)
    V0_np = V0_np/np.linalg.norm(V0_np)
    V0_target = tf.constant(V0_np, dtype=tf.float32)

    # the potential U(x) is the optimization variable, starting from zero
    U0 = np.zeros(shape=[N])
    U = tf.Variable(initial_value=U0, dtype=tf.float32)
    
    def calc_V(n):
        # n-th eigenvector of H, with the anti-degeneracy perturbation loop
        H = - D_tf + tf.linalg.diag(U)
        eigval, eigvec = tf.linalg.eigh(H)

        while tf.math.reduce_any((eigval[1:] - eigval[:-1]) < 1e-20):
            H = - D_tf + tf.linalg.diag(U) + tf.linalg.diag(tf.random.uniform(shape=[N])*0.001)
            eigval, eigvec = tf.linalg.eigh(H)
            print("found lambda_i+1 - lambda_i = 0")
        v_raw = eigvec[:, n]
        return v_raw

    def calc_loss():
        # sign-invariant loss L = min(L+, L-)
        v0 = calc_V(0)
        dplus = tf.reduce_sum((v0 - V0_target)**2)
        dminus = tf.reduce_sum((v0 + V0_target)**2)
        return tf.minimum(dplus, dminus)

    opt = tf.keras.optimizers.Adam(learning_rate=0.001)
    L = calc_loss()
    v0_current = calc_V(0)

    print(L)

    while L > 1e-11:
        opt.minimize(calc_loss, var_list=[U])
        v0_current = calc_V(0)
        # convergence metric: squared deviation of the overlap |<v0, psi>| from 1
        L = (tf.abs(tf.reduce_sum(v0_current*V0_target)) - 1)**2
        print(L)

    plt.plot(x, U.numpy())
    plt.show()
        
if __name__ == "__main__":
    main()

After running this for tens of minutes on a machine with a Ryzen 5 and a GTX 1060, I obtained the graph below.

(Figure: the optimized potential $U(x)$ over $x \in [-\pi, \pi]$.)

In other words, if we prepare this potential, the ground-state wavefunction in the quantum well becomes a Gaussian wave packet.

Unfortunately, this is only a numerical solution and I cannot confirm it analytically. I would like to fit it with some nice function and solve it in closed form, but I don't know whether a human can do that.
