Linear algebra has an operation called singular value decomposition (SVD). It decomposes a matrix into its singular values and the matrices of singular vectors that go with them. The decomposition itself is lossless and reversible, but by keeping only the large singular values and discarding the small ones, the original matrix can be approximated. In recent years, information compression exploiting this property has been used actively in many places. Singular value decomposition also plays a central role in approximating quantum states with tensor networks, which is my own field.
The purpose of this article is to see for yourself, using a simple matrix, what kind of processing singular value decomposition really is, and to come away thinking, "I see, it's information compression."
The code is below.
https://github.com/kaityo256/daimyo_svd
If you want to open Jupyter Notebook in Google Colab, click here (https://colab.research.google.com/github/kaityo256/daimyo_svd/blob/main/daimyo_svd.ipynb).
First of all, let's confirm that information can be compressed with singular value decomposition. The code below assumes it is run on Google Colab; if you run it in another environment, adjust things such as the path to the font as appropriate.
First, let's import what you need.
from PIL import Image, ImageDraw, ImageFont
import numpy as np
from scipy import linalg
Next, install the Japanese font.
!apt-get -y install fonts-ipafont-gothic
After the installation succeeds, let's create an `ImageFont` font object.
fpath='/usr/share/fonts/opentype/ipafont-gothic/ipagp.ttf'
fontsize = 50
font = ImageFont.truetype(fpath, fontsize)
Write "Daimyo Procession" in black on a white background.
LX = 200
LY = fontsize
img = Image.new('L', (LX,LY),color=255)
draw = ImageDraw.Draw(img)
draw.text((0,0), "大名行列", fill=0, font=font)
img
If "大名行列" is displayed like this, it worked.
Now, let's get the matrix as a NumPy array from the drawn image. You could simply feed `img.getdata()` to `np.array`, but that would give a one-dimensional array, so use `reshape` to turn it into a matrix.
data = np.array(img.getdata()).reshape((LY,LX))
This `data` is the original matrix. Its elements take values from 0 to 255 and represent the brightness of each pixel.
Conversely, let's display the matrix as an image.
Image.fromarray(np.uint8(data))
If it is displayed like the following, it worked.
Here, let's also check the rank of this matrix. You can check it with `np.linalg.matrix_rank`.
np.linalg.matrix_rank(data) # => 47
The maximum possible rank of a matrix is min(rows, columns). Since `data` is a 50-by-200 matrix, the maximum rank is 50, but it drops slightly to 47 because of the blank margins above and below "大名行列". Approximating this with a matrix of even lower rank is what compression by singular value decomposition does.
Now let's take the singular value decomposition of the matrix `data`. It is a one-liner with `scipy.linalg.svd`.
u, s, v = linalg.svd(data)
Let's check each shape as well.
print(f"u: {u.shape}")
print(f"s: {s.shape}")
print(f"v: {v.shape}")
The result looks like this.
u: (50, 50)
s: (50,)
v: (200, 200)
`u` is a 50x50 square matrix and `v` is a 200x200 square matrix. Note that `s` is mathematically a 50-by-200 diagonal matrix, but `scipy.linalg.svd` returns it as a one-dimensional array since only the diagonal components matter anyway. The elements of `s` are the singular values; they are all non-negative real numbers, and `scipy.linalg.svd` sorts them in descending order. `u` and `v` are arranged in the order corresponding to the singular values. (Strictly speaking, the third return value is the conjugate transpose of V, with the right singular vectors as its rows, which is why we will later take row vectors from the top of `v`.)
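As a quick sanity check (an extra step, not in the notebook), the full diagonal matrix can be rebuilt with `scipy.linalg.diagsvd` to confirm that the three pieces reproduce `data`.

Sigma = linalg.diagsvd(s, 50, 200)       # rebuild the 50x200 diagonal matrix from s
print(np.allclose(u @ Sigma @ v, data))  # => True (up to floating-point error)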
Now that we have the singular value decomposition, let's build a low-rank approximation of the original matrix. We keep only the `r` largest singular values. Correspondingly, let `ur` be the matrix made of the first `r` column vectors of `u`, and let `vr` be the matrix made of the first `r` row vectors of `v`; they have 50 rows and `r` columns, and `r` rows and 200 columns, respectively. For the singular values, take the `r`-by-`r` diagonal matrix. Since the singular values are non-negative real numbers, we can take their square root; call the result `sr`, and let `A = ur @ sr` and `B = sr @ vr`.
The following code implements exactly this operation. Here we set `r = 10`.
r = 10
ur = u[:, :r]
sr = np.diag(np.sqrt(s[:r]))
vr = v[:r, :]
A = ur @ sr
B = sr @ vr
Since `A` has 50 rows and `r` columns and `B` has `r` rows and 200 columns, their product has 50 rows and 200 columns, the same shape as the original matrix `data`.
print(f"A: {A.shape}")
print(f"B: {B.shape}")
print(f"AB: {(A @ B).shape}")
Execution result ↓
A: (50, 10)
B: (10, 200)
AB: (50, 200)
However, while the rank of `data` was 47, `A @ B` keeps only 10 singular values, so it is a matrix of rank 10.
np.linalg.matrix_rank(A @ B) # => 10
Indeed, the rank is 10. This is why it is called a low-rank approximation.
Now the matrix `data` has been approximated by the two matrices `A` and `B`. Let's see how much the data was compressed. The number of matrix elements can be obtained with `size`.
print((A.size + B.size)/data.size) # => 0.25
With 10 singular values kept, the data is compressed to 25% of its original size.
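More generally, `A` has 50*r elements and `B` has 200*r, so the ratio is r*(50+200)/(50*200) = r/40, i.e. it grows linearly with r. A short sketch of the scaling (an aside, not in the notebook):

for rank in (3, 10, 20, 40):
    print(f"rank {rank:2d}: compressed to {rank * (50 + 200) / (50 * 200):.3f}")
# rank  3: compressed to 0.075
# rank 10: compressed to 0.250
# rank 20: compressed to 0.500
# rank 40: compressed to 1.000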
Now, let's see how much information was lost in the low-rank approximation by turning the data approximated by `A @ B` back into an image. The pixel values were originally between 0 and 255, but the approximation can push them outside that range and make the image look strange, so we clamp them back into 0–255 with `numpy.clip`.
b = np.asarray(A @ B)
b = np.clip(b, 0, 255)
Image.fromarray(np.uint8(b))
It was restored like this.
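To put a number on the loss (an extra check, not in the original notebook), you can compute the relative error in the Frobenius norm:

print(np.linalg.norm(data - A @ B) / np.linalg.norm(data))  # relative Frobenius-norm error of the rank-10 approximation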
What happens with an even lower-rank approximation? Let's set `r = 3`.
r = 3
ur = u[:, :r]
sr = np.diag(np.sqrt(s[:r]))
vr = v[:r, :]
A = ur @ sr
B = sr @ vr
b = np.asarray(A @ B)
b = np.clip(b, 0, 255)
Image.fromarray(np.uint8(b))
The result looks like this.
It seems to struggle with the diagonal strokes.
To see how singular value decomposition is used for information compression, we treated an image of the text "大名行列" as a matrix, performed a singular value decomposition, built low-rank approximations, checked the compression ratio, and restored the data to images. I hope you will try this code yourself and think, "I see, so that's low-rank approximation."
Let me supplement the operations above with some of the mathematics behind them.
Consider the following decomposition of a square matrix $A$:

$$
A = P D P^{-1}
$$

Here, $P$ is an invertible matrix and $D$ is a diagonal matrix. When such a decomposition is possible, $A$ is said to be diagonalizable; the diagonal elements of $D$ are the eigenvalues of $A$, and the columns of $P$ are the corresponding eigenvectors of $A$.
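As a minimal numerical sketch of this (a toy example of my own, not part of the article's main code), `numpy.linalg.eig` diagonalizes a small matrix and lets us rebuild it:

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
w, P = np.linalg.eig(A)    # eigenvalues w, eigenvectors as the columns of P
D = np.diag(w)
print(np.allclose(P @ D @ np.linalg.inv(P), A))  # => True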
I also wrote in "Why Learn Linear Algebra" that the eigenvalues and eigenvectors of a matrix are important. The nature of a matrix is determined by its eigenvalues and eigenvectors, and the eigenvectors carry the properties of the original matrix in descending order of the absolute values of their eigenvalues. For example, the eigenvector whose eigenvalue has the largest absolute value represents the equilibrium state if the matrix is a time-evolution operator, or the ground state if the matrix is the time-independent Hamiltonian of quantum mechanics. If the matrix is a Markov transition matrix, the largest absolute value of an eigenvalue is 1, and the second-largest eigenvalue determines the rate of relaxation to the steady state[^markov].
[^markov]: I have been meaning to write about the relationship between Markov transition matrices and linear algebra for several years, but haven't managed to. If there is a strong request to "write it!", I may be able to push myself.
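As a quick check of the Markov statement above, here is a hypothetical two-state transition matrix (the numbers are made up purely for illustration): its largest eigenvalue is 1, and the second one, 0.7, sets the relaxation rate.

import numpy as np

T = np.array([[0.9, 0.2],
              [0.1, 0.8]])              # column-stochastic: each column sums to 1
w, _ = np.linalg.eig(T)
print(sorted(abs(w), reverse=True))     # => approximately [1.0, 0.7]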
Now, eigenvalues and eigenvectors are defined only for square matrices. However, there are times when you want to do something similar for a general rectangular matrix. That is where singular value decomposition comes in.
Consider an $m$-by-$n$ matrix $X$ (with $m < n$). Singular value decomposition factorizes $X$ as follows:

$$
X = U \Sigma V^\dagger
$$

Here, $U$ is an $m$-by-$m$ square matrix and $V$ is an $n$-by-$n$ square matrix, both unitary, and $V^\dagger$ is the conjugate transpose of $V$. $\Sigma$ is an $m$-by-$n$ diagonal matrix whose diagonal elements are called the singular values of $X$. The singular values are non-negative real numbers, and their number equals the smaller of the numbers of rows and columns of $X$, that is, $m$ when $m < n$. By convention, the singular values in $\Sigma$ are arranged in descending order from the top.
The rank of a matrix is "the number of linearly independent vectors among its row vectors" or, equivalently, "the number of linearly independent vectors among its column vectors"; the two definitions always agree. In a rectangular matrix with $m$ rows and $n$ columns ($m < n$), there are $n$ column vectors living in an $m$-dimensional space. Since $m$ linearly independent vectors already span an $m$-dimensional space, the number of linearly independent column vectors cannot exceed $m$. The same argument applies when $m > n$. Therefore the rank of an $m$-by-$n$ matrix is at most $\min(m, n)$. Intuitively, the more linearly dependent vectors a matrix contains, the less "essential information" it carries.
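A concrete (made-up) check of this bound:

import numpy as np

X = np.random.rand(3, 7)
print(np.linalg.matrix_rank(X))     # => 3 (= min(3, 7), almost surely for a random matrix)
Y = np.vstack([X, X[0] + X[1]])     # append a linearly dependent row
print(np.linalg.matrix_rank(Y))     # => still 3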
By the definition of the matrix product, when an $m$-by-$r$ matrix is multiplied by an $r$-by-$n$ matrix, the $r$ "legs" in the middle are contracted, leaving an $m$-by-$n$ matrix. By making $r$ small, we can hope to approximate an $m$-by-$n$ matrix by the product of an $m$-by-$r$ matrix and an $r$-by-$n$ matrix.
The rank of a matrix is bounded by the smaller of its numbers of rows and columns. Therefore, if $r < m < n$, both the $m$-by-$r$ matrix and the $r$-by-$n$ matrix have rank at most $r$. Since multiplying matrices of rank at most $r$ cannot increase the rank, the resulting $m$-by-$n$ matrix $AB$ also has rank at most $r$.
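A quick shape-and-rank check (another toy example I am adding here):

import numpy as np

m, r, n = 5, 2, 8
C = np.random.rand(m, r) @ np.random.rand(r, n)
print(C.shape)                      # => (5, 8)
print(np.linalg.matrix_rank(C))     # => 2, i.e. at most r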
Approximating one matrix by a product of smaller matrices in this way is called a low-rank approximation, since we approximate the matrix by one of lower rank than the original. There are various ways to construct such small matrices, but the best approximation in the sense of the Frobenius norm is the one obtained from singular value decomposition.
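The optimality itself takes a proof, but a closely related property is easy to verify numerically: the Frobenius-norm error of the rank-$r$ truncation equals the square root of the sum of the squares of the discarded singular values. A rough sketch of my own, using a random matrix:

import numpy as np
from scipy import linalg

X = np.random.rand(50, 200)
u, s, v = linalg.svd(X)
r = 10
Xr = u[:, :r] @ np.diag(s[:r]) @ v[:r, :]    # rank-r truncation
print(np.linalg.norm(X - Xr))                # Frobenius norm of the error
print(np.sqrt(np.sum(s[r:] ** 2)))           # the same value, from the discarded singular values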
Now, suppose we have the singular value decomposition of the $m$-by-$n$ matrix $X$,

$$
X = U \Sigma V^\dagger,
$$

where $U$ and $V$ are square matrices of size $m$-by-$m$ and $n$-by-$n$, respectively, both unitary, and $V^\dagger$ is the conjugate transpose of $V$ (the transpose combined with complex conjugation). $\Sigma$ is an $m$-by-$n$ diagonal matrix with the singular values on its diagonal, arranged in descending order from the top (with the columns of $U$ and the rows of $V^\dagger$ ordered correspondingly).
Now let's make a low-rank approximation using only $r$ of the singular values. We keep only the $r$-by-$r$ square block in the upper left of $\Sigma$, the first $r$ column vectors of $U$, and the first $r$ row vectors of $V^\dagger$, and use them to approximate the original matrix. Since the singular values are non-negative real numbers and $\Sigma$ is diagonal, we can split it as $\Sigma = \sqrt{\Sigma}\sqrt{\Sigma}$ and absorb one factor into the matrix coming from $U$ and the other into the matrix coming from $V^\dagger$. Illustrated, it looks like this.
From the above we get

$$
X \approx A B, \qquad A = U_r \sqrt{\Sigma_r}, \quad B = \sqrt{\Sigma_r}\, V_r^\dagger,
$$

where $U_r$, $\Sigma_r$, and $V_r^\dagger$ denote the pieces kept above. Thus, through singular value decomposition, the $m$-by-$n$ matrix $X$ has been approximated by the product of an $m$-by-$r$ matrix $A$ and an $r$-by-$n$ matrix $B$. The rank of the original matrix can be as large as $\min(m, n)$, while the rank of the approximated matrix is at most $r$. The original $m \times n$ matrix elements have been reduced to $r \times (m + n)$, so the information is compressed when $r$ is small.
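Wrapping the whole procedure into one small helper (the function name is mine, just for illustration):

import numpy as np
from scipy import linalg

def low_rank_factors(X, r):
    """Return (A, B) with shapes (m, r) and (r, n) such that X ≈ A @ B."""
    u, s, v = linalg.svd(X)
    sr = np.diag(np.sqrt(s[:r]))
    return u[:, :r] @ sr, sr @ v[:r, :]

# Usage: A, B = low_rank_factors(data, 10)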