Try to decompose the daimyo procession into Tucker

Introduction

Linear algebra has an operation called singular value decomposition, which is used to approximate matrices. For details, refer to the above "Singular value decomposition of daimyo matrix".

Now, a matrix is like a two-dimensional array in a program. You can get the value of a two-dimensional array by specifying two indexes. Similarly, a matrix can also specify an element by specifying two indexes, a row and a column. A tensor is an extension of a matrix to a multidimensional array. It's a lot of trouble to discuss tensors seriously, but in this article, it's okay to just think of it as a "multidimensional array".

Now, like matrices, there is a need to approximate tensors. In the case of matrices, the singular value decomposition was the best approximation (in the sense of the Frobenius norm), but it is not well understood how to approximate a general tensor [^ 1]. The decomposition that divides a tensor into a small tensor (core tensor) and a set of matrices corresponding to each mode is called Tucker decomposition. In the above, I thought that the monochrome image was a matrix as it was and decomposed it into singular values, but in this article, I think that the color image is a third-order tensor and decompose it by Tucker. Several algorithms for obtaining Tucker decomposition have been proposed, but in this article, we will introduce the simplest algorithm called Higher Order SVD (HOSVD), which uses singular value decomposition obediently.

[^ 1]: Maybe there is a general theory, but I don't know.

TODO: Upload code to GitHub

Tucker disassembly

Before explaining the Tucker decomposition, I will summarize the terms, matrices, and graph representations of tensors used in this paper [^ 2]. The index of a matrix or tensor is called a "foot". The procession has two legs. This number of rows and columns is called the "dimension" of the foot. Note that in multidimensional arrays, the "number of legs" is called the dimension, but in matrices and tensors, the "type of values that can be indexed" is called the dimension.

[^ 2]: I'm not very familiar with it, so please be aware that the following may have non-industry notation.

Matrix and tensor are represented by "squares" with "lines". The "line" is the foot. The procession has two legs, and the tensor on the third floor has three legs. If you want to clarify the dimension of each foot, write it near it.

By the way, the most important thing in graph representation is the operation of connecting lines. A matrix can be created as a new matrix by summing the legs of the same dimension. For example, a matrix can be a matrix product of an m-by-n matrix and an n-by-l matrix, resulting in an m-by-l matrix. Expressing this with squares and lines looks like this.

image.png

Even with tensors, you can crush the legs to create a new tensor by summing the legs with the same dimension. In a graph representation, you connect your legs like a matrix. The resulting tensor is a tensor with the dimension of each leg, with the number of remaining legs as the rank.

Now, color image data can be represented by a three-dimensional array with three indexes: height, width, and color. Think of this as a three-legged tensor and approximate it by singular value decomposition. Generally, the Tucker decomposition approximates all feet, but this time, since there are only three dimensions of the "color" foot, we will approximate only the "height" and "width" feet.

image.png

Originally, a tensor with three legs of height h, width w, and color c is divided into a product of one tensor and two matrices. The central tensor is called the "core tensor", and in this case it has colored legs and legs connected to the left and right matrices by dimension r. The left and right matrices have a foot of height h and a foot of dimension r, a foot of width w and a foot of dimension r, respectively, and r is a foot that "transmits information". Tucker decomposition compresses information by reducing the dimension of r. Let's try it below.

Make an image of the daimyo procession

First, let's create an image of the target daimyo procession. Last time I made it in monochrome, but this time I will make a color image. To make it easier to see the color "bleeding" due to approximation, the four letters are yellow (R, G, 0), cyan (0, G, B), magenta (R, 0, B), and white (R, G), respectively. Let's color with, B). Is it like this? The font is installed for Google Colab on the way and the font path is specified, but if you use another environment, please correct it accordingly.

from PIL import Image, ImageDraw, ImageFont
import numpy as np
from scipy import linalg
from IPython.display import display
!apt-get -y install fonts-ipafont-gothic
fpath='/usr/share/fonts/opentype/ipafont-gothic/ipagp.ttf'
fontsize = 50
font = ImageFont.truetype(fpath, fontsize)
LX = 200
LY = fontsize
img  = Image.new('RGB', (LX,LY),color="black")
draw = ImageDraw.Draw(img)
draw.text((0,0), "Big", fill=(255,255,0), font=font)
draw.text((0,0), "Name", fill=(0,255,255), font=font)
draw.text((0,0), "line", fill=(255,0,255), font=font)
draw.text((0,0), "Column", fill="white", font=font)
img

When executed, such an image will be created.

image.png

Let's get the data from the image.

data = np.array(img.getdata()).reshape(LY,LX,3)

You can think of data as a three-dimensional array, a tensor with three legs, each with three dimensions: height LY, width LX, and color. The task of this article is to approximate this. Before that, let's take a look at each of the RGB planes.

R = data[:,:,0]
G = data[:,:,1]
B = data[:,:,2]
display(Image.fromarray(np.uint8(R)))
display(Image.fromarray(np.uint8(G)))
display(Image.fromarray(np.uint8(B)))

image.png

If you look at the world of R only, the world of G only, and the world of B only, one letter is missing from each. Also, since the "column" is white, it will not be chipped.

Singular value decomposition

First, let's create a function that decomposes singular values. See above for details. A function that takes a matrix and returns two matrices approximated by a given rank would look like this:

def perform_svd(X, rank):
  U, s, V = linalg.svd(X)
  Ur = U[:, :rank]
  Sr = np.diag(np.sqrt(s[:rank]))
  Vr = V[:rank, :]
  A = Ur @ Sr
  B = Sr @ Vr
  return A, B

If you insert the matrix X, two matrices ʻA, B will be returned. By calculating ʻA @ B, you can get a low-rank approximation of X by rank.

Tensor approximation

Now, let's approximate the tensor, but since it's a big deal, let's try various approximation methods.

Singular value decomposition thinking that it is a matrix forcibly

This tensor has a height of 50, a width of 200, and a color of 3. The number of elements is 30,000. Ignore the original structure altogether, think of it as a 200 * 150 matrix, decompose it into singular values, and approximate it. Will it be such a function?

def simple_image(X, rank):
  X = data.reshape((200,150))
  A, B = perform_svd(X, rank)
  Y = (A @ B).reshape(LY,LX,3)
  Y = np.clip(Y, 0, 255)
  Y = np.uint8(Y)
  print((A.size+B.size)/X.size)
  return Image.fromarray(Y)

The execution result looks like this.

display(simple_image(data, 50))
display(simple_image(data, 21))
display(simple_image(data, 8))

image.png

The data compression ratio is also displayed. From the top are 58.3%, 24.5% and 9.3%. This is the sum of the number of elements of A and B divided by the number of elements of X when the matrix X of 200 * 150 is divided into two matrices A and B. In spite of the very rough method, the original structure remains as it is.

Singular value decomposition of RGB plane

The previous method originally completely ignored the three structures of height, width and color. Now let's consider the original structure a bit. A simple way to approximate a tensor that represents a color image is to think of the R, G, and B planes as "matrix", decompose them into singular values, and then combine them. let's do it.

def rgb_svd(X, rank):
  R = X[:,:,0]
  G = X[:,:,1]
  B = X[:,:,2]
  Rh, Rw = perform_svd(R, rank)
  Gh, Gw = perform_svd(G, rank)
  Bh, Bw = perform_svd(B, rank)
  return Rh, Rw, Gh, Gw, Bh, Bw

The tensor X is separated into a matrix representing the RGB plane, R is separated into Rh and Rw, G is separated into Gh and Gw, and B is separated into Bh and Bw, respectively. If you do Rh @ Rw, the R plane will be restored, so if you restore the G and B planes in the same way, the original image will be restored. Image restoration will look like this.

def rgb_image(X, rank):
  Rh, Rw, Gh, Gw, Bh, Bw = rgb_svd(X, rank)
  R = Rh @ Rw
  G = Gh @ Gw
  B = Bh @ Bw
  Y = np.asarray([R,G,B]).transpose(1,2,0)
  Y = np.clip(Y, 0, 255)
  Y = np.uint8(Y)
  print((Rh.size+Rw.size)*3/X.size)
  return Image.fromarray(Y)

Let's run it.

display(rgb_image(data, 50))
display(rgb_image(data, 10))
display(rgb_image(data, 4))

image.png

The compression ratios are 125%, 25%, and 10% from the top. Singular value decomposition may have more elements than the original data if the rank value to be left is large. For the other two, I tried to match the compression ratio with the rough method I mentioned earlier. I thought that the approximation would be better because the structure was taken into consideration, but it was unexpectedly bad.

Tucker disassembly

Now let's break down the main subject of Tucker. See also Low-rank approximation of images with Tucker decomposition for the principle of Tucker decomposition. This function divides the three-legged tensor X into the core tensor C and the left and right matrices ʻU and V`.

def tucker_decomposition(X, rank):
  X = X.transpose(0,2,1)
  XR = X.reshape(LY*3, LX)
  _, _, V = linalg.svd(XR)
  V = V[:rank,:]
  Vt = V.transpose(1,0)
  XL = X.reshape(LY, LX*3)
  U, _, _ = linalg.svd(XL)
  U = U[:,:rank]
  Ut = U.transpose(1,0)
  # Make a core tensor
  UX = np.tensordot(Ut, X, (1,0))
  C = np.tensordot(UX, Vt, (2, 0))
  return U, C, V

A function that restores an image from a disassembled matrix and tensor looks like this.

def tucker_image(X, rank):
  U, C, V = tucker_decomposition(X, rank)
  UC = np.tensordot(U,C,(1,0))
  Y = np.tensordot(UC, V, (2,0))
  Y = Y.transpose((0,2,1))
  Y = np.clip(Y, 0, 255)
  Y = np.uint8(Y)
  print((U.size+C.size+V.size)/X.size)
  return Image.fromarray(Y)

Let's run it.

display(tucker_image(data, 50))
display(tucker_image(data, 23))
display(tucker_image(data, 10))

image.png

The compression ratios are 66.7%, 24.5%, and 9.3% from the top. It's clearly better than singular value decomposition for each of RGB, but it seems that the comparison with the rough method is subtle.

Summary

I thought that the image of the color "Daimyo Matrix" was a three-legged tensor, and tried to compress the information using singular value decomposition. It was surprising that it would be better to forcibly decompose the entire RGB plane into singular values than to decompose each RGB plane into singular values. Unlike the case of matrices, the tensor approximation is not very trivial as to what to do. I hope you'll try this code and think, "I see, the low-rank approximation of tensors is deep."

Supplementary material

TODO: Write a commentary on the Tucker decomposition algorithm in this article.

Recommended Posts

Try to decompose the daimyo procession into Tucker
Try to introduce the theme to Pelican
Cython to try in the shortest
The fastest way to try EfficientNet
The easiest way to try PyQtGraph
Try to face the integration by parts
How to decompose TTC fonts into TTFs.
Python amateurs try to summarize the list ①
Try to solve the fizzbuzz problem with Keras
Try adding fisheye lens distortion to the image
Try to solve the Python class inheritance problem
Try to solve the man-machine chart with Python
How to try the friends-of-friends algorithm with pyfof
Try singular value decomposition of the daimyo matrix
Try to simulate the movement of the solar system
Try posting to Qiita for the first time
Try to solve the programming challenge book with python3
[Python] Try to read the cool answer to the FizzBuzz problem
Try setting SSH (Exscript) from the software to the router
Try setting NETCONF (ncclient) from software to the router
Try to solve the problems / problems of "Matrix Programmer" (Chapter 1)
Try to visualize the room with Raspberry Pi, part 1
Try to solve the internship assignment problem with Python
Molecular dynamics simulation to try for the time being
Try to estimate the number of likes on Twitter
Try to get the contents of Word with Golang
Try to divide twitter data into SPAM and HAM
[Neo4J] ④ Try to handle the graph structure with Cypher
I tried to put pytest into the actual battle
Try to decipher the login data stored in Firefox
Try to specify the axis with PyTorch's Softmax function
Try translating the Python Data Science Handbook into Japanese