Let's compress an image using **K-Means**, a typical clustering algorithm. First, I will explain the K-Means algorithm itself, and then how to use it for image compression. The material is based on Coursera's Machine Learning course.
What is clustering?
Clustering divides a collection of data into several groups (clusters) according to the similarity between the data points.
The K-Means algorithm can be roughly divided into the following three steps:

1. Initialize the centroids
2. Assign each data point to the cluster of its closest centroid
3. Recompute each centroid as the mean of the data assigned to it

Steps 2 and 3 are repeated until the centroids converge.
Let's understand each step with an image.
First, **determine the positions of points called centroids**. Each centroid serves as the representative pattern of its cluster, so you need as many centroids as the number of clusters you want to divide the data into. In the figure below, the stars represent the centroids and the circles represent the data. Since there are three centroids here, the data will be divided into three clusters: **red**, **blue**, and **green**.
In this figure the centroids are placed at random positions; another initialization method is introduced later.
Once the centroid positions are determined, the next step is to **assign a cluster to each data point**. Look at each data point and assign it to the cluster of whichever centroid, **red**, **blue**, or **green**, is closest to it.
After every data point has been assigned a cluster, **recalculate the positions of the centroids**. The new position of each centroid is the mean of the data in its cluster. For example, to compute the **green** centroid, add up all the **green** circles and divide by their number. Comparing the upper and lower figures, you can see that the centroid positions have changed; in particular, **blue** and **green** have moved.
After that, cluster assignment and centroid recalculation are **repeated**. You can watch the process in the animation below, where the centroids are drawn in black for visibility. By the end, the algorithm has converged and the centroid positions no longer change.
Looking at the still images, you can see that the centroids move along the following trajectories.
The K-Means algorithm described above can be summarized in code as follows.
# Centroid initialization step
centroids = kmeans_init_centroids(X, K)

for iter in range(iterations):
    # Cluster assignment step: assign each data point to the cluster
    # of its closest centroid. idx is a list; idx[i] stores the index
    # of the centroid assigned to the i-th data point.
    idx = find_closest_centroids(X, centroids)

    # Centroid move step: compute the mean within each cluster
    # based on the centroid assignments.
    centroids = compute_centroids(X, idx, K)
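As the animation suggested, iteration can stop once the centroids no longer move. Here is a minimal driver sketch assuming the three functions implemented below; `run_kmeans` and its defaults are my own naming, not from the course:

```python
import numpy as np

def run_kmeans(X, K, iterations=10):
    # Iterate cluster assignment and centroid updates,
    # stopping early once the centroids no longer move.
    centroids = kmeans_init_centroids(X, K)
    idx = find_closest_centroids(X, centroids)
    for _ in range(iterations):
        new_centroids = compute_centroids(X, idx, K)
        if np.allclose(new_centroids, centroids):
            break  # converged
        centroids = new_centroids
        idx = find_closest_centroids(X, centroids)
    return centroids, idx
```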
There are two inputs to the algorithm:

- The training set $x^{(1)}, x^{(2)}, ..., x^{(m)}$
- The number of clusters $K$

The training set is unlabeled, because K-Means is unsupervised learning. Each training example $x^{(i)}$ is an $n$-dimensional vector, corresponding to a circle in the figures. The number of clusters must be decided in advance.
Now that we know the input data, we will explain each step of the algorithm.
First, initialize the centroids. Each centroid is an $n$-dimensional vector, denoted $\mu_j$, and there are $K$ of them. In the figures, they correspond to the points marked with stars.
\mu_1, \mu_2, \ldots, \mu_K \in \mathbb{R}^n
One way to initialize the centroids is to randomly select $K$ examples from the training set $x^{(1)}, x^{(2)}, ..., x^{(m)} \in \mathbb{R}^n$. Specifically, randomly shuffle the training set, then take the first $K$ examples as the centroids.
When implemented as code, it looks like this:
import numpy as np

def kmeans_init_centroids(X, K):
    # Initialize the centroids to randomly chosen training examples:
    # shuffle the training data...
    rand_X = np.random.permutation(X)
    # ...and adopt the first K rows as the centroids
    centroids = rand_X[:K]
    return centroids
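A quick usage sketch on toy data (the array contents here are made up for illustration):

```python
import numpy as np

X = np.arange(10.0).reshape(5, 2)   # 5 toy examples in 2 dimensions
print(kmeans_init_centroids(X, 3))  # 3 rows of X, chosen at random
```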
After the centroids are initialized, assign a cluster to each data point. Specifically, for each training example $x^{(i)}$, find the closest centroid $\mu_j$ and assign the example to that centroid's cluster.
To find the closest centroid, compute the squared distance to each centroid as follows, and assign the data point to the cluster of the centroid that minimizes it:
idx^{(i)} := \mathrm{argmin}_j \|x^{(i)} - \mu_j\|^2
Here $idx^{(i)}$ is the index of the centroid closest to $x^{(i)}$, and $\mu_j$ is the coordinate vector of the $j$-th centroid.
A naive implementation using numpy looks like this:
import numpy as np

def find_closest_centroids(X, centroids):
    K = centroids.shape[0]
    m = X.shape[0]
    idx = np.zeros(m, dtype=int)
    for i in range(m):
        lowest = np.inf
        lowest_index = 0
        for j in range(K):
            # Squared distance between data point i and centroid j
            diff = X[i] - centroids[j]
            cost = diff.dot(diff)
            if cost < lowest:
                lowest_index = j
                lowest = cost
        idx[i] = lowest_index
    return idx
Using scipy's `scipy.spatial.distance.cdist`, this can be written much more concisely. `cdist` supports a variety of distance metrics; here we use the squared Euclidean distance. Performance is also better than the naive code above.
import numpy as np
from scipy.spatial.distance import cdist

def find_closest_centroids(X, centroids):
    # Squared Euclidean distance between every centroid and every data point
    sqdists = cdist(centroids, X, 'sqeuclidean')
    # Index of the closest centroid for each data point
    idx = np.argmin(sqdists, axis=0)
    return idx
Once all the data points have been assigned to clusters, the next step is to recalculate the centroids by taking the mean of the data assigned to each cluster. For every centroid $k$, compute the following:
\mu_k := \frac{1}{|C_k|} \sum_{i \in C_k} x^{(i)}
Here $C_k$ is the set of data points assigned to centroid $k$. For example, if the two data points $x^{(3)}$ and $x^{(5)}$ are assigned to centroid $k = 2$, then $\mu_2 = \frac{1}{2}(x^{(3)} + x^{(5)})$.
A straightforward implementation looks like this:
import numpy as np

def compute_centroids(X, idx, K):
    m, n = X.shape
    centroids = np.zeros((K, n))
    for k in range(K):
        temp = X[idx == k]     # Data assigned to cluster k
        count = temp.shape[0]  # Number of data points in cluster k
        for j in range(n):
            centroids[k, j] = np.sum(temp[:, j]) / count
    return centroids
Using numpy's `mean` function, it can be written more concisely:
import numpy as np

def compute_centroids(X, idx, K):
    m, n = X.shape
    centroids = np.zeros((K, n))
    for k in range(K):
        # The new centroid is the mean of the data assigned to cluster k
        centroids[k] = np.mean(X[idx == k], axis=0)
    return centroids
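With all three functions in place, the `run_kmeans` sketch from earlier can be tried on toy data (the blob centers below are arbitrary choices for illustration):

```python
import numpy as np

np.random.seed(0)
# Synthetic data: three blobs of 50 points each in 2-D
X = np.vstack([np.random.randn(50, 2) + offset
               for offset in ([0, 0], [5, 5], [0, 5])])
centroids, idx = run_kmeans(X, K=3)
print(centroids)  # roughly near (0, 0), (5, 5), and (0, 5)
```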
Now that the algorithm has been explained, let's move on to the main subject: image compression. Images can be compressed by applying the K-Means algorithm. Roughly speaking, K-Means is used to select the $K$ colors that will represent the compressed image, and the image is compressed by replacing each pixel of the original image with one of these $K$ colors. In the following, we use $K = 16$.
If the pixels of an image use a 24-bit color representation, each pixel can be decomposed into three 8-bit values. The three 8-bit values correspond to **red**, **green**, and **blue**, the so-called RGB channels. Since each channel is 8 bits, it can take values from 0 to 255 ($2^8 = 256$ values), so combinations of these values allow an image to contain up to $2^{24}$ (about 16.77 million) colors. Here we reduce that number to $K = 16$ colors. With only 16 colors, each pixel can be represented by 4 bits ($16 = 2^4$).
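As a small illustration (the pixel value below is made up), a 24-bit pixel splits into its three 8-bit channels with shifts and masks:

```python
pixel = 0x1E90FF          # a hypothetical 24-bit color value
r = (pixel >> 16) & 0xFF  # 30
g = (pixel >> 8) & 0xFF   # 144
b = pixel & 0xFF          # 255
```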
The figure below shows the RGB values of each pixel in the left image plotted in three-dimensional space, yielding the plot on the right.
We use the K-Means algorithm to select the 16 colors that represent the compressed image. Specifically, each pixel of the original image is treated as a data point and clustered into 16 groups. Once the centroids have been computed, the image can be represented in 16 colors by replacing the RGB value of each pixel with the RGB value of its centroid.
The idea is illustrated in the figure below: the stars represent the centroids, the circles represent the pixels, and pixels belonging to the same cluster are drawn in the same color. The value of every pixel in a cluster is replaced with the value of that cluster's centroid; in other words, the values of the same-colored circles are collapsed into the value of their centroid. For convenience of calculation, the RGB values have been scaled to the range 0.0 to 1.0.
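A minimal sketch of this pipeline, assuming the functions defined above and an RGB image loaded as a `(height, width, 3)` float array scaled to [0, 1] (for example via `matplotlib.pyplot.imread` on a PNG):

```python
import numpy as np

def compress_image(img, K=16, iterations=10):
    h, w, _ = img.shape
    X = img.reshape(-1, 3)  # each pixel becomes one 3-D data point
    centroids, _ = run_kmeans(X, K, iterations)
    # Replace every pixel with the RGB value of its closest centroid
    idx = find_closest_centroids(X, centroids)
    return centroids[idx].reshape(h, w, 3)
```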
The animation below shows K-Means actually being applied to the image. You can see the cluster assignment and centroid recalculation being repeated.
Below are the original image (left) and the compressed image produced by K-Means (right). You can see that the compressed image looks rougher because it uses fewer colors.
Once the 16 representative colors have been found, the value of each pixel is replaced with the value of its closest centroid. This is equivalent to representing the original image by the centroid assigned to each pixel. Since there are only 16 centroids, the number of bits required to represent the image drops significantly. Let's compare the number of bits required for the original and compressed images.
The original image is 128 x 128 pixels, and each pixel needed 24 bits of information, so in total it required

128 x 128 x 24 = 393216 bits
The compressed image needs to store the 16 color values at 24 bits each, while the image itself needs only 4 bits per pixel, since 4 bits are enough to indicate which of the 16 colors each pixel uses. In total, this comes to

16 x 24 + 128 x 128 x 4 = 65920 bits

which is about one-sixth the size of the original image.
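The same arithmetic in code, for reference:

```python
original_bits = 128 * 128 * 24             # 393216
compressed_bits = 16 * 24 + 128 * 128 * 4  # 65920
print(original_bits / compressed_bits)     # about 5.97
```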
In this post, I compressed an image using the K-Means algorithm. The quality of the compressed result is not great, but I think the algorithm has a wide range of applications. Please give it a try with the sample code.