Last time, I wrote an article on automatic collage template generation. The collage was just one application; more precisely, it was a method for probabilistically dividing one area into n areas.
This time we consider the opposite problem: given n rectangles (for example, images), we want a method for arranging them nicely within an $h \times w$ area (hereinafter referred to as the canvas).
Again, we take a probabilistic approach. A rectangle can be represented by a uniform distribution; for example, the uniform distribution representing the $1 \times 1$ canvas is shown in the figure below.
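As a minimal sketch of this representation (the same stats.uniform objects appear as px, py, and pxy in the full code at the end of this post; the function name canvas_pdf below is just an illustrative choice), the density of the $1 \times 1$ canvas factorizes into two independent one-dimensional uniforms:

import numpy as np
from scipy import stats

# Canvas as a uniform distribution over [0, 1] x [0, 1]:
# the density factorizes into independent uniforms over x and y.
px = stats.uniform(0, 1)
py = stats.uniform(0, 1)

def canvas_pdf(xy):
    # Density p(x) of the canvas at points xy of shape (m, 2).
    return px.pdf(xy[:, 0]) * py.pdf(xy[:, 1])

# 1.0 inside the canvas, 0.0 outside.
print(canvas_pdf(np.array([[0.5, 0.5], [1.5, 0.5]])))  # -> [1. 0.]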
Since the rectangles to be arranged can also be represented by uniform distributions, a natural idea is to match the distribution representing the canvas with a mixture of the distributions representing the rectangles. The discrepancy between two probability density functions can be measured with a divergence; here we adopt the KL divergence. Minimizing it amounts to maximum likelihood estimation of the mixture.
I wrote above that a rectangle can be represented by a uniform distribution, but a uniform distribution is not very convenient for optimization, so we approximate each rectangle with a Gaussian instead. Let the rectangles to be placed be $\{(h_i, w_i)\}_{i=1}^n$, where $h_i$ and $w_i$ denote the height and width of rectangle $i$, respectively.
Let the density of the distribution representing rectangle $i$ be $N(x; \mu_i, \Sigma_i)$, where $N$ denotes the normal distribution, and define $\Sigma_i$ as follows.
\Sigma_i = \left[\begin{array}{ll}
\frac{h_i}{2} & 0 \\
0 & \frac{w_i}{2}
\end{array}\right]
This distribution approximates the rectangle (image), and all that remains is to estimate $\mu_i$. Let $p(x)$ be the probability density function of the distribution representing the canvas, and let $q(x) = \sum_{i=1}^n w_i N(x; \mu_i, \Sigma_i)$ with $\sum_{i=1}^n w_i = 1$ (these mixture weights $w_i$ are unrelated to the rectangle widths above). Here the class prior is uniform, i.e. $w_i = \frac{1}{n}$.
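For concreteness, here is a minimal sketch of evaluating this mixture density $q(x)$; the array mus below is a hypothetical placeholder for the centers $\mu_i$ to be estimated, and the negative_log_likelihood function in the full code does essentially the same computation.

import numpy as np
from scipy import stats

def mixture_pdf(xy, mus, sizes):
    # q(x) = (1/n) * sum_i N(x; mu_i, Sigma_i), with Sigma_i = diag(size_i / 2).
    #   xy:    evaluation points, shape (m, 2)
    #   mus:   rectangle centers, shape (n, 2)  -- the parameters to be estimated
    #   sizes: rectangle (width, height) pairs, shape (n, 2)
    n = mus.shape[0]
    q = np.zeros(xy.shape[0])
    for mu, size in zip(mus, sizes):
        cov = np.diag(size / 2)  # Sigma_i
        q += stats.multivariate_normal(mu, cov).pdf(xy) / n  # uniform class prior w_i = 1/n
    return q

# Hypothetical example: two 0.5 x 1.0 rectangles placed side by side on the unit canvas.
sizes = np.array([[0.5, 1.0], [0.5, 1.0]])
mus = np.array([[0.25, 0.5], [0.75, 0.5]])
print(mixture_pdf(np.array([[0.25, 0.5], [0.9, 0.9]]), mus, sizes))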
The KL divergence between $p$ and $q$ is then expressed as follows.
\begin{align*}
KL(p||q) &= \int p(x) \log \frac{p(x)}{q(x)} dx \\
&\approx - \frac{1}{m} \sum_{j=1}^m \log q(x_j) + \mathrm{const.}
\end{align*}
In the last line, the entropy term $\int p(x) \log p(x) dx$, which does not depend on $\mu_i$, was absorbed into the constant, and the remaining cross-entropy term was approximated by a Monte Carlo average over samples $x_j \sim p(x)$. This objective is minimized under the constraints that no rectangle protrudes from the canvas and that the rectangles do not overlap each other. scipy.optimize.minimize is used for the optimization; projected gradient descent would probably work as well.
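To make the last equation concrete, here is a minimal, self-contained sketch of the sampled objective for a hypothetical configuration of two rectangles (mus and sizes below are illustrative placeholders); the estimate function in the full code wraps this kind of objective in scipy.optimize.minimize, together with bounds that keep the rectangles on the canvas and an equality constraint that forbids overlaps.

import numpy as np
from scipy import stats

def sampled_objective(mus_flat, sizes, Z):
    # Monte Carlo estimate of KL(p||q) up to an additive constant:
    # -(1/m) * sum_j log q(x_j), where the x_j are samples from the canvas distribution p.
    mus = mus_flat.reshape(sizes.shape)
    n = mus.shape[0]
    q = np.zeros(Z.shape[0])
    for mu, size in zip(mus, sizes):
        q += stats.multivariate_normal(mu, np.diag(size / 2)).pdf(Z) / n
    q = np.clip(q, 1e-300, None)  # guard against log(0)
    return -np.log(q).mean()

# Hypothetical example: m = 500 samples from the 1 x 1 canvas,
# two 0.5 x 1.0 rectangles placed side by side.
rng = np.random.RandomState(0)
Z = rng.uniform(0, 1, size=(500, 2))
sizes = np.array([[0.5, 1.0], [0.5, 1.0]])
mus = np.array([[0.25, 0.5], [0.75, 0.5]])
print(sampled_objective(mus.ravel(), sizes, Z))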
The experimental results are as follows. The first figure shows the given, properly arranged rectangles, which the method tries to reproduce; the next figure shows the estimated placement. You can see that the rectangles are arranged reasonably well.
Next time, building on the template generation from the previous article, I would like to generate templates and implement fast matching against a given list of images. The full code used in this experiment follows.
import itertools
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats, optimize
def generate_template(n, width, height, random_state=1, max_random_state=10000, offset=0):
    # Recursively split the (width x height) canvas into n rectangular areas
    # (the template generator from the previous article).
    L = [np.array([offset, offset, width-offset, height-offset])]
    random_state_lists = stats.randint.rvs(0, max_random_state, size=(n-1, 4), random_state=random_state)
    for random_state_list in random_state_lists:
        n_areas = len(L)
        if n_areas == 1:
            i = 0
        else:
            p = np.repeat(1 / (n_areas + i), n_areas)
            x = stats.multinomial.rvs(1, p, size=1, random_state=random_state_list[0])[0]
            i = x.argmax()
        y = stats.bernoulli.rvs(0.5, size=1, random_state=random_state_list[1])[0]
        if y == 0:
            b = stats.uniform.rvs(L[i][0], L[i][2] - L[i][0], size=1, random_state=random_state_list[2])[0]
        else:
            b = stats.uniform.rvs(L[i][1], L[i][3] - L[i][1], size=1, random_state=random_state_list[3])[0]
        if y == 0:
            area1 = np.array([L[i][0], L[i][1], b-offset/2, L[i][3]])
            area2 = np.array([b+offset/2, L[i][1], L[i][2], L[i][3]])
        else:
            area1 = np.array([L[i][0], L[i][1], L[i][2], b-offset/2])
            area2 = np.array([L[i][0], b+offset/2, L[i][2], L[i][3]])
        L.pop(i)
        L.append(area1)
        L.append(area2)
    return L
def negative_log_likelihood(L, X, Z):
    # Monte Carlo objective: -sum_j log q(x_j), where q is the Gaussian mixture
    # whose i-th component is centered at the rectangle center L[i] + X[i]/2
    # with covariance diag(X[i]/2).
    n_components = X.shape[0]
    L = L.reshape((n_components, 2))
    mixture_rate = np.repeat(1/n_components, n_components)
    values = np.zeros((Z.shape[0], n_components))
    for i in range(n_components):
        q = stats.multivariate_normal(L[i]+X[i]/2, [[X[i][0]/2, 0], [0, X[i][1]/2]])
        values[:, i] = q.pdf(Z) * mixture_rate[i]
    tmp = values.sum(axis=1)
    return - np.log(tmp[tmp!=0]).sum()
def compute_intersected_area(p1, p2, x1, x2):
    # Overlap area of two axis-aligned rectangles with lower-left corners p1, p2
    # and sizes (width, height) x1, x2.
    left = np.max([p1[0], p2[0]])
    right = np.min([p1[0] + x1[0], p2[0] + x2[0]])
    bottom = np.max([p1[1], p2[1]])
    top = np.min([p1[1] + x1[1], p2[1] + x2[1]])
    return np.max([0, right - left]) * np.max([0, top - bottom])
def constraint(L, X):
    # Total pairwise overlap of the placed rectangles; used as an equality
    # constraint (== 0) so that the rectangles do not cover each other.
    P = L.reshape(X.shape)
    intersected_area = 0.0
    for i, j in itertools.combinations(range(X.shape[0]), 2):
        intersected_area += compute_intersected_area(P[i], P[j], X[i], X[j])
    return intersected_area
def estimate(X, w, h, scale=2, random_state=1):
    # Estimate the rectangle positions by minimizing the sampled objective,
    # restarting from several random initial placements.
    r = np.random.RandomState(random_state)
    Ls = r.uniform(0, 1, size=(20, X.shape[0]*2))
    px = stats.uniform(0, 1)
    py = stats.uniform(0, 1)
    pxy = lambda xy: px.pdf(xy[:, 0]) * py.pdf(xy[:, 1])  # canvas density p(x) (for reference)
    n_samples = 500
    # Samples x_j ~ p(x) drawn from the canvas distribution.
    Z = np.c_[px.rvs(n_samples, random_state=random_state),
              py.rvs(n_samples, random_state=random_state+1)]
    constraints = ({"type": "eq", "fun": constraint, "args": (X,)})
    # Bounds keep each rectangle inside the canvas.
    #bounds = [tmp for i in range(X.shape[0]) for tmp in [(0, scale-X[i][0]), (0, scale-X[i][1])]]
    bounds = [tmp for i in range(X.shape[0]) for tmp in [(0, w*scale-X[i][0]), (0, h*scale-X[i][1])]]
    lowest_loss = np.inf
    best_result = None
    for L in Ls:
        result = optimize.minimize(
            negative_log_likelihood,
            L.ravel(),
            args=(X, Z),
            bounds=bounds,
            options={"disp": True, "maxiter": 500},
            constraints=constraints,
        )
        if result.status != 0:
            continue
        loss = negative_log_likelihood(result.x.reshape(X.shape), X, Z)
        if loss < lowest_loss:
            best_result = result
            lowest_loss = loss
    result = best_result
    return result.x.reshape(X.shape)
def plot_result(L_pred, X, n_samples=100, colors=["b", "g", "r", "c", "m", "y", "k"]):
    # Visualize each placed rectangle by scattering uniform samples inside it.
    n = L_pred.shape[0]
    for i in range(n):
        Z_i = np.c_[stats.uniform.rvs(L_pred[i][0], X[i][0], size=n_samples),
                    stats.uniform.rvs(L_pred[i][1], X[i][1], size=n_samples)]
        print(Z_i.max())
        plt.scatter(Z_i[:, 0], Z_i[:, 1], c=colors[i])
    plt.show()
rs = 1
width, height = 1, 1
L = generate_template(2, width, height, random_state=3, offset=0)
L = np.array(L)
X = np.c_[L[:, 2] - L[:, 0], L[:, 3] - L[:, 1]]  # (width, height) of each rectangle
L_pred = estimate(X, width, height, scale=2, random_state=rs)
tmp = np.maximum((L_pred + X).max(axis=0), 1)  # scale so that the result fits in the unit square for plotting
plot_result(L_pred / tmp, X / tmp, n_samples=500)