Optical Flow, the dynamics of images captured by OpenCV

I think that sometimes you can be lifted by being caught in a crayfish, but I would like to analyze how violent the movement was at that time! I don't think there are any cases.

image Source: ASIAN KUNG-FU GENERATION "To Your Town" ([Solfa Remake Memorial](http://ro69.jp/news/ detail / 13427)))

Optical Flow expresses the movement between images (?). Optical Flow describes how each point moves between two images. By calculating this, it is possible to analyze the movement of feature points on the image as shown in the figure above.

In this article, I would like to introduce the theoretical background for calculating the Optical Flow and the implementation using Python / OpenCV.

Positioning of Optical Flow

There are various purposes and methods for achieving this in the analysis of motion between images. Here, I will first explain how Optical Flow is positioned in it.

Actually, the barrier of classification is ambiguous, such as the flow estimation method being used for tracking, and it may be divided according to the method used, so it seems a little chaotic, but it is roughly divided in this way.

And, as the name suggests, Optical Flow introduced this time is a flow estimation.

Theory

Then, I will explain the theoretical contents of how to calculate Optical Flow.

It's easy to understand if you imagine a flip book, but in the end, a video is a continuous image.

image

Therefore, if you can analyze the movement between two images, it seems that you can connect them and clarify the movement of the entire video.

So how can you estimate the movement between two images when you have them?

The fact that there is movement means that one point on the image has moved to another. Conversely, if something other than the position (color, etc.) changes, there is no way to track it anymore, so the color (often converted to grayscale during processing, so "brightness") should be the same between the two images. , And make the assumption.

image

The formula will come out from here, but please be assured that it will proceed one by one.

Let $ I (x, y, t) $ be the brightness of the points $ x, y $ on the image $ I $ at a certain time $ t $. If the time advances by $ \ Delta t $ and the coordinates move $ \ Delta x, \ Delta y $ during that time, the brightness of the destination is $ I (x + \ Delta x, y + \ Delta y, t + \ Delta t) $. Since we made the assumption that these should not have changed, the following equation holds.

I(x, y, t) = I(x + \Delta x, y + \Delta y, t + \Delta t)

From this equation, it is now possible to finally derive $ \ frac {\ Delta x} {\ Delta t}, \ frac {\ Delta y} {\ Delta t} $, which is the amount of change in coordinates per unit time. It will be the goal of the calculation performed from.

First, use the Taylor expansion to approximate the right-hand side. I feel that Taylor expansion uses some extremely difficult technique, but it is simply transformed to a value close to the true value. Please refer to here for a brief explanation.

I(x + \Delta x, y + \Delta y, t + \Delta t) = I(x, y, t) + \frac{\partial I}{\partial x} \Delta x + \frac{\partial I}{\partial y} \Delta y + \frac{\partial I}{\partial t} \Delta t + ...

It will continue for a long time after this, but as it continues, it will become smaller and smaller, so I will take the plunge and cut it off to make only the first development.

I(x + \Delta x, y + \Delta y, t + \Delta t) = I(x, y, t) + \frac{\partial I}{\partial x} \Delta x + \frac{\partial I}{\partial y} \Delta y + \frac{\partial I}{\partial t} \Delta t

What happens then is as follows.

I(x, y, t) = I(x + \Delta x, y + \Delta y, t + \Delta t) = I(x, y, t) + \frac{\partial I}{\partial x} \Delta x + \frac{\partial I}{\partial y} \Delta y + \frac{\partial I}{\partial t} \Delta t

This will make $ \ frac {\ partial I} {\ partial x} \ Delta x + \ frac {\ partial I} {\ partial y} \ Delta y + \ frac {\ partial I} {\ partial t} \ Delta t If it is not = 0 $, the equations will not hold, so we will call them 0. Dividing this formula by $ \ Delta t $, that is, the amount of change over time, gives:

\frac{\partial I}{\partial x} \frac{\Delta x}{\Delta t} + \frac{\partial I}{\partial y} \frac{\Delta y}{{\Delta t}} + \frac{\partial I}{\partial t} = 0

For the sake of simplicity, the notation $ \ frac {\ partial I} {\ partial x} = I_x $ is as follows.

I_x \frac{\Delta x}{\Delta t} + I_y \frac{\Delta y}{{\Delta t}} + I_t = 0

Finally, $ \ frac {\ Delta x} {\ Delta t}, \ frac {\ Delta y} {{\ Delta t}} $ came out. Summarizing the movement amount of each of $ x and y $ per unit time in the form of vector $ \ bf {v} $, the following equation can be finally solved.

I_x {\bf v_x} + I_y {\bf v_y} + I_t = 0 \\\\ \nabla I^T {\bf v} = - I_t

However, from the conclusion, this formula cannot be solved as it is. I've been trying my best to follow the mathematical formulas so far, but I don't know! However, this is unavoidable because there is only one equation while there are two unknowns (two-dimensional) such as $ {\ bf v_x} $ and $ {\ bf v_y} $. This problem we face when seeking Optical Flow is called the Aperture Problem (why we say that is a lengthy explanation, but after all we need other constraints, nothing more than that. I haven't said it, so I'll omit it here).

Now, to solve this situation, we need to increase at least one constraint equation. This is the main subject, and the main theme of various methods for detecting motion is how to apply this "constraint", and various ideas have been proposed.

Here, I would like to introduce two methods, the Lucas–Kanade method, which is a representative of the sparse type, and the Horn–Schunck method, which is a dense type method.

Lucas–Kanade method

The Lucas–Kanade method assumes spatial consistency and constrains it. The point is to assume that the surrounding points should behave in the same way.

image

Assuming that the surrounding points are $ q_1 ... q_n $ and they all behave in the same way, we can inflate the constraint equation as follows:

I_x(q_1) {\bf v_x} + I_y(q_1) {\bf v_y} = - I_t(q_1) \\\\ I_x(q_2) {\bf v_x} + I_y(q_2) {\bf v_y} = - I_t(q_2) \\\\ ... \\\\ I_x(q_n) {\bf v_x} + I_y(q_n) {\bf v_y} = - I_t(q_n) \\\\

Here, each term is summarized in a matrix as follows.

A = \left[ \begin{array}{cc} I_x(q_1) & I_y(q_1) \\\\ I_x(q_2) & I_y(q_2) \\\\ \vdots & \vdots \\\\ I_x(q_n) & I_y(q_n) \end{array} \right] \\ b = \left[ \begin{array}{c} - I_t(q_1) \\\\ - I_t(q_2) \\\\ \vdots \\\\ - I_t(q_1) \end{array} \right]

Then you can simply write:

A{\bf v} = b

All you have to do is solve this system of equations. This can be solved using the least squares method(In short||b - A{\bf v}||^2Use differentiation to find the point that minimizes the squared error that can be expressed by)Finally, it can be expressed as follows.

{\bf v} = (A^TA)^{-1} A^Tb

To solve this equation, it is premised that there is an inverse matrix in $ A ^ TA $ (reversible) as shown in the equation, but Harris corner detection and Shi, which are typical methods for feature point detection, -It holds true in that it can be detected by Tomasi detection.

Horn–Schunck method

The Horn–Schunck method constrains the "smoothness" of movement. The point is that the one that travels the shortest distance is the better $ {\ bf v} $.

image

Let the cost (energy) of movement be $ E ({\ bf v}) $ and define it as follows. This is a cost, so the smaller it is, the better.

E({\bf v}) = E_{data}({\bf v}) + \alpha E_{smooth}({\bf v})

Here, $ E_ {data} ({\ bf v}) $ represents the first equation at hand. As mentioned above, $ I_x {\ bf v_x} + I_y {\ bf v_y} + I_t $ should be 0, so the closer it is to 0, the better $ {\ bf v} $. In the following, the integral is taken to add up the entire image.

E_{data}({\bf v}) = \int\int (I_x {\bf v_x} + I_y {\bf v_y} + I_t) ^2 dxdy

And $ E_ {smooth} ({\ bf v}) $ is the "smoothness" constraint.

E_{smooth}({\bf v}) = \int\int || \nabla {\bf v_x} || ^2 + || \nabla {\bf v_y} || ^2 dxdy

This means that the smaller the amount of change in $ {\ bf v} $, the better, that is, the better the one that "reaches the point where the brightness does not change with a small amount of change". $ \ Alpha $ is a factor in how much weight is placed on this smoothness constraint.

Solving this equation as a minimization problem will give you the desired $ {\ bf v} $. I will omit the subsequent development for solving, but this is the idea of the Horn–Schunck method.

Practical edition

Now, I would like to actually implement Optical Flow detection using OpenCV. We will use the Lucas–Kanade and Gunnar Farneback methods for which implementations are provided. The Gunnar Farneback method is a commonly used method of the dense type (constraints use neighboring points like the Lucas–Kanade method ...).

The sample actually implemented is placed below, but from here on, Domo-kun in NHK's CREATIVE LIBRARY as a material due to copyright issues. using.

icoxfog417/cv_tutorial/opticalflow

I'm using Python3.5, numpy1.10.4, and OpenCV3.1.0 (it's easier with Anaconda / Miniconda). However, only Matplotlib uses the previous 1.4.3. This is due to a bug in the execution of nbagg used for animation playback in the latest version 1.5.1 as of April 2016 (notebook backend figures close spontaneously # 6075. matplotlib / matplotlib / issues / 6075)).

As for the implementation, the basics are as in the OpenCV tutorial. First, from the Lucas–Kanade method.

domo_running_snap.PNG

You're running. .. .. You can see that the trajectory can be tracked. The Lucas–Kanade method detects feature points and estimates the optical flow from points around them. Therefore, the important processing is the following two points.

Note that feature points may not be detected, of course (if not detected, None will be returned). Also, anything that cuts into the video from the middle was not detected. Keep in mind that what you track must be visible from the beginning (I've tried a lot, but I'm not sure why).

If you find a dense Optical Flow by the Gunnar Farneback method, you can capture the whole movement more like the following.

image

As an aside, the VideoCapture used to load the video can also pass the image file. I can do it. For example, if you specify ʻimg_% 02d.jpg, the files with serial numbers such as ʻimg_01.jpg, ʻimg_02.jpg` ... will be read automatically. You can easily make flipbooks with this, so it's fun to give it a try.

That is all for the explanation. Please try to capture the memorable movements!

References

Recommended Posts

Optical Flow, the dynamics of images captured by OpenCV
I compared the identity of the images by Hu moment
The story of displaying images with OpenCV or PIL (only)
What I did when I couldn't find the feature point with the optical flow of opencv and when I lost it
Low-rank approximation of images by HOSVD step by step
Low-rank approximation of images by Tucker decomposition
I checked the options of copyMakeBorder of OpenCV
[OpenCV] About the array returned by imread
Sample to comprehensively try OpenCV Optical Flow
Face detection by collecting images of Angers.
Pandas of the beginner, by the beginner, for the beginner [Python]
Reconstruction of moving images by Autoencoder using 3D-CNN
Low-rank approximation of images by singular value decomposition
Sort the elements of the array by specifying the conditions
Low-rank approximation of images by HOSVD and HOOI
Minimize the number of polishings by combinatorial optimization
Judging the finish of mahjong by combinatorial optimization
Classification of guitar images by machine learning Part 2
Search by the value of the instance in the list
Wavelet transform of images with PyWavelets and OpenCV
I tried using the image filter of OpenCV
[Python + OpenCV] Whiten the transparent part of the image