A while ago I wrote an article along the lines of "let's recognize red objects with Python". The method there was simple: convert the image to HSV (Hue, Saturation, Value) and find the regions with a strong red component.
This time, as an application, let's target a "green" object, and talk about "tracking" it in a continuous video stream. Looked at closely, a video stream is just a sequence of still images, so it is not wrong to think that if you keep recognizing the green object in each still image, tracking follows naturally. A camera, however, is not a human eye. If the angle of the target shifts a little, it may wash out to white depending on the lighting and fall outside the detection criteria (the judgment fails and you lose sight of it), the detected point may jump around, or a small change in ambient light may make you lose the object entirely. What happens when bad conditions happen to coincide and the object is lost? The computer decides that the object is "absent". The next moment the conditions improve and it reappears. It flickers in and out with no sense of stability, and 100% per-frame detection accuracy simply cannot be expected.
So we need to stop treating the video stream as a series of still images to be analyzed from scratch every time, and instead exploit its continuity. In other words, we should predict and search using a bit of probability and statistics, on the reasoning that "it was here last time, so it should be around here now".
There is a good tool for exactly this kind of problem: the "particle filter". I don't know the theory well enough to explain the formal definition, so I'll leave that to the many other commentary pages. Conceptually, though, image analysis with a particle filter boils down to the following procedures:
1. Scatter a large number of particles around the detected target (initialization).
2. Eliminate particles with poor scores and replace them with copies of good ones (resampling).
3. Move every particle by a small random amount (prediction).
4. Evaluate how well each particle's position matches the target (likelihood weighting).
5. Take the weighted average of the particle positions as the estimated position (measurement).
If you keep repeating these procedures 1 to 5, tracking as described above follows naturally. The state of "it was around here last time" is remembered by the scattered particles, and each particle keeps guessing "is it over here next?" and moving randomly. Particles whose predictions keep missing are eliminated; particles that desperately predict the next frame well survive. It feels unexpectedly cruel, and you may even grow attached to the poor particles (just kidding).
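Before touching any images, here is a minimal, self-contained toy sketch of those five steps in one dimension: the "target" is just a number that drifts each frame, and the particles try to follow it. Everything in it (the particle count, the noise levels, the Gaussian scoring) is made up purely for illustration and is not the image tracker built below.

import numpy as np

np.random.seed(0)
target = 50.0
particles = target + np.random.randn(200) * 5.0    # 1. scatter particles around the target
weights = np.ones(200)

for frame in range(20):
    target += 2.0                                   # the target drifts a little every frame
    # 2. resample: draw particles in proportion to their previous weights
    idx = np.searchsorted(weights.cumsum(), np.random.rand(200) * weights.sum())
    particles = particles[idx]
    # 3. predict: move every particle by a random amount
    particles += np.random.randn(200) * 3.0
    # 4. weight: particles close to the observed target score high
    weights = np.exp(-0.5 * ((particles - target) / 3.0) ** 2) + 1e-6
    # 5. measure: the weighted mean of the particles is the estimate
    estimate = (particles * weights).sum() / weights.sum()

print("target:", target, "estimate:", round(estimate, 1))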
Describing this series of processes in C/C++ is fairly painful, but thanks mainly to the power of numpy it takes surprisingly little code, which is the great appeal of Python + numpy.
First, the imports. The code below assumes the following two imports.
import cv2
import numpy as np
First, define the structure of a particle. Let's keep it simple: x and y are the particle's position, and weight is its likelihood (weight).
particle = [x, y, weight]
Next, create a function that calculates a particle's likelihood (weight). It scans a 30x30-pixel window around the specified coordinates and returns the fraction of those 900 pixels that satisfy the criterion: 0.0 if none of them do, 1.0 if all of them do, roughly 0.5 if about half do. The decision function func() is factored out so that the caller can supply it.
def likelihood(x, y, func, image, w=30, h=30):
    # Clip the w x h window around (x, y) to the image boundaries.
    x1 = int(max(0, x - w / 2))
    y1 = int(max(0, y - h / 2))
    x2 = int(min(image.shape[1], x + w / 2))
    y2 = int(min(image.shape[0], y + h / 2))
    region = image[y1:y2, x1:x2]
    # Count how many pixels in the window satisfy the decision function.
    count = region[func(region)].size
    # Ratio of matching pixels in the window; never return exactly 0 so that
    # every particle keeps a tiny chance of being resampled.
    return (float(count) / (w * h)) if count > 0 else 0.0001
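As a quick sanity check (the image and decision function here are made up purely for illustration), a 100x100 image whose left half passes the test should give a likelihood of 1.0 for a window inside that half, and the floor value 0.0001 for a window in the other half:

img = np.zeros((100, 100), dtype=np.uint8)
img[:, :50] = 200                                              # bright left half
print(likelihood(25, 50, lambda region: region > 128, img))    # window fully in the bright half -> 1.0
print(likelihood(75, 50, lambda region: region > 128, img))    # window fully in the dark half  -> 0.0001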
Next is the particle initialization function. As before, the func parameter specifies the externally supplied decision function. This function creates a large number (500) of particles and places them at the centre of the largest region accepted by the func() function. In the call to np.ndarray(), the 500 in (500, 3) is the number of particles, and the 3 is the number of elements of a particle, i.e. x, y and weight.
def init_particles(func, image):
    # Keep only the pixels accepted by the decision function.
    mask = image.copy()
    mask[~func(mask)] = 0
    # Find the largest accepted region and take the centre of its bounding box.
    contours, _ = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    if len(contours) <= 0:
        return None
    max_contour = max(contours, key=cv2.contourArea)
    max_rect = np.array(cv2.boundingRect(max_contour))
    max_rect = max_rect[:2] + max_rect[2:] / 2
    weight = likelihood(max_rect[0], max_rect[1], func, image)
    # All 500 particles start at the same position with the same weight.
    particles = np.ndarray((500, 3), dtype=np.float32)
    particles[:] = [max_rect[0], max_rect[1], weight]
    return particles
Here comes the real movement of the particle filter, which repeats the following four procedures:
1. Resampling: eliminate particles with poor weights and replace them with copies of good ones.
2. Prediction: move every particle by a random amount.
3. Weighting: compute each particle's likelihood against the new frame.
4. Measurement: take the weighted average of the particle positions as the result.
In resampling, we use random numbers to remove particles with poor weights and replace them with copies of particles with good weights. The cumsum() method used to build the weights array computes the cumulative sum, and (weights > weight).argmax() scans the weights array and returns the index of the first element that is larger than weight.
def resample(particles):
    tmp_particles = particles.copy()
    # Cumulative sum of the weights; the last element is the total weight.
    weights = particles[:, 2].cumsum()
    last_weight = weights[-1]
    for i in range(particles.shape[0]):
        # Draw a random value in [0, total weight) and pick the first particle
        # whose cumulative weight exceeds it (weighted sampling with replacement).
        weight = np.random.rand() * last_weight
        particles[i] = tmp_particles[(weights > weight).argmax()]
        particles[i][2] = 1.0
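To see how cumsum() and argmax() implement this weighted drawing, here is a small standalone example with made-up weights: particle 2 owns 60% of the cumulative range, so it is picked whenever the drawn value lands in that range.

import numpy as np

weights = np.array([1.0, 3.0, 6.0]).cumsum()    # -> [1.0, 4.0, 10.0]
for drawn in (0.5, 2.0, 7.0):
    # (weights > drawn) is a boolean array; argmax() returns the index of the
    # first True, i.e. the particle whose cumulative range contains the draw.
    print(drawn, (weights > drawn).argmax())    # -> 0, 1, 2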
Now actually move the particles toward the next frame. Each coordinate gets the result of numpy.random.randn() multiplied by the coefficient specified by variance. This variance coefficient should be set according to how vigorously the target moves. This random movement of the particles is the "prediction": particles that happen to move in the right direction survive, and particles that happen to move in the wrong direction are destined to be eliminated (overwritten) later.
def predict(particles, variance=13.0):
    # Jitter every particle with Gaussian noise scaled by the variance coefficient.
    particles[:, 0] += np.random.randn(particles.shape[0]) * variance
    particles[:, 1] += np.random.randn(particles.shape[0]) * variance
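Since variance should match how vigorously the target moves, you simply pass a different value when calling it; the 25.0 below is an arbitrary number used only as an illustration for a faster-moving target.

predict(particles, variance=25.0)   # hypothetical: spread the particles wider for a fast target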
Next, determine the likelihood (weight) of each particle; this is the material for judging how good the previous prediction was. The likelihood is calculated by calling the likelihood() function created earlier, and the weights are then normalized.
def weight(particles, func, image):
    # Re-evaluate every particle against the current frame.
    for i in range(particles.shape[0]):
        particles[i][2] = likelihood(particles[i][0], particles[i][1], func, image)
    # Normalize so that the weights sum to the number of particles.
    sum_weight = particles[:, 2].sum()
    particles[:, 2] *= (particles.shape[0] / sum_weight)
Then measure the particles to find where the good ones are concentrated, i.e. the weighted average of their positions.
def measure(particles):
    # Weighted average of the particle positions, weighted by likelihood.
    x = (particles[:, 0] * particles[:, 2]).sum()
    y = (particles[:, 1] * particles[:, 2]).sum()
    weight = particles[:, 2].sum()
    return x / weight, y / weight
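As a quick check with made-up numbers: two particles at (10, 10) and (20, 20) with weights 3 and 1 should give a weighted centre of (12.5, 12.5).

demo = np.array([[10.0, 10.0, 3.0],
                 [20.0, 20.0, 1.0]], dtype=np.float32)
print(measure(demo))   # -> (12.5, 12.5)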
Finally, the processing implemented so far is wrapped up in a utility function. If no pixel of the target color is found for more than max_frame consecutive frames (as specified by the argument), the particles are thrown away and will be reinitialized the next time the color appears.
particle_filter_cur_frame = 0

def particle_filter(particles, func, image, max_frame=10):
    global particle_filter_cur_frame
    if image[func(image)].size <= 0:
        # No matching pixel in this frame; give up after max_frame misses in a row.
        if particle_filter_cur_frame >= max_frame:
            return None, -1, -1
        particle_filter_cur_frame = min(particle_filter_cur_frame + 1, max_frame)
    else:
        particle_filter_cur_frame = 0
        if particles is None:
            particles = init_particles(func, image)

    if particles is None:
        return None, -1, -1

    # One full particle-filter step: resample, predict, weight, measure.
    resample(particles)
    predict(particles)
    weight(particles, func, image)
    x, y = measure(particles)
    return particles, x, y
Now let's build a program that tracks a green object using the particle filter implemented above. The frame data (a BGR image) captured with cv2.VideoCapture() is converted to HSV, OTSU thresholding is applied to the S (Saturation) and V (Value) channels, and only pixels with sufficient color depth and brightness are kept: the H channel is masked with the pixels that pass both the S and V tests, and everything else is filled with 0. Since we want to find the green hue range (50-85), filling with 0 is reasonable. (When detecting red, on the other hand, it would go wrong unless you fill with some value other than 0, because red sits right next to 0 on the hue circle.) What is actually passed to the particle_filter() function is the H channel filtered in this way. From there, the particle_filter() function manages the particles and tracks the green region nicely.
import cv2
import numpy as np

if __name__ == "__main__":
    def is_green(region):
        # Hue range for green (H channel after COLOR_BGR2HSV_FULL).
        return (region >= 50) & (region < 85)

    cap = cv2.VideoCapture(0)
    particles = None

    while cv2.waitKey(30) < 0:
        _, frame = cap.read()
        # Keep only the hue of pixels with enough saturation and brightness.
        frame_hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV_FULL)
        frame_h = frame_hsv[:, :, 0]
        _, frame_s = cv2.threshold(frame_hsv[:, :, 1], 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
        _, frame_v = cv2.threshold(frame_hsv[:, :, 2], 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
        frame_h[(frame_s == 0) | (frame_v == 0)] = 0

        particles, x, y = particle_filter(particles, is_green, frame_h)

        if particles is not None:
            # Draw the particles that lie inside the frame, then the estimated position.
            valid_particles = particles[(particles[:, 0] >= 0) & (particles[:, 0] < frame.shape[1]) &
                                        (particles[:, 1] >= 0) & (particles[:, 1] < frame.shape[0])]
            for i in range(valid_particles.shape[0]):
                frame[int(valid_particles[i][1]), int(valid_particles[i][0])] = [255, 0, 0]
            p = np.array([x, y], dtype=np.int32)
            cv2.rectangle(frame, tuple(p - 15), tuple(p + 15), (0, 0, 255), thickness=2)

        cv2.imshow('green', frame)

    cap.release()
    cv2.destroyAllWindows()
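As an aside on the earlier remark about red: red sits at both ends of the hue circle, so a decision function for it has to wrap around 0, and the masked-out pixels would then need to be filled with some neutral value (for example 128) instead of 0. A hypothetical is_red might look like the sketch below; the exact hue boundaries are illustrative only.

def is_red(region):
    # Red wraps around hue 0, so test both ends of the range.
    # Note that 0 itself would count as red, which is why masked pixels
    # must be filled with something other than 0 in the red case.
    return (region >= 240) | (region <= 10)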
This particle filter is extremely resistant to noise and movement, and will chase an object quite persistently. The detected position it reports is also very stable.
Once you have a particle filter implemented, the key is how you implement the likelihood() function: the life and death of the particles depends on it.
Watch the particles move desperately to survive. Gradually, the desperation of each particle, ruthlessly graded by likelihood(), becomes oddly endearing. The filter follows the target accurately, but in its shadow countless particles are born and die, and there is a certain harsh sorrow in that.