Hello. Today I'd like to write about a topic from informatics, which I'm studying in graduate school. The other day it occurred to me that I had never properly studied RANSAC, so I studied it and implemented it.
When you collect natural data such as images in research, the data may contain "outliers": samples that deviate significantly from the underlying law because of noise or other causes. Outliers interfere with finding the law in the data. RANSAC is a method of estimating the law (its parameters) while ignoring such outliers.
... it's hard to grasp in the abstract, so let's look at a concrete example. From here on, I'll call the law the "model".
Consider the problem of estimating, from a given point cloud, which straight line the points lie along. The points are distributed along this line, but roughly 1 in 100 samples is an outlier that jumps far above it.
The graph looks like this.
(Data generation code in Python)
import numpy as np

# true values
_a = 0.5
_b = 0.3

# samples: points along y = _a * x + _b with Gaussian noise;
# with probability 1/100, a sample becomes a large outlier
points = np.array([[x, _a * x + _b + .1 * np.random.randn()
                    + (np.random.randint(100) == 0) * np.random.rand() * 1000]
                   for x in np.arange(0, 10, 0.01)])
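The post shows the resulting plot as an image. If you want to reproduce it yourself, here is a minimal sketch; matplotlib is my assumption and is not part of the original code.

import matplotlib.pyplot as plt  # assumed, not in the original post

plt.scatter(points[:, 0], points[:, 1], s=5)
plt.xlabel('x')
plt.ylabel('y')
plt.show()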
Since a straight line is represented by y = ax + b, this problem is equivalent to finding the values of a and b. Here, the true value of a is 0.5 and the true value of b is 0.3.
If you estimate (a, b) naively, the estimate is pulled toward the outliers and the fitted line is dragged upward. For example, the most orthodox method, least squares, gives a result like this.
Red is the true line and green is the estimated line. The estimated parameters are (a, b) = (0.20, 7.4). The outliers pull the estimated green line well above the true red line.
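For reference (this snippet is my addition, not from the post), an ordinary least-squares line fit also has a closed form via NumPy's polyfit, and it shows the same pull toward the outliers.

# closed-form least-squares fit of y = ax + b (degree-1 polynomial)
a_ls, b_ls = np.polyfit(points[:, 0], points[:, 1], 1)
print(a_ls, b_ls)  # dragged far from the true (0.5, 0.3) by the outliers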
(Fitting code in Python; the post minimizes the error by gradient descent with a decaying step size, an annealing-style approach)
def leastSquare(data):
    # minimize the squared error by gradient descent,
    # shrinking the step size tau like simulated annealing
    tau = 100
    model = np.zeros(2)  # (a, b)
    while tau >= 0.0001:
        for _ in range(10):
            grad = errorGrad(model, data)  # gradient of the error
            grad /= np.linalg.norm(grad)   # keep only the direction
            grad *= -1                     # descend
            model += grad * tau
        tau *= 0.1
    return model

a, b = leastSquare(points)
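The helper errorGrad is not shown in the post (it lives in the linked notebook); here is a minimal sketch, assuming the objective is the sum of squared residuals of y = ax + b.

def errorGrad(model, data):
    # gradient of sum((a*x + b - y)^2) with respect to (a, b);
    # an assumed implementation, the notebook's may differ
    a, b = model
    x, y = data[:, 0], data[:, 1]
    r = a * x + b - y  # residuals
    return np.array([2 * np.sum(r * x), 2 * np.sum(r)])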
If you fit the model naively like this, the outliers interfere and it does not work, so we want to ignore them. RANSAC does so with the following algorithm.

1. Randomly sample the minimum number of points needed to determine the model parameters (two points for a line).
2. Estimate the model parameters from that sample.
3. Count how many of the remaining points agree with the estimated model within an error threshold (the inliers).
4. If there are too few inliers, the sample in step 2 has caught an outlier, so the model is ignored; otherwise it is kept as a candidate.
5. Repeat steps 1-4 a fixed number of times and adopt the candidate with the smallest overall error.

By doing this, the influence of outliers can be neatly excluded and values close to the truth can be obtained.
In other words, RANSAC finds the subset of the data that best represents the true model, and fits the line to that subset.
The line obtained with RANSAC looks like this.
Green is the estimated line. You can see that it follows the distribution of the data, and the estimate (a, b) = (0.496228..., 0.299107...) is fairly close to the true values.
The code in Python looks like this.
def ransac(data,
           # parameters for RANSAC
           n = 2,   # number of samples required to determine the parameters
           k = 100, # maximum number of iterations
           t = 2.0, # error threshold for a point to count as an inlier
           d = 800  # number of inliers required to accept a model
           ):
    good_models = []
    good_model_errors = []
    iterations = 0
    while iterations < k:
        # 1. randomly sample n points (without replacement)
        sample = data[np.random.choice(len(data), n, False)]
        # 2. estimate the model parameters from the sample
        param = getParamWithSamples(sample)
        # 3. collect the inliers among the remaining points
        inliers = []
        for p in data:
            if (p == sample).all(1).any():
                continue  # skip the points used for estimation
            if getError(param, p) > t:
                continue
            else:
                inliers.append(p)
        # 4. keep the model only if it has enough inliers
        if len(inliers) > d:
            current_error = getModelError(param, data)
            good_models.append(param)
            good_model_errors.append(current_error)
        iterations += 1
    # 5. return the accepted model with the smallest overall error
    best_index = np.argmin(good_model_errors)
    return good_models[best_index]

a, b = ransac(points)
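The helpers getParamWithSamples, getError, and getModelError are likewise defined in the linked notebook, not shown here. As a sketch of what they might look like, assuming an exact two-point line fit and a vertical-distance error (the notebook's definitions may differ):

def getParamWithSamples(sample):
    # fit y = ax + b exactly through the two sampled points
    # (assumes the two x values differ, i.e. no vertical line)
    (x0, y0), (x1, y1) = sample
    a = (y1 - y0) / (x1 - x0)
    return np.array([a, y0 - a * x0])

def getError(param, p):
    # vertical distance from point p to the line y = ax + b
    a, b = param
    return abs(a * p[0] + b - p[1])

def getModelError(param, data):
    # mean error of the model over the whole data set
    return np.mean([getError(param, p) for p in data])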
This time I took up the RANSAC algorithm, using a two-parameter linear model as the example. It is a very important algorithm that appears in all sorts of papers.
I have always liked algorithms, but for the year or so since I entered graduate school I have mostly treated them as black boxes, with an emphasis on practical work such as implementation. From now on I'd like to occasionally dig into the internals of each algorithm as well.
By the way, this code is available here as an IPython notebook.
https://github.com/smurakami/study-ransac/blob/master/ransac.ipynb