The trigger was that while organizing programs I wrote in the past, I came across my old Bezier-curve fitting (curve fitting) code. It felt nostalgic, but I could no longer understand the formula at all.
How could I approximate anything with such a mysterious formula? Past me was impressive.
Apparently it was approximated using the least squares method, so after reviewing it I'd like to explain it in Python. I pulled out my reference book and studied it again. The reference, by the way, is Dr. Kenichi Kanatani's ["An Applied Mathematics Classroom You Can Understand: From Least Squares to Wavelets"](https://www.amazon.co.jp/%E3%81%93%E3%82%8C%E3%81%AA%E3%82%89%E5%88%86%E3%81%8B%E3%82%8B%E5%BF%9C%E7%94%A8%E6%95%B0%E5%AD%A6%E6%95%99%E5%AE%A4%E2%80%95%E6%9C%80%E5%B0%8F%E4%BA%8C%E4%B9%97%E6%B3%95%E3%81%8B%E3%82%89%E3%82%A6%E3%82%A7%E3%83%BC%E3%83%96%E3%83%AC%E3%83%83%E3%83%88%E3%81%BE%E3%81%A7-%E9%87%91%E8%B0%B7-%E5%81%A5%E4%B8%80/dp/4320017382). I recommend this book because it is very easy to understand.
Least squares is a classic technique for approximating complex data with functions, and it is one of the most important foundations of data analysis. Its range of application is very wide: any differentiable function can be fitted, which gives the method both practicality and elegance.
For example, suppose you have 20 points here.
Let's draw the straight line that passes closest to those 20 points.
For the line $f(x) = ax + b$, we find the parameters $a$ and $b$ so that the line passes as close as possible to all the points. The classic way to find $a$ and $b$ is the **least squares method**.
It is called the least squares method because it fits the function to the plotted data so that the **sum of the squares of the errors is minimized**.
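As a tiny illustration of that definition (the points and lines here are made up, not from the article), the quantity being minimized can be written in a few lines of Python:

```python
# The least squares method minimizes the sum of squared vertical errors
# between the data points and the candidate line f(x) = a*x + b.
def sum_squared_error(points, a, b):
    return sum((y - (a * x + b)) ** 2 for x, y in points)

points = [(0, 1), (1, 3), (2, 5)]
# The line y = 2x + 1 passes through all three points, so its error is 0;
# any other line has a strictly larger error.
print(sum_squared_error(points, 2, 1))  # 0
print(sum_squared_error(points, 2, 0))  # 3
```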
First, let's try the simplest case, a straight-line fit, and draw a line approximating the following four points.
```python
points = [(107, 101), (449, 617), (816, 876), (1105, 1153)]
```
The line's function is $f(x) = ax + b$, where $a$ is the slope and $b$ is the intercept. We find the parameters $a$ and $b$ that minimize the squared error between $f(x)$ and the specified points.
Given a target value $y$, the function $j(x, y)$ for the squared error of a single point is:

```math
j(x, y) = (y - (ax + b))^2
```
Since we only need to minimize the sum of $j$ over all $N$ points, the least squares objective is:

```math
J = \frac{1}{2}\sum_{k=1}^{N} j(x_k, y_k)
```
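Written out for the straight line, with the per-point error $j$ substituted in, the objective becomes:

```math
J(a, b) = \frac{1}{2}\sum_{k=1}^{N} \left(y_k - (a x_k + b)\right)^2
```

The best $a$ and $b$ are where $J$ is smallest, which is where both of its partial derivatives vanish.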
To find the minimum of $J$ we need its **partial derivatives**. That sounds like an esoteric word, but don't worry: once you know the mechanical rules, even an elementary school student can do the calculation.
Differentiating a function of two or more variables with respect to just one of them is called **partial differentiation**, and the function obtained this way is called a **partial derivative**.
The power rule for differentiation is $(x^n)' = nx^{n-1}$: the exponent $n$ comes down as a coefficient and the exponent decreases to $n-1$.
Example: $(x^3)' = 3x^2$.
Differentiation gives the slope of the tangent to a graph, which can be rephrased as finding its instantaneous rate of change. For a convex graph, a point where the derivative is 0 is an extremum, which is exactly what minimization needs.
If you understand ordinary derivatives, there is nothing fundamentally new about partial derivatives. And even if the details feel difficult, it's fine: we have Sympy (and Jupyter)!
If you don't know Sympy, please read "Create the strongest calculator environment with Sympy + Jupyter".
Let's start Jupyter right away.
```python
from sympy import *
init_session()
```
This completes the Sympy and Jupyter setup (`init_session` also predefines common symbols such as `x` and `y`). Now define the error function $j$.
```python
a, b = symbols("a b")
j = (y - (a * x + b)) ** 2
```
You can compute partial derivatives with the `sympy.diff` function; let's start with $a$. The partial derivative with respect to $a$ is written $\frac{\partial j}{\partial a}$, and the one with respect to $b$ is written $\frac{\partial j}{\partial b}$.
```python
j_a = diff(j, a)
j_b = diff(j, b)
```
```math
\frac{\partial j}{\partial a} = - 2 x \left(- a x - b + y\right) \\
\frac{\partial j}{\partial b} = 2 a x + 2 b - 2 y
```
We've found the partial derivatives of the linear function; with Sympy it's an easy win.
Since the least squares objective carries a factor of $\frac{1}{2}$, let's sum the partial derivatives over all points and divide by 2.
Substituting values into an expression is done with the `subs` method.
```python
sum_a = sum([j_a.subs([(x, _x), (y, _y)]) for _x, _y in points]) / 2.
sum_b = sum([j_b.subs([(x, _x), (y, _y)]) for _x, _y in points]) / 2.
```
```math
\frac{1}{2}\sum_{k=1}^{N} \frac{\partial j}{\partial a} = 2099931.0 a + 2477.0 b - 2276721.0 \\
\frac{1}{2}\sum_{k=1}^{N} \frac{\partial j}{\partial b} = 2477.0 a + 4 b - 2747.0
```
Setting each of these sums to zero gives two linear equations, and all that's left is to solve them as simultaneous equations.
```math
\begin{bmatrix}
2099931.0 & 2477.0 \\
2477.0 & 4
\end{bmatrix}
\begin{bmatrix}
a \\
b
\end{bmatrix}
=
\begin{bmatrix}
2276721.0 \\
2747.0
\end{bmatrix}
```
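As a sanity check (not in the original article), the same normal equations can be assembled and solved directly with NumPy:

```python
import numpy as np

points = [(107, 101), (449, 617), (816, 876), (1105, 1153)]
xs = np.array([p[0] for p in points], dtype=float)
ys = np.array([p[1] for p in points], dtype=float)

# Normal equations: [[sum(x^2), sum(x)], [sum(x), N]] @ [a, b] = [sum(x*y), sum(y)]
A = np.array([[np.sum(xs * xs), np.sum(xs)],
              [np.sum(xs),      len(xs)]])
rhs = np.array([np.sum(xs * ys), np.sum(ys)])
a, b = np.linalg.solve(A, rhs)
print(a, b)  # matches the Sympy solution below
```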
```python
solve([sum_a, sum_b], [a, b])
# {a: 1.01694642025091, b: 57.0059292596265}
```
The answer is `a = 1.01694642025091`, `b = 57.0059292596265`. All that's left is to substitute these into the line equation $y = ax + b$.
Let's plot it right away.
Since it is a straight line it cannot pass through every point exactly, but it passes close to all of them.
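The plotting cell isn't shown above, so here is a minimal sketch of it (assumes matplotlib; `a` and `b` are the values found by `solve` above):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs anywhere
import matplotlib.pyplot as plt

points = [(107, 101), (449, 617), (816, 876), (1105, 1153)]
a, b = 1.01694642025091, 57.0059292596265  # parameters found above

xs = [p[0] for p in points]
ys = [p[1] for p in points]
line_x = [min(xs), max(xs)]               # two ends are enough for a line
line_y = [a * x + b for x in line_x]      # y = ax + b at those ends

plt.scatter(xs, ys)       # the data points
plt.plot(line_x, line_y)  # the fitted line
plt.savefig("line_fit.png")
```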
We fitted a simple straight line above, but a cubic Bezier curve can be fitted by exactly the same procedure.
In the Bezier curve, $p_1$ is the start point, $p_4$ is the end point, and $p_2$ and $p_3$ are the control points.
The formula for the cubic Bezier curve is:
```math
bz = p_{1} \left(1 - t\right)^{3} + 3 p_{2} t \left(1 - t\right)^{2} + 3 p_{3} t^{2} \left(1 - t\right) + p_{4} t^{3}
```
The range of t is 0 ≤ t ≤ 1.
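To make the formula concrete, here is a direct plain-Python evaluation of it (a scalar version, as in the article, where each point is a pair of $t$ and a value):

```python
# Direct evaluation of the cubic Bezier formula above.
def bezier(t, p1, p2, p3, p4):
    return ((1 - t) ** 3 * p1
            + 3 * (1 - t) ** 2 * t * p2
            + 3 * (1 - t) * t ** 2 * p3
            + t ** 3 * p4)

# At t = 0 the curve is at the start point p1, at t = 1 at the end point p4;
# the control points p2 and p3 only shape the curve in between.
print(bezier(0.0, 0, 180, -53, 100))  # 0.0   (= p1)
print(bezier(1.0, 0, 180, -53, 100))  # 100.0 (= p4)
```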
With the Bezier curve formula written as $bz$ and the target value as $p_x$, the error function $j$ is as follows.

```math
j = (p_x - bz)^{2}
```
The definition of the least squares objective was as follows.

```math
J = \frac{1}{2}\sum_{k=1}^{N} j(x_k, y_k)
```
The parameters we want to find are $p_2$ and $p_3$, so we take the partial derivative with respect to each. Since the definition of $J$ carries a factor of $\frac{1}{2}$, let's fold that $\frac{1}{2}$ into the partial derivatives in advance.
Let's write it in Sympy.
```python
p0, p1, p2, p3, p4, px, t = symbols("p0 p1 p2 p3 p4 px t")
bz = (1-t)**3*p1 + 3*(1-t)**2*t*p2 + 3*(1-t)*t**2*p3 + t**3*p4
j = (px - bz) ** 2
jp2 = simplify(diff(j, p2) / 2)
jp3 = simplify(diff(j, p3) / 2)
```
Half of the partial derivatives with respect to $p_2$ and $p_3$ are, respectively:
```math
\frac{1}{2} \frac{\partial j}{\partial p_2} = 3 t \left(t - 1\right)^{2} \left(- p_{1} \left(t - 1\right)^{3} + 3 p_{2} t \left(t - 1\right)^{2} - 3 p_{3} t^{2} \left(t - 1\right) + p_{4} t^{3} - px\right) \\
\frac{1}{2} \frac{\partial j}{\partial p_3} = 3 t^{2} \left(t - 1\right) \left(p_{1} \left(t - 1\right)^{3} - 3 p_{2} t \left(t - 1\right)^{2} + 3 p_{3} t^{2} \left(t - 1\right) - p_{4} t^{3} + px\right)
```
Define pairs of the target point $p_x$ and the corresponding $t$. As an example, suppose we have the following four points.
```python
points = [(0, 0), (0.2, 65), (0.7, 45), (1.0, 100)]
```
The start and end points are $p_1$ and $p_4$, so these two are known values. For the unknowns $p_2$ and $p_3$, we set the sums of the partial derivatives to zero and solve the simultaneous equations.
```python
const = ((p1, points[0][1]), (p4, points[-1][1]))
tp2 = sum([jp2.subs(const + ((t, x[0]), (px, x[1]))) for x in points[1:-1]])
tp3 = sum([jp3.subs(const + ((t, x[0]), (px, x[1]))) for x in points[1:-1]])
solve([tp2, tp3], [p2, p3])
# {p2: 180.456349206349, p3: -53.0753968253968}
```
The answer is `p2 = 180.456349206349`, `p3 = -53.0753968253968`.
You can approximate it properly!
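A quick check (reusing the solution above): with two unknowns and only two interior points, the normal equations here have an exact solution, so in this particular case the fitted curve actually passes through all four points:

```python
def bezier(t, p1, p2, p3, p4):
    return ((1 - t) ** 3 * p1
            + 3 * (1 - t) ** 2 * t * p2
            + 3 * (1 - t) * t ** 2 * p3
            + t ** 3 * p4)

# Solution found above for points = [(0, 0), (0.2, 65), (0.7, 45), (1.0, 100)]
p2, p3 = 180.456349206349, -53.0753968253968
for t, px in [(0, 0), (0.2, 65), (0.7, 45), (1.0, 100)]:
    print(t, bezier(t, 0, p2, p3, 100))  # each value comes out equal to px
```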
Let's try various point sets.
```python
points = [(0, 0), (0.3, 50), (0.7, 80), (1.0, 100)]
```
```python
points = [(0, 0), (0.2, 34), (0.4, 44), (0.6, 46), (0.8, 60), (1.0, 100)]
```
Unless the values are very extreme, the curve passes through them properly. Of course there are point sets it cannot pass through exactly, but even then it draws a plausible line.
Finally, here is the full Jupyter cell that fits the Bezier curve with the least squares method and plots it.
```python
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

p0, p1, p2, p3, p4, px, t = symbols("p0 p1 p2 p3 p4 px t")
bz = (1-t)**3*p1 + 3*(1-t)**2*t*p2 + 3*(1-t)*t**2*p3 + t**3*p4
j = (px - bz) ** 2
jp2 = simplify(diff(j, p2) / 2)
jp3 = simplify(diff(j, p3) / 2)

def plot(points):
    # Start and end points are known values
    const = ((p1, points[0][1]), (p4, points[-1][1]))
    # Sum the halved partial derivatives over the interior points
    tp2 = sum([jp2.subs(const + ((t, x[0]), (px, x[1]))) for x in points[1:-1]])
    tp3 = sum([jp3.subs(const + ((t, x[0]), (px, x[1]))) for x in points[1:-1]])
    # Solve the simultaneous equations for the control points
    vec = solve([tp2, tp3], [p2, p3])
    const = {p1: points[0][1], p2: vec[p2], p3: vec[p3], p4: points[-1][1]}
    bz2 = bz.subs(const)
    x = np.arange(0, 1 + 0.01, 0.01)
    y = [bz2.subs(t, _t) for _t in x]
    plt.plot(x, y)
    plt.scatter([p[0] for p in points], [p[1] for p in points])
```
If you pass the `plot` function a list of points you want the curve to go through, it does its best to pass through them using the least squares method. It is called as follows.
```python
points = [(0, 0), (0.1, -40), (0.5, -50), (0.8, 30), (1.0, 100)]
plot(points)
```
Finally, let's write an implementation for environments without Sympy, such as C/C++.
```math
\frac{1}{2} \frac{\partial j}{\partial p_2} = 3 t \left(t - 1\right)^{2} \left(- p_{1} \left(t - 1\right)^{3} + 3 p_{2} t \left(t - 1\right)^{2} - 3 p_{3} t^{2} \left(t - 1\right) + p_{4} t^{3} - px\right) \\
\frac{1}{2} \frac{\partial j}{\partial p_3} = 3 t^{2} \left(t - 1\right) \left(p_{1} \left(t - 1\right)^{3} - 3 p_{2} t \left(t - 1\right)^{2} + 3 p_{3} t^{2} \left(t - 1\right) - p_{4} t^{3} + px\right)
```
We use only the two formulas calculated above. To extract the coefficient of each term, treat the unknowns $p_2$ and $p_3$ as if they were 1.0 and evaluate each term. I'll write it in a style that ports easily to C/C++.
```python
def lsm(points):
    '''Build the least squares system: coefficient matrix and right-hand side'''
    p1, p4 = points[0][1], points[-1][1]
    a = b = c = d = e = f = 0.
    for _t, _px in points[1:-1]:
        _t_1 = _t - 1.
        _t_1_d = _t_1 ** 2
        _p1 = p1 * _t_1 ** 3          # p1 * (t - 1)^3 (cube, per the formula)
        _p2 = 3 * _t * _t_1_d         # coefficient of p2: 3t(t-1)^2
        _p3 = 3 * _t ** 2 * _t_1      # coefficient of p3: 3t^2(t-1)
        _p4 = p4 * _t ** 3
        _j_p2 = 3 * _t * _t_1_d       # outer factor of (1/2) dj/dp2
        _j_p3 = 3 * _t ** 2 * _t_1    # outer factor of (1/2) dj/dp3
        a += _p2 * _j_p2
        b += -_p3 * _j_p2
        c += -_p2 * _j_p3
        d += _p3 * _j_p3
        e += -((-_p1 + _p4 - _px) * _j_p2)
        f += -((_p1 - _p4 + _px) * _j_p3)
    return [[a, b],
            [c, d]], [e, f]
```
```python
def inv(mat, vec):
    '''Solve the 2x2 simultaneous equations via the inverse matrix'''
    dm = mat[0][0] * mat[1][1] - mat[1][0] * mat[0][1]
    assert dm != 0.0
    a = mat[1][1] / dm
    b = -mat[0][1] / dm
    c = -mat[1][0] / dm
    d = mat[0][0] / dm
    return (vec[0] * a + vec[1] * b,
            vec[0] * c + vec[1] * d)
```
```python
points = [(0, 0), (0.2, 65), (0.7, 45), (1.0, 100)]
inv(*lsm(points))
```
The least squares method is very powerful: it can fit any differentiable function.
Bezier-curve fitting looks like it would be fun to apply in game programming.
If the partial derivatives are computed in advance with Sympy, other languages such as C/C++ only need an algorithm for solving simultaneous equations.
(There are many such algorithms, for example Gauss-Jordan elimination.)
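As a sketch of what that looks like, here is a minimal Gauss-Jordan elimination (no pivot selection, so it assumes no zero ever appears on the diagonal), applied to the 2x2 system from the straight-line fit earlier:

```python
def gauss_jordan(mat, vec):
    '''Minimal Gauss-Jordan elimination sketch (assumes nonzero diagonal).'''
    n = len(mat)
    # Build the augmented matrix [mat | vec].
    aug = [row[:] + [v] for row, v in zip(mat, vec)]
    for i in range(n):
        pivot = aug[i][i]
        aug[i] = [v / pivot for v in aug[i]]  # normalize the pivot row
        for k in range(n):
            if k != i:
                factor = aug[k][i]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[i])]
    return [row[-1] for row in aug]

# The 2x2 system from the straight-line fit above:
sol = gauss_jordan([[2099931.0, 2477.0], [2477.0, 4.0]],
                   [2276721.0, 2747.0])
print(sol)  # same a and b as the Sympy solve
```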
After all, Sympy is a very powerful tool that easily simplifies and evaluates formulas with many variables.
I'm not good at hand calculation, so it is very useful.