When you measure temperature or similar quantities with an embedded device, there are cases where you cannot measure at the position you actually care about and have to calculate an approximation instead. Two questions then arise:

- Can you come up with a good approximation formula?
- How do you decide the parameters of that formula?
In this article, I will show an example of how to determine the latter, the parameters, easily by using scipy in Python.
Two notes on the setting:

- The result is ultimately meant to run on an embedded device that can only do modest integer arithmetic.
- The system to be approximated is treated, to some extent, as a black box.
All of the code below is run in a Jupyter Notebook, which is published on GitHub.
Assuming the data below is ready as a CSV file, start by reading this data.
- t: time
- x: the measured value that is actually available (e.g. a sensor reading)
- y: the value you really want; the goal is to approximate it from x
The problem setting is: "y cannot be measured during actual operation, so it is approximated from the value of x." (y can be measured with special equipment during development, but once deployed it often happens that only x is available.) In machine-learning terms, y corresponds to the training target and x to the input data.
%matplotlib inline
import matplotlib.pyplot as plt
import pandas
from scipy import optimize
import numpy as np
#read csv
datafile = 'xy_data.csv' #Data file name
df = pandas.read_csv(datafile)
#Let's visualize
df.plot()
plt.show()
#Also show the numeric values of the data (rows 100 to 104)
df[100:105]
We want to approximate y (green) using x (orange).
Now let's use x to approximate y.
y looks roughly like x scaled by a factor of about 1.x. So we start by approximating it with f(x) = a*x + b and let scipy find the parameters a and b.
# Define the approximation formula
def f1(x, a, b):  # the first argument is the input; the fitted parameters follow
    return a*x + b

# Wrap the fitting step in a function
def param_solver(f, x, y, debug_print=True):
    '''Fit the parameters of f so that f(x) matches y as closely as possible.'''
    params, params_covariance = optimize.curve_fit(f, x, y)  # <---------- the optimization happens here
    if debug_print:
        print('params =', params)  # fitted parameters
        print('params_covariance =', params_covariance)  # covariance of the parameters
        print('standard deviation errors =', np.sqrt(np.diag(params_covariance)))  # standard deviations
    return params
# Convert the columns to numpy arrays
x = np.array(df['x'].tolist())
y = np.array(df['y'].tolist())
t = np.array(df['t'].tolist())

# Fit the parameters
params = param_solver(f=f1, x=x, y=y)
params = [  1.24075239e+00   2.31148413e+03]
params_covariance = [[  3.47128802e-04  -4.06799576e+00]
 [ -4.06799576e+00   5.46259540e+04]]
standard deviation errors = [  1.86313929e-02   2.33721959e+02]
You can easily find approximate parameters with ***`optimize.curve_fit`*** in scipy.
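Incidentally, `optimize.curve_fit` also accepts an explicit starting point for the search through its `p0` argument. A minimal sketch (the guess values [1.0, 0.0] here are arbitrary, not something this fit actually needed):

# Sketch: pass an initial guess p0 = [a, b] to curve_fit; useful if the default starting point does not converge
params_guessed, _ = optimize.curve_fit(f1, x, y, p0=[1.0, 0.0])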
So how close did we get?
# Wrap the visualization in a function (t, the time axis, is taken from the surrounding scope)
def plot_all(f, x, y, params):
    fig, ax = plt.subplots()
    ax.plot(t, f(x, *params), 'b-', label="f(x)")
    ax.plot(t, x if len(np.array(x).shape) == 1 else x[0], 'r-', label="x")  # if x is a tuple of inputs, plot only the first
    ax.plot(t, y, 'g-', label="y")
    ax.legend()
    plt.show()

plot_all(f1, x, y, params)
Simply scaling x (red) by a constant was never going to reproduce y (green). Let's change the model.
Approximate with f(x) = a*x^2 + b*x + c.
# Define the approximation formula
def f2(x, a, b, c):
    return a*x**2 + b*x + c

# Fit the parameters (the model now has three of them)
params2 = param_solver(f=f2, x=x, y=y)

# Visualize
plot_all(f2, x, y, params2)
params = [ -5.82931117e-05   2.42030720e+00  -2.33839533e+03]
params_covariance = [[  1.78832431e-11  -3.61865487e-07   1.42649657e-03]
 [ -3.61865487e-07   7.56116875e-03  -3.16641976e+01]
 [  1.42649657e-03  -3.16641976e+01   1.51375869e+05]]
standard deviation errors = [  4.22885837e-06   8.69549812e-02   3.89070519e+02]
This is fairly close, but the peaks still do not match well. Looking carefully, the turning points of x and y seem to occur at different times, so let's bring the time factor into the model as well.
Using the derivative dx/dt, approximate with f(x) = a*x^2 + b*x + c*dx/dt + d.
def f3(xs, a, b, c, d):
    x, xdelayed = xs
    dx = x - xdelayed  # the derivative is approximated by the difference from a delayed copy
    return a*x**2 + b*x + c*dx + d

def make_delayed(d, delay_step):
    '''Return d shifted by delay_step: pad the head with delay_step copies of d[0] and drop the same number from the tail.'''
    d = list(d)
    result = np.array([d[0] for i in range(delay_step)] + d[:-delay_step])
    return result
# Create data delayed by 10 time steps
x10 = make_delayed(x, delay_step=10)

# Fit the parameters and visualize
params3 = param_solver(f=f3, x=(x, x10), y=y)  # x is now a tuple of two inputs
plot_all(f3, (x, x10), y, params3)
params = [ -3.54499345e-05   2.03961022e+00   3.95514919e+00  -2.84616185e+03]
params_covariance = [[  2.27985415e-12  -4.55332708e-08   2.90736635e-08   1.64730881e-04]
 [ -4.55332708e-08   9.39580914e-04  -4.84532248e-04  -3.67720697e+00]
 [  2.90736635e-08  -4.84532248e-04   5.03391759e-03  -6.46259873e-01]
 [  1.64730881e-04  -3.67720697e+00  -6.46259873e-01   1.79598365e+04]]
standard deviation errors = [  1.50991859e-06   3.06525841e-02   7.09501063e-02   1.34014314e+02]
Quite close! However, the approximation f(x) drifts slightly from y (green) in the second half.
Here the derivative was approximated simply as x(t) - x(t-10), but was a delay of 10 actually a good choice? Let's treat this delay (hereafter td) as a parameter to be searched.
# Evaluation function for the delay td: the sum of absolute differences (L1 distance) between the target y and f(x)
def cost(td, *args):
    _f, _x, _y = args
    # Create the delayed data
    xdelay = make_delayed(d=_x, delay_step=td)
    # Fit the parameters
    _params = param_solver(f=_f, x=(_x, xdelay), y=_y, debug_print=False)
    # Sum of the distances
    return np.sum(np.abs(_y - _f((_x, xdelay), *_params)))
print('cost(td=5) = ', cost(5, f3, x, y))
print('cost(td=10) = ', cost(10, f3, x, y))
cost(td=5) =  178246.4667
cost(td=10) =  149393.276741
For a start I chose a simple evaluation function, and comparing td = 5 and td = 10 as a trial confirms quantitatively that 10 is better than 5.
Again, optimize with scipy.
def cost_for_brute(td, *args):
    '''Wrapper for optimize.brute, which passes td as a one-element array; always hand a single integer to cost.'''
    try:
        td = int(td[0])
    except:
        pass
    return cost(td, *args)
# Search
result = optimize.brute(cost_for_brute, [slice(1, 50, 1)], args=(f3, x, y), full_output=True)
final_td, final_cost, tds, costs = result  # optimal td, its cost, the searched td values, and the cost of each
final_td = int(final_td[0])  # returned as a one-element array of floats such as [ 23.], so convert to an integer
print('final optimum td = ', final_td, 'for cost = ', final_cost)

# Visualization
fig, ax = plt.subplots()
ax.plot(tds, costs, 'b-', label="cost(td)")
ax.plot(final_td, final_cost, 'go', label="Minima")
ax.legend()
plt.show()
final optimum td = 23 for cost = 123800.706755
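As an aside, `optimize.brute` returns a real-valued result such as [ 23.] because, by default, it polishes the best grid point with `scipy.optimize.fmin`; that is why the integer cast above is needed. A small sketch of the alternative, using the documented `finish=None` option so the result stays on the grid (the exact return shape can differ between scipy versions):

# Sketch: disable the default fmin polishing so brute stays on the integer grid
result_grid = optimize.brute(cost_for_brute, [slice(1, 50, 1)],
                             args=(f3, x, y), full_output=True, finish=None)
td_grid = int(np.atleast_1d(result_grid[0])[0])  # cast defensively; the return shape may vary by version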
The cost curve is nicely convex, with a single minimum of the evaluation function. Now let's visualize the approximation obtained with this td.
def solve_and_plot_by_td(td, f, x, y, debug_print=True):
    # Create the delayed data
    xdelay = make_delayed(d=x, delay_step=td)
    # Fit the parameters
    params = param_solver(f=f, x=(x, xdelay), y=y, debug_print=debug_print)
    # Visualization
    plot_all(f, (x, xdelay), y, params)
    return params

f3_params = solve_and_plot_by_td(final_td, f3, x, y)
params = [ -3.56938479e-05   2.02134829e+00   1.78433928e+00  -2.62914982e+03]
params_covariance = [[  1.41335136e-12  -2.83473596e-08   7.69752574e-09   1.03708164e-04]
 [ -2.83473596e-08   5.86738635e-04  -1.35889230e-04  -2.30772756e+00]
 [  7.69752574e-09  -1.35889230e-04   6.07763010e-04  -9.90337036e-02]
 [  1.03708164e-04  -2.30772756e+00  -9.90337036e-02   1.11544635e+04]]
standard deviation errors = [  1.18884455e-06   2.42226884e-02   2.46528499e-02   1.05614693e+02]
This is an improvement. The evaluation function drops as shown below, and on the graph f(x) almost overlaps the target y.
cost(td=10) = 149393.276741
cost(td=23) = 123800.706755
The goal seems to have been reached, but was the approximation formula f3() really the best choice? For example, was the quadratic term `a*x**2` necessary? And since there is a derivative term, should there also be an integral term?
Omitting x^2, approximate with f(x) = b*x + c*dx/dt + d.
def f4(xs, b, c, d):
    x, xdelayed = xs
    dx = x - xdelayed  # the derivative is approximated by the difference from a delayed copy
    return b*x + c*dx + d
# Run the search
result = optimize.brute(cost_for_brute, [slice(1, 50, 1)], args=(f4, x, y), full_output=True)
final_td, final_cost, tds, costs = result  # optimal td, its cost, the searched td values, and the cost of each
final_td = int(final_td[0])  # returned as a one-element array of floats, so convert to an integer
print('final optimum td = ', final_td, 'for cost = ', final_cost)

# Visualization
fig, ax = plt.subplots()
ax.plot(tds, costs, 'b-', label="cost(td)")
ax.plot(final_td, final_cost, 'go', label="Minima")
ax.legend()
plt.show()
final optimum td = 32 for cost = 251326.900162
The cost turned out to be clearly worse. Just in case, let's visualize the fit as well.
solve_and_plot_by_td(final_td, f4, x, y)
params = [   1.28868602    1.44297502  176.91492786]
params_covariance = [[  6.30546320e-05   3.59497597e-05  -7.78121379e-01]
 [  3.59497597e-05   1.08222102e-03  -1.60091104e+00]
 [ -7.78121379e-01  -1.60091104e+00   1.21028825e+04]]
standard deviation errors = [  7.94069468e-03   3.28971279e-02   1.10013101e+02]
Even at a glance the graph has clearly become worse. As the cost already suggested, the quadratic term was necessary.
Next, let's introduce the integral term.
Using an integral term sum(x), approximate with f(x) = a*x^2 + b*x + c*dx/dt + d*sum(x) + e.
# Integral over a moving window. Dividing by the number of summed points keeps the value from overflowing => in effect a moving average.
def integral(d, span):
    d = list(d)
    n = len(d)
    dspan = [d[0] for i in range(span - 1)] + d  # pad the head with span-1 copies of d[0]
    return np.array([sum(dspan[i:i+span])/span for i in range(n)])  # sum a window of span values while sliding
# Define the approximation formula
def f5(xs, a, b, c, d, e):
    x, xdelay, xsum = xs
    dx = x - xdelay
    return a*x**2 + b*x + c*dx + d*xsum + e
# Evaluation function for the integration time ti: the sum of absolute differences (L1 distance) between the target y and f(x)
def cost5(ti, *args):
    f, x, xdelay, y = args
    # Create the integral (moving-average) data
    xsum = integral(x, ti)
    # Fit the parameters
    params = param_solver(f=f, x=(x, xdelay, xsum), y=y, debug_print=False)
    # Sum of the distances
    return np.sum(np.abs(y - f((x, xdelay, xsum), *params)))
# Create the delayed data (using the optimal delay found earlier)
xdelay = make_delayed(d=x, delay_step=final_td)

# Try the cost at ti = 5 and ti = 10
print(cost5(5, f5, x, xdelay, y))
print(cost5(10, f5, x, xdelay, y))
105806.884719
105436.131801
def cost5_for_brute(td, *args):
    '''Wrapper for optimize.brute, which passes the integration time as a one-element array; always hand a single integer to cost5.'''
    try:
        td = int(td[0])
    except:
        pass
    return cost5(td, *args)
# Run the search
result = optimize.brute(cost5_for_brute, [slice(1, 400, 1)], args=(f5, x, xdelay, y), full_output=True)
final_ti, final_cost, tis, costs = result  # optimal ti, its cost, the searched ti values, and the cost of each
final_ti = int(final_ti[0])  # returned as a one-element array of floats, so convert to an integer
print('final optimum ti = ', final_ti, 'for cost = ', final_cost)

# Visualization
fig, ax = plt.subplots()
ax.plot(tis, costs, 'b-', label="cost5(ti)")
ax.plot(final_ti, final_cost, 'go', label="Minima")
ax.legend()
plt.show()
final optimum ti = 47 for cost = 89564.819536
This time the evaluation function has three local minima in the range [1, 400), and ti = 47 turns out to be the best of them.
def solve_and_plot_by_ti(ti, f, x, xdelay, y, debug_print=True):
    # Create the integral (moving-average) data
    xsum = integral(x, ti)
    # Fit the parameters
    params = param_solver(f=f, x=(x, xdelay, xsum), y=y, debug_print=debug_print)
    # Visualization
    plot_all(f, (x, xdelay, xsum), y, params)
    return params

f5_params = solve_and_plot_by_ti(final_ti, f5, x, xdelay, y)
params = [ -3.03782209e-05   9.91815249e+00  -4.26387796e+00  -7.99927214e+00  -2.47884462e+03]
params_covariance = [[  6.63718559e-13   3.34745759e-08  -2.98614480e-08  -4.66996212e-08   4.64742006e-05]
 [  3.34745759e-08   7.18603686e-02  -5.03435446e-02  -7.23475214e-02  -1.40511200e+00]
 [ -2.98614480e-08  -5.03435446e-02   3.54804571e-02   5.08234965e-02   2.46752985e-01]
 [ -4.66996212e-08  -7.23475214e-02   5.08234965e-02   7.31066200e-02   3.71334580e-01]
 [  4.64742006e-05  -1.40511200e+00   2.46752985e-01   3.71334580e-01   4.98091129e+03]]
standard deviation errors = [  8.14689241e-07   2.68067843e-01   1.88362568e-01   2.70382359e-01   7.05755715e+01]
The sum of L1 distances returned by the evaluation function still looks large at about 89564, but on the graph the approximation formula now almost matches the target. The gain from introducing the integral term is that the lag in the approximation is much improved.
The best approximation formula so far is:
def f5(xs, a, b, c, d, e):
    x, xdelay, xsum = xs
    dx = x - xdelay
    return a*x**2 + b*x + c*dx + d*xsum + e
and the corresponding parameter values are:
params = [ -3.03782209e-05 9.91815249e+00 -4.26387796e+00 -7.99927214e+00 -2.47884462e+03]
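As a quick sanity check, the fitted model can be evaluated directly with these parameters. A minimal sketch that reuses `x`, `xdelay` and `f5_params` as defined above and recomputes the moving average for ti = 47:

# Sketch: apply the fitted f5 to the inputs and compute the L1 residual against y
xsum47 = integral(x, 47)                      # moving-average term at the optimal ti found above
y_hat = f5((x, xdelay, xsum47), *f5_params)   # model output
print('L1 residual =', np.sum(np.abs(y - y_hat)))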
Finally, let's consider two more questions:

- Was the x^2 term really needed?
- On an embedded device there may not be room for a moving-average buffer of around 50 samples. What then?
The fitted value of the parameter a = params[0] is -3.03782209e-05. Does that mean the ***quadratic term contributes little to the result and is unnecessary?*** Let's actually change the formula and check.
# Define the approximation formula
def f6(xs, b, c, d, e):
    x, xdelay, xsum = xs
    dx = x - xdelay
    return b*x + c*dx + d*xsum + e

# Search
result = optimize.brute(cost5_for_brute, [slice(1, 400, 1)], args=(f6, x, xdelay, y), full_output=True)
final_ti, final_cost, tis, costs = result  # optimal ti, its cost, the searched ti values, and the cost of each
final_ti = int(final_ti[0])  # returned as a one-element array of floats, so convert to an integer
print('final optimum ti = ', final_ti, 'for cost = ', final_cost)

# Visualization
fig, ax = plt.subplots()
ax.plot(tis, costs, 'b-', label="cost(ti)")
ax.plot(final_ti, final_cost, 'go', label="Minima")
ax.legend()
plt.show()
solve_and_plot_by_ti(final_ti, f6, x, xdelay, y)
final optimum ti = 339 for cost = 152778.51452
params = [   1.45958277    1.24676966   -0.27112565  246.80198745]
params_covariance = [[  1.74462598e-04  -1.29255162e-04  -2.11185881e-04  -4.55806349e-01]
 [ -1.29255162e-04   8.85113218e-04   2.42461052e-04  -1.11227411e+00]
 [ -2.11185881e-04   2.42461052e-04   3.35043878e-04  -8.63631214e-02]
 [ -4.55806349e-01  -1.11227411e+00  -8.63631214e-02   7.95856480e+03]]
standard deviation errors = [  1.32084291e-02   2.97508524e-02   1.83042038e-02   8.92107886e+01]
***After all, the quadratic term was necessary.***

A small parameter value may look like a small contribution, but clearly you cannot make that judgment without actually checking.
Finally, consider a small embedded system that cannot afford a large ti (no room for a large integration buffer, no large integers, multi-word arithmetic is painful), and look at what happens when ti is kept small. There are as many as 10 local minima in ti in this region. How does this case turn out?
# Run the search, this time restricting ti to below 30
result = optimize.brute(cost5_for_brute, [slice(1, 30, 1)], args=(f5, x, xdelay, y), full_output=True)
final_ti, final_cost, tis, costs = result  # optimal ti, its cost, the searched ti values, and the cost of each
final_ti = int(final_ti[0])  # returned as a one-element array of floats, so convert to an integer
print('final optimum ti = ', final_ti, 'for cost = ', final_cost)

# Visualization
fig, ax = plt.subplots()
ax.plot(tis, costs, 'b-', label="cost(ti)")
ax.plot(final_ti, final_cost, 'go', label="Minima")
ax.legend()
plt.show()
solve_and_plot_by_ti(final_ti, f5, x, xdelay, y)
final optimum ti = 9 for cost = 105290.231346
params = [ -3.35077051e-05   6.34533112e+00   8.26308638e-01  -4.35919543e+00  -2.65976841e+03]
params_covariance = [[  9.91381369e-13   8.29449613e-10   1.80437951e-09  -2.06410309e-08   7.13151940e-05]
 [  8.29449613e-10   4.55158040e-02  -4.96419025e-03  -4.52708548e-02  -3.90835831e+00]
 [  1.80437951e-09  -4.96419025e-03   7.59625712e-04   4.90797861e-03   2.31789824e-01]
 [ -2.06410309e-08  -4.52708548e-02   4.90797861e-03   4.54355845e-02   2.30925244e+00]
 [  7.13151940e-05  -3.90835831e+00   2.31789824e-01   2.30925244e+00   7.83076217e+03]]
standard deviation errors = [  9.95681359e-07   2.13344332e-01   2.75613082e-02   2.13156244e-01   8.84915938e+01]
This is not much worse than the optimal solution at ti = 47. When the implementation is constrained, running with ti = 9 is often an acceptable compromise.
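For reference, one way the fitted formula could be carried over to an integer-only target is to pre-scale the coefficients into a fixed-point representation. The sketch below is purely hypothetical: the scale factor, the function name and the assumption that x, its delayed copy and the moving average fit in machine integers are all illustrative, not part of the original implementation.

SCALE = 2**16  # hypothetical fixed-point scale; very small coefficients such as a may need a larger one

# Pre-compute integer coefficients offline from the fitted params [a, b, c, d, e]
a_i, b_i, c_i, d_i, e_i = (int(round(p * SCALE)) for p in f5_params)

def f5_fixed_point(x_now, x_delayed, x_avg):
    '''Integer-only sketch of a*x^2 + b*x + c*(x - x_delayed) + d*x_avg + e.'''
    dx = x_now - x_delayed
    acc = a_i*x_now*x_now + b_i*x_now + c_i*dx + d_i*x_avg + e_i
    return acc // SCALE  # rescale back to the original units

The point is only that the fitted parameters translate directly into integer constants; on an actual MCU the accumulator width and the precision lost in the smallest coefficients would still need checking.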
In the end, the system could be approximated well with PID-style terms (proportional, integral, derivative) plus a quadratic term.
Since the original target was close to a classical control system, this is probably the expected answer for people from that field. Once you can write down a candidate approximation formula, scipy in Python makes it easy to check it and to find suitable parameters. And while doing so, keeping a technical memo together with the data and formulas in a Jupyter Notebook is probably the fastest way to work.
This article worked through a single data sample. That may be fine if the system does not fluctuate much, but in practice it is safer to proceed statistically with 30 or more samples. There is then further work to decide, such as whether to average at the sample stage or to fit each sample and combine the resulting parameters.
Even after a model has been built, the characteristics of the system may change (drift). How to adapt to that, and whether parameter adaptation can be realized in a small program, are further issues.
Separately, treating the whole thing as a stochastic system might be a better approach. Building a probabilistic model and estimating it with MCMC or the like is a topic I would like to examine next.
- This problem actually came up a few years ago when I was busy, so I had a colleague derive an approximation formula and solve it.
- A formula was produced, but for actual operation another colleague ended up tuning the parameters by hand (a dilemma, since I could not find the time to tackle it myself...). The approximation worked as it was and, not being particularly critical, it was left that way.
- That formula used only the derivative term (equivalent to model f3 here). It worked, but I was quite skeptical about whether it was really the best.
- This exercise was my attempt to dig a little deeper, including the question of why the integral term was not used.
As a result of this rather persistent investigation, the following should be usable going forward:

- Parameter optimization can now be automated.
- At least one better approximation was found that can still be implemented on a small system.