I wrote this to fit a large number of experimental data files at once. Opening each data file one by one, fitting it, and recording the resulting parameters is a hassle. I use a Lorentzian as the example, but the flow of building a file list and then fitting each file should have other applications as well.
The Lorentzian (Lorentz function) is a function of the following form.
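Written out explicitly (in the same form as the data-generation code below uses), with peak intensity $A$ (called `intensity` in the code), peak center $X_0$, and half width at half maximum $\mathrm{HWHM}$:

$$
f(X) = \frac{A \cdot \mathrm{HWHM}^2}{(X - X_0)^2 + \mathrm{HWHM}^2}
$$

The function peaks at $X = X_0$ with value $A$ and falls to $A/2$ at $X = X_0 \pm \mathrm{HWHM}$, which is where the name "half width at half maximum" comes from.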
I think it is easier to understand if you actually play with the data, so I will create sample data.
dataset.py

```python
import numpy as np

# Parameter definition
intensity = 50  # peak intensity
HWHM = 5        # half width at half maximum
a = 10          # amplitude of the random noise added to the data

# Create the data files
for X0 in range(-30, 31, 10):
    filename = 'X0=' + str(X0) + '.dat'
    with open(filename, 'w') as file:
        file.writelines('X' + '\t' + 'Y' + '\n')
        for X in range(-100, 101):
            Y = intensity * HWHM**2 / ((X - X0)**2 + HWHM**2) + a * np.random.rand() - 5
            file.writelines(str(X) + '\t' + str(Y) + '\n')
```
When executed, this creates 7 dat files in the directory, with X0 running from -30 to 30 in steps of 10. Each file contains two columns of numerical data (X, Y) separated by tabs, with the header strings "X" and "Y" on the first line. The content is the Lorentz function from above, Y = f(X), with a small random variation added to Y. As an example, plotting "X0=0.dat" looks like the figure below.
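For reference, here is a minimal sketch for reproducing that check plot (not part of the main script; it assumes the tab-separated format with one header line written above):

```python
# Quick look at one generated file (sketch only)
import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt('X0=0.dat', skiprows=1)  # skip the "X<TAB>Y" header line
plt.plot(data[:, 0], data[:, 1], 'o', alpha=0.5)
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
```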
From here, I would like to Lorentzian-fit these 7 dat files in one go, saving a graph of each result and a dat file that summarizes the converged parameters.
The main subject starts here. Let's fit the 7 dat files we created in one go and collect only the results. I use Jupyter so the procedure is easy to follow. The general flow is:

1. Build a list of the files to fit with the `glob` module and sort it.
2. Loop over the list, estimate initial parameters, and fit each file.
3. Write the converged parameters to a summary file and save a plot for each fit.

The `glob` module creates the file list `filelist` for the iteration; the wildcard `*` picks up only the files named `X0=<something>.dat` in the folder. So, doing this in Jupyter:
In[1]

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ptick
import scipy.optimize
import glob, re
from functools import cmp_to_key
```
In[2]

```python
# Build the file list and sort it by the numeric value of X0
filelist = glob.glob('X0=*.dat')
filelist2 = []
for filename in filelist:
    match = re.match('X0=(.*).dat', filename)
    pair = (match.group(1), filename)   # (numeric part as string, filename)
    filelist2.append(pair)

def cmp(a, b):
    if a == b: return 0
    return -1 if a < b else 1

def cmptuple(a, b):
    return cmp(float(a[0]), float(b[0]))

filelist = [filename[1] for filename in sorted(filelist2, key=cmp_to_key(cmptuple))]
filelist
```
Out[2]

```
['X0=-30.dat',
 'X0=-20.dat',
 'X0=-10.dat',
 'X0=0.dat',
 'X0=10.dat',
 'X0=20.dat',
 'X0=30.dat']
```
Reference: Python: Sort by file name number (External site)
Run this. What In[2] does is build the file list and sort the files in it. If you just take what `glob` returns, the files end up in string order rather than ascending numerical order. That is not a big problem, but it bothers me later on, so I sort them numerically here.
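Incidentally, the same numeric sort can be written more compactly with a key function instead of `cmp_to_key`; this is just an alternative sketch, not what the rest of the article uses:

```python
# Sort by the number between "X0=" and ".dat" (equivalent result, shorter)
filelist = sorted(glob.glob('X0=*.dat'),
                  key=lambda f: float(re.match(r'X0=(.*)\.dat', f).group(1)))
```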
The body of the loop inevitably gets a bit long, but it looks like this (explained below).
In[3]

```python
# Prepare the summary file and write its header line
result = open('fitting_result.dat', 'w')
result.writelines("filename\tA\tA_err\tX0\tX0_err\tHWHM\tHWHM_err\n")

for filename in filelist:
    print(filename)
    match = re.match('(.*).dat', filename)
    name = match.group(1)          # file name without the .dat extension

    # Read the two tab-separated columns, skipping the header line
    with open(filename, 'r') as file:
        X, Y = [], []
        for line in file.readlines()[1:]:
            items = line[:-1].split('\t')
            X.append(float(items[0]))
            Y.append(float(items[1]))

    # Initial value estimation (intensity and center value)
    max_Y = max(Y)
    min_Y = min(Y)
    if np.abs(max_Y) >= np.abs(min_Y):
        intensity = max_Y
        X0 = X[Y.index(max_Y)]
    else:
        intensity = min_Y
        X0 = X[Y.index(min_Y)]

    # Initial value setting
    pini = np.array([intensity, X0, 1])

    # Fitting
    def func(x, intensity, X0, HWHM):
        return intensity * HWHM**2 / ((x - X0)**2 + HWHM**2)

    popt, pcov = scipy.optimize.curve_fit(func, X, Y, p0=pini)
    perr = np.sqrt(np.diag(pcov))

    # View results
    print("initial parameter\toptimized parameter")
    for i, v in enumerate(pini):
        print(str(v) + '\t' + str(popt[i]) + ' ± ' + str(perr[i]))

    # Write the fitting result to the dat file
    result.writelines(name + '\t' + '\t'.join([str(p) + '\t' + str(e) for p, e in zip(popt, perr)]) + '\n')

    # Create an array for drawing the fitting curve
    fitline = func(np.array(X), popt[0], popt[1], popt[2])

    # Plot the result and save the image
    plt.clf()
    plt.figure(figsize=(10, 6))
    plt.plot(X, Y, 'o', alpha=0.5)
    plt.plot(X, fitline, 'r-', linewidth=2)
    plt.xlabel('X')
    plt.ylabel('Y')
    plt.ticklabel_format(style='sci', axis='y', scilimits=(0, 0))
    plt.savefig(name + ".png")
    plt.close()

result.close()
```
Output of In[3]:

```
X0=-30.dat
initial parameter   optimized parameter
52.8746388447       52.1363835425 ± 1.37692592137
-29.0               -30.2263312447 ± 0.132482308868
1.0                 5.01691953143 ± 0.187432386079
...
(omitted)
...
X0=30.dat
initial parameter   optimized parameter
50.0825336071       50.5713312616 ± 1.54634448135
31.0                30.1170389209 ± 0.149532401478
1.0                 4.89078858649 ± 0.211548017933
```
Let me explain each part. First, the part above the `# Initial value estimation` comment: we prepare the file that will receive the final results (create it and write a header line), and then the iteration over the list of data files starts at the `for` statement.

```python
match = re.match('(.*).dat', filename)
name = match.group(1)
```

Here the part of the file name before the `.dat` extension is extracted into `name`; it is used at the end as the file name when the image is saved.
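For instance, for the first file in the list:

```python
# For filename = 'X0=-30.dat', the greedy group captures everything before ".dat"
re.match('(.*).dat', 'X0=-30.dat').group(1)   # -> 'X0=-30'
```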
Next is `# Initial value estimation`. For this fit, the initial parameters (`pini`) we need are the peak intensity (`intensity`), the value of X at the peak (`X0`), and the half width at half maximum (`HWHM`). It is hard to say how accurately they need to be estimated, but here each one is handled as follows (an equivalent NumPy version is sketched after this list):

- `intensity`: take the maximum and minimum of Y and adopt whichever has the larger absolute value.
- `X0`: the value of X at which that adopted `intensity` occurs.
- `HWHM`: not estimated; it is simply set to 1. The first two parameters are fairly accurate, so compromising here seems to work fine.
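The same heuristic can also be written more compactly with NumPy; this is just an equivalent sketch of the logic above, not a change to the script:

```python
# Equivalent initial-guess heuristic using NumPy
Ya = np.array(Y)
idx = int(np.argmax(np.abs(Ya)))     # index of the largest |Y| (works for dips as well as peaks)
intensity = Ya[idx]                  # signed peak value
X0 = X[idx]                          # X at the peak
pini = np.array([intensity, X0, 1])  # HWHM initial guess left at 1, as above
```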
Then comes `# Fitting`. We fit using the estimated initial parameters, following SciPy's `curve_fit` procedure. The converged parameters are stored in `popt`. `perr` is the standard error of each parameter, obtained by taking the square root of the diagonal elements of `pcov`, the covariance matrix of the fitting parameters.
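In formula form, the standard error reported for the $i$-th parameter is simply

$$
\sigma_i = \sqrt{(\mathrm{pcov})_{ii}},
$$

which is exactly what `np.sqrt(np.diag(pcov))` computes.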
Reference: Nonlinear function modeling in Python; scipy.optimize.curve_fit (SciPy official documentation)
The last parts are `# View results`, `# Write the fitting result to the dat file`, `# Create an array for drawing the fitting curve`, and `# Plot the result and save the image`. The first two do exactly what they say, so I will not explain them further. `# Create an array for drawing the fitting curve` substitutes the data's X values into the function defined with the converged parameters, so the fit line has the same number of points as the original data. `# Plot the result and save the image` is also just what it says.
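If you want a smoother-looking fit curve, you could of course evaluate the fitted function on a denser grid instead of on the data's X values; this is only an optional variation, not what the script above does:

```python
# Optional: draw the fit on a denser grid for a smoother curve
x_dense = np.linspace(min(X), max(X), 1001)
plt.plot(x_dense, func(x_dense, *popt), 'r-', linewidth=2)
```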
One line worth pointing out:

```python
plt.savefig(name + ".png")
```

Here `name`, the result of the `re.match` at the start of the loop, is reused as the image file name.
The image of the fitting result looks like this.
The numerical results (the converged fitting parameters and their standard errors, tab-separated) are written to `fitting_result.dat`.
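If you want to inspect the summary afterwards, one possible way (just a sketch; the column names follow the header line the script writes) is:

```python
# Load the summary file back in, using the header line for field names
import numpy as np
res = np.genfromtxt('fitting_result.dat', names=True, dtype=None, encoding='utf-8')
print(res['X0'], res['X0_err'])   # fitted centers and their standard errors
```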
And with that, the 7 files are fitted in one go. It was only seven files here for simplicity, but in real experiments you may have to fit dozens or even more than a hundred data sets, and that is where this approach pays off.
You might think: if the peak is this clear, couldn't you just fix all the initial parameters to some reasonable constants? However, if the initial parameters are off at the order-of-magnitude level, the reality is that the fit ends up looking like this. For a linear fit you probably do not need to estimate initial values, but once the function gets even slightly more complicated, it seems better to estimate the initial values at least roughly (getting the order of magnitude right is enough).
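To see this failure mode yourself, you can deliberately pass a badly scaled initial guess; a minimal sketch, reusing `func` and the X, Y read from, say, 'X0=30.dat':

```python
# Deliberately poor initial guess: intensity, center, and width all far from the true values
bad_pini = np.array([1, 0, 1])
try:
    popt_bad, _ = scipy.optimize.curve_fit(func, X, Y, p0=bad_pini)
    print(popt_bad)   # typically lands nowhere near the true peak
except RuntimeError as e:
    print("fit did not converge:", e)   # curve_fit raises RuntimeError when it gives up
```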
You may not be using Jupyter, so here is the same thing in the form of a single `.py` script.
lorentzian_fitting.py

```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize
import glob, re
from functools import cmp_to_key

'''
/////////////////////////////////
Sorting the data files
/////////////////////////////////
'''
filelist = glob.glob('X0=*.dat')
filelist2 = []
for filename in filelist:
    match = re.match('X0=(.*).dat', filename)
    pair = (match.group(1), filename)   # (numeric part as string, filename)
    filelist2.append(pair)

def cmp(a, b):
    if a == b: return 0
    return -1 if a < b else 1

def cmptuple(a, b):
    return cmp(float(a[0]), float(b[0]))

filelist = [filename[1] for filename in sorted(filelist2, key=cmp_to_key(cmptuple))]

'''
/////////////////////////////////
Analysis (Lorentzian fitting)
/////////////////////////////////
'''
result = open('fitting_result.dat', 'w')
result.writelines("filename\tA\tA_err\tX0\tX0_err\tHWHM\tHWHM_err\n")

for filename in filelist:
    print(filename)
    match = re.match('(.*).dat', filename)
    name = match.group(1)          # file name without the .dat extension

    # Read the two tab-separated columns, skipping the header line
    with open(filename, 'r') as file:
        X, Y = [], []
        for line in file.readlines()[1:]:
            items = line[:-1].split('\t')
            X.append(float(items[0]))
            Y.append(float(items[1]))

    # Initial value estimation (intensity and center value)
    max_Y = max(Y)
    min_Y = min(Y)
    if np.abs(max_Y) >= np.abs(min_Y):
        intensity = max_Y
        X0 = X[Y.index(max_Y)]
    else:
        intensity = min_Y
        X0 = X[Y.index(min_Y)]

    # Initial value setting
    pini = np.array([intensity, X0, 1])

    # Fitting
    def func(x, intensity, X0, HWHM):
        return intensity * HWHM**2 / ((x - X0)**2 + HWHM**2)

    popt, pcov = scipy.optimize.curve_fit(func, X, Y, p0=pini)
    perr = np.sqrt(np.diag(pcov))

    # View results
    print("initial parameter\toptimized parameter")
    for i, v in enumerate(pini):
        print(str(v) + '\t' + str(popt[i]) + ' ± ' + str(perr[i]))

    # Write the fitting result to the dat file
    result.writelines(name + '\t' + '\t'.join([str(p) + '\t' + str(e) for p, e in zip(popt, perr)]) + '\n')

    # Create an array for drawing the fitting curve
    fitline = []
    for v in X:
        fitline.append(func(v, popt[0], popt[1], popt[2]))

    # Plot the result and save the image
    plt.clf()
    plt.figure(figsize=(10, 6))
    plt.plot(X, Y, 'o', alpha=0.5)
    plt.plot(X, fitline, 'r-', linewidth=2)
    plt.xlabel('X')
    plt.ylabel('Y')
    plt.ticklabel_format(style='sci', axis='y', scilimits=(0, 0))
    plt.savefig(name + ".png")
    plt.close()

result.close()
```