One day at a meeting. Kyo: "T-kun, can't you fit this graph a little better? Can't we change the fit range or add weighting?" T: "...I'll give it a try."
So the fitting needs modifying. Requests to change the analysis procedure on a whim come up often and are hard to handle; I want to be able to respond on the spot.
This time, instead of scipy.optimize.curve_fit, I will introduce a library called lmfit. Fitting with scipy was covered in "Analysis of measurement data ① - A memorandum on scipy fitting". Coding is done on Jupyter, and Jupyter is recommended.
I hope this makes for a good introduction to lmfit.
A notebook and sample data are also available on GitHub. See here.
lmfit is developed as a superset of scipy.optimize: it lets you constrain parameters and weight errors during fitting. The workflow is to attach various pieces of information to objects and finally run the fit command to optimize. Once you get used to it, I find it more convenient than scipy.
Pulse data was obtained from a measuring instrument. First, build a histogram of the pulse peak values. Then fit it and evaluate the sharpness of the spectrum. Finally, we will try to improve the sharpness using lmfit's weighting option.
Read the pulse data dumped from the instrument with pandas. As before, use tkinter to get the file path.
.ipynb
import tkinter
from tkinter import filedialog
import numpy as np
import pandas as pd
root = tkinter.Tk()
root.withdraw()
file = filedialog.askopenfilename(
    title='file path_',
    filetypes=[("csv", "csv")])
if file:
    pulse = pd.read_csv(file, engine='python', header=0, index_col=0)
    pulse.iloc[:, [0, 1, 2]].plot()
Let's plot just three of the traces. The data was captured so that the maximum peak falls near the 50th point.
Create a histogram of the maximum peak values. When building a histogram you need to choose the binning of the horizontal axis. This time, following Sturges' formula, the number of bins is set to $k = 1 + \log_2 n$.
.ipynb
bins = int(1 + np.log2(pulse.shape[1]))  # Sturges' formula
count, level = np.histogram(pulse.max(), bins=bins)
level = (level[:-1] + level[1:]) / 2  # bin edges -> bin centers, so count and level have the same size
import matplotlib.pyplot as plt
plt.plot(level, count, marker='.', linestyle='None')
plt.show()
A small second peak appeared on the right side. Let's cut the data at 50 and build the distribution again for just that right-hand peak.
.ipynb
mask = pulse.max() > 50  # boolean array selecting the right-hand peak
pulse = pulse.loc[:, mask]  # mask the pulse data
bins = int(1 + np.log2(pulse.shape[1]))  # Sturges' formula
count, level = np.histogram(pulse.max(), bins=bins)
level = (level[:-1] + level[1:]) / 2  # bin centers
plt.plot(level, count, marker='o', linestyle='None')
plt.show()
Something like a normal distribution has appeared.
As you can see from the import line, lmfit uses several class objects for fitting.
The fitting function is a Gaussian, shown below. The peak center is given by $\mu$ and the spread of the distribution by $\sigma$.
.ipynb
from lmfit import Model, Parameters, Parameter

def gaussian(x, amp, mu, sigma):
    out = amp / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(x - mu)**2 / (2 * sigma**2))
    return out

model = Model(gaussian)  # wrap the fitting function in a Model object
pars = model.make_params()  # generate a Parameters object
pars
The created Parameters object is a collection of Parameter objects, and each parameter can be accessed like a dictionary entry. From there, the .set method lets you adjust each of:
・initial value (value)
・upper and lower bounds (max, min; defaults -np.inf to np.inf)
・whether the value may change during the fit (vary=True or False)
.ipynb
pars['mu'].set(value=70)
pars['amp'].set(value=1)
pars['sigma'].set(value=1)
pars
The initial values are very rough, but they are set. Parameters can be configured in detail, and they are displayed in a DataFrame-like table. Nice...
Perform fitting
.ipynb
result = model.fit(data=count,params=pars,x=level)
result
The result is displayed like this. Running .fit creates a new ModelResult object. The generated object has a .plot_fit() method; let's visualize the fit with it.
The optimized parameters can be retrieved as a dictionary via the .best_values attribute.
.ipynb
opt = result.best_values
opt['mu']/opt['sigma']
The sharpness is evaluated as $\frac{\mu}{\sigma}$. It came to 23.67.
・Scott's formula: the bin width is given by $h = 3.49 \sigma n^{-1/3}$.
.ipynb
h = 3.49 * opt['sigma'] / (pulse.shape[1] ** (1/3))  # Scott's formula for the bin width
r = pulse.max().max() - pulse.max().min()  # range of the peak values
bins = int(r / h)
count, level = np.histogram(pulse.max(), bins=bins)
level = (level[:-1] + level[1:]) / 2  # bin centers
plt.plot(level, count, marker='o', linestyle='None')
plt.show()
The bin width is narrower and the count in each bin is smaller, but the shape of the distribution still looks fine. Refit by swapping in only the new data and x values. The sharpness is now... 24.44, an improvement of 0.77.
Now for the request from the beginning.
He wanted to emphasize the range $\mu \pm \sigma$ and give zero weight below 63. Weighting can be done by passing the weights option to .fit.
.ipynb
s = (level > opt['mu'] - opt['sigma']) * (level < opt['mu'] + opt['sigma'])  # True inside mu +/- sigma
weight = (np.ones(level.size) + 4*s) * (level > 63)  # 5x weight in the window, 0 below 63
result3 = model.fit(data=count, params=pars, x=level, weights=weight)
↓ Weights and result. It doesn't look much different, but looking closely the curve is pulled down slightly. The sharpness is... 25.50, a further improvement of 1.06.
The fitted curve values can be retrieved as an array via the .best_fit attribute, which was handy for writing out a csv. Problem safely solved...
This time the parameters were generated automatically with Model.make_params(), but you can also create an empty Parameters() object directly and add variables to it with the .add method.
At first I was confused by the number of objects involved in fitting; it was easy to lose track of which method belongs to which object. But once you get used to it, you come to appreciate Parameter(s).
One reason I started writing this article is that there were few sites explaining lmfit in Japanese. Is there a better library out there? I'd appreciate any pointers, so that future juniors won't struggle. This was the first time I wrote code while reading the official reference (laughs). For the sake of the research results, there was no other choice.
Going forward, I'd like to learn more technology that connects to everyday life, and to be able to apply the skills I have across various fields. I want to keep being creative.
That's all for the python-related progress. No one else in the lab uses python, so I couldn't help reporting this to the professor, and I'd like to take this opportunity to do so. I am deeply grateful to the professor for waiting patiently for my data analysis.