Analysis of measurement data ① -Notes on scipy fitting-

Preface

One day, Prof. "T-kun, I'll send you the data that came out of the measuring instrument, so make a graph of it yourself." T "OK!" Later, at a meeting ... T "Here is the graph." Prof. "What was the value here? Did you calculate it?" (...?!) Prof. "There are still XX more data files in the same format as this one. Please do the calculation for them too." ((^ω^) ... I'm done for)

That is the background to why I started learning Python: to automate and speed up data organization. Python was my first programming language. Having studied little by little and finished writing my master's thesis, I would like to record what I learned about Python along the way. This is my first time writing an article, and I'm not sure I'll manage to finish writing it (3/18). Sample data and a notebook are also available on GitHub.

I got the data. Let's do our best to make the graph.

T "I'm going to write a program while studying Python, so can you wait a moment?" Prof. "?? Got it!" (Later, "a moment" swelled to a few months)

Now, let's open the data. The data I received was in txt format, which can also be opened in Excel. The semicolon delimiter is a little annoying. Below is an example of the data. image.png

The first textbook I picked up was [Python introductory note](https://www.amazon.co.jp/%E8%A9%B3%E7%B4%B0-Python-%E5%85%A5%E9%96%80%E3%83%8E%E3%83%BC%E3%83%88-%E5%A4%A7%E9%87%8D-%E7%BE%8E%E5%B9%B8/dp/4800711673). After installing Anaconda as this textbook instructed, I studied by writing code in Spyder.

Read file

The textbook described reading and writing files using numpy, so for the time being I read the file the way the textbook did. I passed the file path using tkinter. I did not really understand tkinter, but I used it like a magic spell.

.py


import tkinter as tk
import tkinter.filedialog as fd
import numpy as np

# Hide the empty root window; only the file dialog is needed
root = tk.Tk()
root.withdraw()
path = fd.askopenfilename(
    title="file---",
    filetypes=[("txt", "*.txt"), ("csv", "*.csv")])
if path:
    # Read semicolon-separated data, skipping the 3 header lines
    fileobj = np.genfromtxt(path, delimiter=";", skip_header=3)
    f = fileobj[:, 0]  # first column data
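
To try the reading step without a file dialog, here is a minimal sketch that feeds `np.genfromtxt` an in-memory text buffer. The buffer contents (three header lines followed by frequency;power rows) are made-up stand-ins, not the actual instrument output.

```python
import io
import numpy as np

# Hypothetical stand-in for the instrument file: 3 header lines, then
# semicolon-separated rows (frequency; power). All values are made up.
sample = io.StringIO(
    "instrument header line 1\n"
    "instrument header line 2\n"
    "freq;power\n"
    "1.0e9;-10.0\n"
    "2.0e9;-20.0\n"
    "3.0e9;-15.0\n"
)

# Same options as in the article: semicolon delimiter, skip 3 header lines
data = np.genfromtxt(sample, delimiter=";", skip_header=3)
freq = data[:, 0]   # first column
power = data[:, 1]  # second column

print(data.shape)
print(freq)
print(power)
```

The result is a plain 2-D float array, so columns are picked by position with slicing.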

That is how I was reading the data at the time. Later, I came across a useful module called pandas. I used [Introduction to Jupyter [Practice] for Python users](https://www.amazon.co.jp/Python%E3%83%A6%E3%83%BC%E3%82%B6%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AEJupyter-%E5%AE%9F%E8%B7%B5-%E5%85%A5%E9%96%80-%E6%B1%A0%E5%86%85-%E5%AD%9D%E5%95%93/dp/4774192236) as a reference to set up the notebook environment. So, starting again with Jupyter Notebook and pandas.

.ipynb


import tkinter
from tkinter import filedialog
import pandas as pd

root = tkinter.Tk()
root.withdraw()
path = filedialog.askopenfilename(
    title="file___",
    filetypes=[("txt", "*.txt"), ("csv", "*.csv")])
if path:
    # No index_col here: column 0 is set as the index later with set_index(0)
    df = pd.read_csv(path, engine="python", header=None, sep=";", skiprows=3)
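
As a rough illustration of what `header=None`, `sep` and `skiprows` do, the sketch below reads a small in-memory buffer instead of a file chosen in the dialog. The buffer contents and the name `df_demo` are assumptions made for this example only.

```python
import io
import pandas as pd

# Hypothetical stand-in for the instrument file (made-up values)
sample = io.StringIO(
    "instrument header line 1\n"
    "instrument header line 2\n"
    "freq;power\n"
    "1.0e9;-10.0\n"
    "2.0e9;-20.0\n"
    "3.0e9;-15.0\n"
)

# header=None: the file has no column-name row, so pandas labels the
# columns with integers starting at 0. skiprows=3 drops the header lines.
df_demo = pd.read_csv(sample, header=None, sep=";", skiprows=3)

print(df_demo)
print(df_demo.columns.tolist())
```

Because the columns are labeled with integers, a column can later be referred to by a number such as `0`.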

Visualization

If you read it with pandas, the table data will look like this. image.png

Let's graph it quickly with the plotting function of pandas. In this experiment, we use the data in the first and second columns. In a DataFrame, data is selected using an indexer (loc, iloc); note that the returned object becomes a Series when a single row or a single column is specified.

.ipynb


import matplotlib.pyplot as plt
df=df.iloc[:,[0,1]]
df.set_index(0,inplace=True)#Overwrite df by specifying index
df.plot()
plt.show()
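
The remark about indexers returning a Series can be checked with a small sketch. The tiny DataFrame here (`df_demo`, with made-up values) is only a stand-in for the measurement table.

```python
import pandas as pd

# Made-up two-column table with integer column labels, like header=None gives
df_demo = pd.DataFrame({0: [1.0, 2.0, 3.0], 1: [-10.0, -20.0, -15.0]})

one_col_series = df_demo.iloc[:, 1]    # single column position -> Series
one_col_frame = df_demo.iloc[:, [1]]   # list of positions -> DataFrame
one_row_series = df_demo.iloc[0]       # single row position -> Series

print(type(one_col_series).__name__)
print(type(one_col_frame).__name__)
print(type(one_row_series).__name__)

# set_index moves column 0 into the index, which df.plot() uses as the x axis
indexed = df_demo.set_index(0)
print(indexed.index.tolist(), indexed.columns.tolist())
```

Passing a list such as `[1]` is one way to keep a DataFrame even when only one column is selected.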

image.png

When plotting with pandas, the index is automatically used for the horizontal axis, so the index is specified in advance with .set_index(column name). A waveform with a sharp peak appeared near the center, as shown in the image. There was noise at the edges. Depending on the number of data points, everything up to here could also be done in Excel.

Fitting with scipy

Everything went smoothly up to creating the graph, but the calculation was troublesome. The challenge this time was to **evaluate the sharpness of the peak**. When I googled for a fitting tool, scipy's curve_fit came up right away, so I tried using it. The vertical axis in the graph above is the input power in decibels (dBm). Converting the unit to mW and normalizing by the maximum value gives the figure below.

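df.index = df.index * 1e-9  # Hz -> GHz
df.index.rename('freq[GHz]', inplace=True)
# dBm -> linear power normalized by the maximum: 10^((P - P_max)/10)
power_dbm = df.iloc[:, 0]
df['mag'] = 10 ** ((power_dbm - power_dbm.max()) / 10)
df['mag'].plot()
plt.show()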
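
The conversion relies on the definition of dBm: a power P in mW corresponds to 10·log10(P) dBm, so P = 10^(dBm/10). Dividing by the maximum power is then the same as subtracting the maximum dBm value before exponentiating. A minimal check with made-up numbers:

```python
import numpy as np

# Made-up power readings in dBm
power_dbm = np.array([-10.0, -20.0, -13.0, -10.0])

# dBm -> mW: P_mW = 10^(P_dBm / 10)
power_mw = 10 ** (power_dbm / 10)

# Normalizing by the maximum in one step, as done for the 'mag' column
normalized = 10 ** ((power_dbm - power_dbm.max()) / 10)

# Both routes give the same normalized values
print(np.allclose(normalized, power_mw / power_mw.max()))
print(power_mw)
print(normalized)
```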

image.png The function I want to fit is the following: a Lorentz function plus a baseline. Everything except x is a variable...

$$ f(x) = A\,\frac{\gamma}{\left(x-\mu\right)^2+\gamma^2} + Bx + C $$

init holds the initial values of the parameters; I made a list of roughly estimated values. The optimal parameters opt and the covariance cov are obtained with curve_fit(function, x, y, initial parameter values).

import numpy as np
import scipy.optimize

def lorentz(x, A, mu, gamma, B, C):
    # Lorentz function plus linear baseline: the function to fit
    return A*gamma/((x-mu)**2+gamma**2)+B*x+C

# Rough initial guesses estimated from the frequency range
A = -np.pi*(df.index.max()-df.index.min())/20
mu = df.index.values.mean()
gamma = (df.index.max()-df.index.min())/20
B = 10
C = 0
init = [A, mu, gamma, B, C]  # initial values of the parameters to fit
opt, cov = scipy.optimize.curve_fit(lorentz, df.index, df['mag'], init)  # fitting
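
To see curve_fit at work without the measurement file, here is a minimal sketch on synthetic data, using the function as written in the equation above. The "true" parameters, the noise level and the initial guesses are all made-up assumptions chosen only so that the example has a dip similar in shape to the measured one.

```python
import numpy as np
from scipy.optimize import curve_fit

def lorentz(x, A, mu, gamma, B, C):
    # Lorentz function plus a linear baseline
    return A * gamma / ((x - mu) ** 2 + gamma ** 2) + B * x + C

# Synthetic "measurement": a dip near x = 5.003 with a little noise
rng = np.random.default_rng(0)
x = np.linspace(4.9, 5.1, 401)
true_params = [-0.0064, 5.003, 0.008, 0.5, -1.6]  # made-up values
y = lorentz(x, *true_params) + rng.normal(0.0, 0.005, x.size)

# Rough initial guesses, in the order A, mu, gamma, B, C
init = [-0.005, x.mean(), (x.max() - x.min()) / 20, 0.0, 1.0]

# opt: best-fit parameters, cov: their 5x5 covariance matrix
opt, cov = curve_fit(lorentz, x, y, p0=init)

print(opt)
print(cov.shape)
```

Reasonable initial guesses matter here: with a narrow peak, a starting mu far from the dip can leave the optimizer stuck fitting only the baseline.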

image.png Plot the results. It is very convenient that a new column can be created with df[column name] = 〇〇. To add a column to a Series object, first convert it to a DataFrame with pd.DataFrame(Series object) or .reset_index(). It is also nice that the column names are used as-is in the legend. The vertical axis is...

df['fit'] = lorentz(df.index, *opt)  # evaluate the fitted curve at each x
df.loc[:, ['mag', 'fit']].plot()
plt.show()

image.png It fits nicely.

The sharpness of the peak was evaluated using the obtained $\mu$ and $\gamma$.

・$\mu$: center of the peak
・$\gamma$: half of the width between the two points where the peak depth is halved (half width at half maximum)

Therefore, the Q value (sharpness of the peak) is obtained as $Q = \frac{\mu}{2\gamma}$, and the problem is solved. To be continued.
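
The Q value calculation can be sketched as follows. The `opt` and `cov` arrays are made-up placeholders standing in for a real curve_fit result, the helper name `q_value` is hypothetical, and taking the absolute value of gamma is an extra precaution added for this example. The square root of the diagonal of the covariance matrix gives one-standard-deviation errors for each parameter.

```python
import numpy as np

def q_value(mu, gamma):
    """Q = mu / (2 * gamma): peak center divided by the full width at half maximum."""
    # abs() guards against a fit that returns a negative gamma (assumption of this sketch)
    return mu / (2 * abs(gamma))

# Hypothetical fit output in the order A, mu, gamma, B, C (made-up numbers)
opt = np.array([-0.0064, 5.003, 0.008, 0.5, -1.6])
cov = np.diag([1e-9, 1e-10, 4e-10, 1e-4, 1e-3])

perr = np.sqrt(np.diag(cov))  # standard deviation of each parameter
q = q_value(opt[1], opt[2])

print(q)
print(perr[1], perr[2])  # uncertainties of mu and gamma
```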

Analysis of measurement data (2) -Histogram and fitting, lmfit recommendation-

Summary

The sample data represented the frequency characteristics of a certain resonant circuit. I started tinkering on my own, and eventually the analysis program was completed. I was moved when it ran for the first time. By the time I graduated, I had processed thousands of data files. Terrifying. I'm glad I made it... I want to keep studying pandas so that I can handle data freely.
