In experiments, a measured quantity $Z$ often depends on two parameters, $X$ and $Y$. You can summarize the results separately as graphs of $Z$ vs $X$ or $Z$ vs $Y$, but combining them into a single color plot [^ 1] of $Z$ over $X$ and $Y$ gives a more bird's-eye view of the data.
However, color plots take some effort: preparing the two-dimensional $(X, Y, Z)$ data they require almost always involves some preprocessing of the raw data. This article therefore explains how to format data for color plots, using two cases often encountered in practice as examples.
To create a color plot, you need mesh-like data: the value of $Z$ stored on a grid spanned by the two axes $X$ and $Y$. In the following, the goal is to format the data into this shape.
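As a minimal sketch of this target shape, the mesh can be held in a pandas DataFrame with $X$ as the row index and $Y$ as the column labels. The axis values and $Z$ numbers below are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy axes, chosen only for illustration.
x_values = [0, 1, 2]      # X: row index
y_values = [0.0, 0.1]     # Y: column labels

# Z on the grid: one row per X value, one column per Y value.
z_grid = np.array([[10.0, 11.0],
                   [20.0, 21.0],
                   [30.0, 31.0]])

mesh = pd.DataFrame(z_grid,
                    index=pd.Index(x_values, name='X'),
                    columns=pd.Index(y_values, name='Y'))
print(mesh)
print(mesh.loc[1, 0.1])   # Z at X = 1, Y = 0.1
```

Looking up a single $Z$ value then only needs the pair of axis labels, which is the property the plotting step relies on.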
There are two cases that require preprocessing. The sample files below use the DAT extension, but notes for CSV are included in the processing code. The procedure is the same; only the initial loading call differs.
The first example is when the data files are split by one of the parameters, $X$ or $Y$. In this case you need to combine the split files. Here the value of $Y$ is recorded in the file name, so you also need to extract that value to build the list of $Y$ values.
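The difference in loading can be sketched as follows, assuming a tab-separated DAT file and a comma-separated CSV file with the same columns. The file contents here are hypothetical and held in memory so that the snippet runs without any files on disk.

```python
import io
import pandas as pd

# Hypothetical file contents, kept in memory for illustration.
dat_text = "X\tZ\n0\t71.3\n1\t66.3\n"   # tab-separated, like the DAT files
csv_text = "X,Z\n0,71.3\n1,66.3\n"      # comma-separated equivalent

# Tab-separated data: pd.read_table, or pd.read_csv with sep='\t'.
df_dat = pd.read_csv(io.StringIO(dat_text), sep='\t')
# Comma-separated data: pd.read_csv with its default separator.
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_dat.equals(df_csv))
```

Once loaded, both frames have the same columns, so every later step is identical for the two formats.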
sample1.py
# Running this file generates the sample data for Example 1.
# Create a suitable working directory before running it.
import numpy as np

# Parameter definitions
intensity = 50  # peak intensity
HWHM = 3        # half width at half maximum
a = 3           # amplitude of the random scatter in the data

# Create the data files
for Y in np.arange(0, 10.1, 0.1):
    # Format Y to one decimal place so that floating-point error
    # (e.g. 0.30000000000000004) does not leak into the file name
    filename = 'sample1_Y={:.1f}.dat'.format(Y)
    X0 = (200 * Y * Y + 2500) ** 0.5 - 50
    with open(filename, 'w') as file:
        file.write('X' + '\t' + 'Z' + '\n')
        for X in range(0, 101):
            Z = intensity * HWHM ** 2 / ((X - X0) ** 2 + HWHM ** 2) \
                + 20 + a * np.random.rand()
            file.write(str(X) + '\t' + str(Z) + '\n')
The second example is when $X$, $Y$, and $Z$ are all written in a single file. In this case the data must be reshaped so that the $Z$ values are arranged with the $Y$ values as columns.
sample2.py
# Running this file generates the sample data for Example 2.
# Create a suitable working directory before running it.
import numpy as np

# Parameter definitions
intensity = 50  # peak intensity
HWHM = 3        # half width at half maximum
a = 3           # amplitude of the random scatter in the data

# Create the data file
with open('sample2.dat', 'w') as file:
    file.write('X' + '\t' + 'Y' + '\t' + 'Z' + '\n')
    for Y in np.arange(0, 10.1, 0.1):
        X0 = (200 * Y * Y + 2500) ** 0.5 - 50
        for X in range(0, 101):
            Z = intensity * HWHM ** 2 / ((X - X0) ** 2 + HWHM ** 2) \
                + 20 + a * np.random.rand()
            # Write Y with one decimal place to hide floating-point error
            file.write(str(X) + '\t' + '{:.1f}'.format(Y) + '\t' + str(Z) + '\n')
[Digression] A Lorentzian distribution was used for the sample data this time. `sample2.py` produces a single file, whereas `sample1.py` splits the data into 101 DAT files in 0.1 increments from Y = 0.0 to 10.0. If you open the file for Y = 5.5, it looks like the figure on the left. The graph on the right plots $Z$ against $X$. The position of this peak along $X$ depends on $Y$.
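Read off from the two sample scripts, the generated data follows the expression below. The symbols $I$ (for `intensity`), $w$ (for `HWHM`), and $u$ (a uniform random number in $[0, 1)$ from `np.random.rand()`) are notation introduced here for illustration; the scripts set $I = 50$, $w = 3$, and $a = 3$.

$$
Z(X, Y) = \frac{I\,w^2}{\bigl(X - X_0(Y)\bigr)^2 + w^2} + 20 + a\,u,
\qquad
X_0(Y) = \sqrt{200\,Y^2 + 2500} - 50
$$

Since $X_0(0) = 0$ and $X_0(10) = 100$, the peak sweeps across the full $X$ range as $Y$ goes from 0 to 10.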
Using these two cases as examples, we will now format the data. The work is done in Jupyter Notebook.
In[1]
import re, glob
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
Load the libraries.
In[2]
filelist = glob.glob('*.dat')
df = pd.DataFrame()
for file in filelist:
    # Extract the Y value from the file name (the dot before "dat" is escaped)
    match = re.match(r'sample1_Y=(.*)\.dat', file)
    df_sub = pd.read_table(file)  # For CSV, use pd.read_csv()
    df[float(match.group(1))] = df_sub.Z
df.columns.name = 'Y'
df.index = df_sub.X
df
Out[2]
Y 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 ... 9.0 9.1 9.2 9.3 9.4 9.5 9.6 9.7 9.8 9.9
X
0 71.307065 71.624977 72.568247 70.073268 71.388264 71.429283 69.430455 66.104600 64.251044 61.960019 ... 20.892164 21.724984 20.259025 21.625291 22.658143 22.641024 20.799494 21.042593 20.667364 20.451245
1 66.347248 65.184597 66.907600 67.807422 67.879276 70.401552 72.100718 72.617697 72.195462 70.692071 ... 20.409888 22.230631 21.106551 22.801198 21.159110 20.973036 21.779757 20.625188 21.405971 21.577096
2 54.815612 54.960281 55.477640 57.619689 59.971637 60.601975 63.984228 65.729155 67.846441 69.637961 ... 21.854349 20.668861 20.172761 20.416828 21.374005 21.202518 21.688063 21.056256 22.637612 20.305400
3 46.311290 46.916455 47.512971 47.175870 48.731614 50.673641 52.572572 55.803255 59.562894 62.597950 ... 22.427942 20.156526 21.141887 22.187281 21.712688 22.921697 22.876228 22.972608 22.592168 21.185094
4 40.442910 38.820936 38.994950 41.859569 40.333883 42.995725 45.152994 47.650007 49.414120 53.309453 ... 21.873397 20.659303 21.022158 20.543980 23.023661 21.418374 22.771670 20.218522 22.349163 21.412955
5 35.598731 33.633262 35.782304 36.352184 35.778557 37.035520 38.947890 40.425551 41.307669 43.021355 ... 22.766781 20.876074 20.208458 21.890359 22.792392 22.499805 22.652404 22.497508 22.339281 21.668357
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
95 21.804124 21.797596 23.010401 20.258773 20.073975 21.918238 22.169896 21.988170 20.120070 20.975194 ... 28.402855 28.041022 32.390451 36.689843 48.158836 58.781772 71.028003 67.065172 52.942427 42.138778
96 21.250080 21.630843 21.553191 22.952056 21.329605 20.270100 21.658320 20.191202 21.166837 20.145893 ... 27.438835 27.227526 31.282917 32.428334 40.363609 50.900390 63.092738 70.580351 63.513627 48.526807
97 22.389748 21.693057 20.886997 21.460203 22.610140 20.102447 23.021290 22.793081 22.306881 20.704143 ... 25.180590 26.366878 28.042743 30.121939 36.192960 40.735346 53.298041 67.425563 70.242088 60.144091
98 21.265201 21.367930 21.225976 20.466155 21.115541 20.294466 20.556839 22.789051 20.945778 21.343996 ... 23.949042 24.200023 26.902070 28.732446 32.388426 35.483425 44.613251 56.697203 68.448203 70.253491
99 22.688459 22.243006 22.604197 22.114754 22.967067 22.538572 21.954847 22.286714 22.779653 20.139557 ... 23.386930 24.568997 25.573001 27.852061 30.095048 32.057468 37.229132 47.773504 61.152210 71.645167
100 21.503768 20.480336 21.507903 21.943483 21.158995 20.880028 22.613661 21.468507 22.059082 20.855645 ... 24.196745 25.333946 24.965214 27.570366 27.059141 31.368592 34.169847 40.928392 48.591212 62.390900
101 rows × 101 columns
Search the directory for DAT files with `glob` and read them one by one into a data frame `df_sub`.
Then add the second column ($Z$) of `df_sub` to the data frame `df`, which is ultimately used for the color plot.
At that point, the value of $Y$ extracted from the file name by `re.match` is used as the column name.
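One detail worth checking at this step: `glob` returns file names in no guaranteed order, and a string sort places `10.0` before `2.0`. A minimal sketch, assuming the column labels are floats as in the loop above, sorts the columns numerically so that $Y$ increases monotonically before plotting. The frame `df_demo` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical columns in the order a string sort of file names would give.
df_demo = pd.DataFrame({0.0: [1.0, 2.0], 1.0: [3.0, 4.0],
                        10.0: [5.0, 6.0], 2.0: [7.0, 8.0]})
df_demo.columns.name = 'Y'

# Sort the column labels numerically; the data moves with its label.
df_sorted = df_demo.sort_index(axis=1)
print(list(df_sorted.columns))
```

Applied to the real data, this would be `df = df.sort_index(axis=1)` after the loop.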
In[3]
plt.pcolor(df.index, df.columns, df.T)
plt.colorbar()
plt.axis('tight')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
Finally, matplotlib draws the color plot like this.
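The transpose in the plotting call is needed because matplotlib expects the $Z$ array to have one row per $Y$ value and one column per $X$ value, while the data frame stores $X$ in rows. The sketch below shows this on a hypothetical 3 × 2 grid. It swaps in `pcolormesh` (a faster relative of the `pcolor` call used in this article) and a non-interactive backend, both of which are assumptions made for illustration.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical 3 x 2 grid: rows are X, columns are Y.
df_demo = pd.DataFrame(np.arange(6.0).reshape(3, 2),
                       index=[0, 1, 2], columns=[0.0, 0.1])

x = df_demo.index.to_numpy()
y = df_demo.columns.to_numpy()
z = df_demo.T.to_numpy()   # shape (len(y), len(x)) = (2, 3)

fig, ax = plt.subplots()
# shading='nearest' centers each cell on its (x, y) coordinate.
mesh = ax.pcolormesh(x, y, z, shading='nearest')
fig.colorbar(mesh, ax=ax)
ax.set_xlabel('X')
ax.set_ylabel('Y')
plt.close(fig)
```

In a notebook you would drop the `matplotlib.use('Agg')` line and call `plt.show()` instead of closing the figure.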
In[1]
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
Load the libraries.
In[2]
df = pd.read_table('sample2.dat')  # For CSV, use pd.read_csv()
df
Out[2]
| | X | Y | Z |
|:--|:--|:--|:--|
| 0 | 0 | 0.0 | 72.891364 |
| 1 | 1 | 0.0 | 66.015389 |
| 2 | 2 | 0.0 | 56.577833 |
| 3 | 3 | 0.0 | 47.967175 |
| 4 | 4 | 0.0 | 40.049795 |
| 5 | 5 | 0.0 | 33.520995 |
| ... | ... | ... | ... |
| 10195 | 95 | 10.0 | 34.230043 |
| 10196 | 96 | 10.0 | 39.323960 |
| 10197 | 97 | 10.0 | 47.548997 |
| 10198 | 98 | 10.0 | 55.833268 |
| 10199 | 99 | 10.0 | 66.757378 |
| 10200 | 100 | 10.0 | 70.632926 |
10201 rows × 3 columns
The data is now loaded. Next, aggregate it over $X$ and $Y$ using the pandas `pivot_table()` function.
In[3]
df_pivot = pd.pivot_table(data=df, values='Z', columns='Y', index='X', aggfunc='mean')
df_pivot
Out[3]
Y 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 ... 9.1 9.2 9.3 9.4 9.5 9.6 9.7 9.8 9.9 10.0
X
0 72.891364 71.124620 70.654984 70.037212 70.172797 69.732972 68.793112 65.933899 64.488065 59.392308 ... 23.045598 22.673641 22.600140 22.112334 21.315886 21.963097 21.105755 21.827151 21.567903 21.151945
1 66.015389 66.330797 67.099211 69.468310 68.399146 68.998129 70.942877 71.911890 70.655064 68.509530 ... 20.235786 21.015988 22.415627 20.175461 20.249661 21.286285 22.163261 20.167906 22.193590 22.611962
2 56.577833 55.291176 57.559546 57.364896 58.140628 61.156353 63.832460 66.498951 67.410308 69.306595 ... 20.598574 21.103155 21.149578 21.014833 21.009504 21.841099 21.587648 22.296160 21.123641 22.874411
3 47.967175 47.952907 45.950706 47.029444 48.391456 51.034951 52.411894 56.019204 59.728020 63.807839 ... 22.514587 22.240905 22.201533 21.571261 22.403295 21.390697 20.246681 22.210926 21.520711 21.784959
4 40.049795 38.779545 41.234613 40.129730 41.675496 43.363557 44.458340 45.149790 49.194773 53.192819 ... 20.296393 21.070061 20.863386 21.854448 21.168673 22.133117 21.882360 20.162296 21.350260 20.466510
5 33.520995 34.508065 33.640439 34.172906 37.379081 35.589519 38.723906 38.893507 40.820806 44.050499 ... 21.119756 20.837089 22.140866 23.018667 21.209434 22.741423 20.494395 21.803438 20.179044 21.418150
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
95 21.609598 21.595324 21.822186 21.381549 21.119773 22.047828 22.708401 21.714110 22.622491 21.829242 ... 28.407967 32.270296 37.226492 48.122027 60.701823 69.480529 65.041743 52.447009 40.167799 34.230043
96 22.725472 20.298792 22.131073 20.807929 21.241496 20.429434 21.873849 20.708636 21.940816 21.854451 ... 27.856948 29.345740 33.471768 39.978015 50.693978 64.438375 70.241885 63.729539 48.403309 39.323960
97 22.734694 22.755155 21.598300 20.712057 22.349692 21.692798 22.985825 22.995810 20.447362 22.031959 ... 27.358939 27.096302 30.500497 35.350520 41.393291 54.023276 65.693318 69.426084 60.709163 47.548997
98 20.429320 20.835029 22.714230 22.396262 22.322744 21.048957 22.671866 21.613990 20.339620 22.711587 ... 24.988529 28.050123 29.537614 32.430894 36.235623 44.963558 56.700622 68.762960 69.940436 55.833268
99 21.826368 22.945654 22.277211 20.131568 21.019710 21.633040 21.798181 21.139721 20.183818 22.055120 ... 25.848098 27.116651 27.592164 29.924541 31.438265 39.224727 45.971381 60.573153 70.092905 66.757378
100 22.964366 22.522586 22.005465 20.918149 21.038924 22.418933 21.325841 22.340799 20.054492 22.689244 ... 23.969884 26.387081 25.109298 26.920826 28.967549 34.624010 38.952119 49.455348 64.123081 70.632926
101 rows × 101 columns
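The pivot step is easier to see on a tiny table. The sketch below uses hypothetical long-format data with one row per $(X, Y)$ pair and reshapes it in two ways: `pivot_table`, which averages any duplicate pairs, and `DataFrame.pivot`, which only reshapes and raises an error if a pair repeats.

```python
import pandas as pd

# Hypothetical long-format data: one row per (X, Y) pair.
long_df = pd.DataFrame({
    'X': [0, 1, 0, 1],
    'Y': [0.0, 0.0, 0.1, 0.1],
    'Z': [10.0, 20.0, 11.0, 21.0],
})

# pivot_table aggregates duplicate (X, Y) pairs, here with the mean.
wide_mean = long_df.pivot_table(values='Z', index='X', columns='Y', aggfunc='mean')

# pivot only reshapes; it raises ValueError if an (X, Y) pair repeats.
wide_plain = long_df.pivot(index='X', columns='Y', values='Z')

print(wide_mean)
print(wide_plain)
```

When every $(X, Y)$ pair appears exactly once, as in the sample file here, the two calls give the same grid; `pivot_table` is the safer choice if repeated measurements may be present.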
The data is now formatted in the same way as in the first example. All that remains is to plot it with matplotlib.
In[4]
plt.pcolor(df_pivot.index, df_pivot.columns, df_pivot.T)
plt.colorbar()
plt.axis('tight')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
That concludes the two examples of creating a color plot.
[Promotion] I have also written the following article, so please refer to it if you like: Experimental data analysis with DataFrame of Python / pandas (for physicists and engineers)
[^ 1]: Also called an image plot, color map, or heat map. This article uses the term "color plot" throughout.