In Part 5, the measured chemical substance concentrations are plotted on a map using the ChemTHEATER data. So to speak, it is a simple GIS.
In the first part, we aim to create an animation by connecting data processing to plotting on a map, and in the second part, connecting the output image data. p>
If you have not installed Cartopy, install it in advance.
conda install -c conda-forge cartopy
Then, load the following library.
%matplotlib inline
import os
import numpy as np
import pandas as pd
import itertools
import matplotlib.pyplot as plt
from cartopy import crs as ccrs
Again, start by importing the libraries needed for processing.
The first line is, as usual, a command to display the output of matplotlib.
In Part 5, the following libraries will be used. (Refer to the second and subsequent lines)
Both are already installed on Anaconda3 (Windows 64bit version).
This time, we will use cartopy to display the map on the graph of matplotlib. p>
Library th> | Overview th> | Purpose of use this time th> | Official URL th> |
---|---|---|---|
os | Standard library td> | Directory manipulation td> | https://docs.python.org/ja/3/library/os.html |
NumPy | Numerical calculation library td> | Used for numerical calculation in statistical processing td> | https://www.numpy.org |
itertools | Generate iterator 1 sup> Standard library td> Used to streamline loop processing td>
| https://docs.python.org/ja/3/library/itertools.html |
|
pandas | Data analysis library td> | Used for data processing and formatting td> | https://pandas.pydata.org |
Matplotlib | Graph drawing library td> | Used for data visualization td> | https://matplotlib.org |
cartopy | Map drawing library td> | Used for visualization of map data td> | https://scitools.org.uk/cartopy/docs/latest |
Now that the library is ready, let's load the data. Since we will be using data from marine mammals this time, search for Sample Type as "Biotic --Mammals --Marine mammals" in the Sample Search of ChemTHEATER. From the search results, click "Export samples" and "Export measured data" to download the sample list and measured value list files, respectively. Save the file in any directory and read it from your notebook as follows. p>
data_file = "measureddata_20191002074038.tsv" #Change the string entered in the variable to the tsv file name of your measured data
data = pd.read_csv(data_file, delimiter="\t")
data = data.drop(["ProjectID", "ScientificName", "RegisterDate", "UpdateDate"], axis=1) #Remove duplicate columns when joining with samples later
data
MeasuredID | SampleID | ChemicalID | ChemicalName | ExperimentID | MeasuredValue | AlternativeData | Unit | Remarks | |
---|---|---|---|---|---|---|---|---|---|
0 | 1026 | SAA000173 | CH0000034 | PCB4+PCB10 | EXA000001 | 0.010 | <1.00E-2 | ng/g lipid | NaN |
1 | 1027 | SAA000173 | CH0000035 | PCB8 | EXA000001 | 0.010 | <1.00E-2 | ng/g lipid | NaN |
2 | 1028 | SAA000173 | CH0000037 | PCB19 | EXA000001 | 0.010 | <1.00E-2 | ng/g lipid | NaN |
3 | 1029 | SAA000173 | CH0000038 | PCB22 | EXA000001 | 0.010 | <1.00E-2 | ng/g lipid | NaN |
4 | 1030 | SAA000173 | CH0000039 | PCB28 | EXA000001 | 32.000 | NaN | ng/g lipid | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7098 | 27705 | SAA002002 | CH0000094 | PCB208 | EXA000001 | 77.249 | NaN | ng/g lipid | NaN |
7099 | 27706 | SAA002002 | CH0000088 | PCB194 | EXA000001 | 512.160 | NaN | ng/g lipid | NaN |
7100 | 27707 | SAA002002 | CH0000092 | PCB205 | EXA000001 | 3.000 | <3.00E+0 | ng/g lipid | NaN |
7101 | 27708 | SAA002002 | CH0000093 | PCB206 | EXA000001 | 81.947 | NaN | ng/g lipid | NaN |
7102 | 27709 | SAA002002 | CH0000095 | PCB209 | EXA000001 | 127.064 | NaN | ng/g lipid | NaN |
7103 rows × 9 columns
sample_file = "samples_20191002074035.tsv" #Change the string entered in the variable to the tsv file name of your samples
sample = pd.read_csv(sample_file, delimiter="\t")
sample
ProjectID | SampleID | SampleType | TaxonomyID | UniqCodeType | UniqCode | SampleName | ScientificName | ... | |
---|---|---|---|---|---|---|---|---|---|
0 | PRA000003 | SAA000173 | ST004 | 34892 | es-BANK | EW00884 | M32582 | Neophocaena phocaenoides | ... |
1 | PRA000003 | SAA000174 | ST004 | 34892 | es-BANK | EW00888 | M32588 | Neophocaena phocaenoides | ... |
2 | PRA000003 | SAA000175 | ST004 | 34892 | es-BANK | EW00932 | M32580 | Neophocaena phocaenoides | ... |
3 | PRA000003 | SAA000176 | ST004 | 34892 | es-BANK | EW00929 | M33556 | Neophocaena phocaenoides | ... |
4 | PRA000003 | SAA000177 | ST004 | 34892 | es-BANK | EW00934 | M32548 | Neophocaena phocaenoides | ... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
197 | PRA000036 | SAA002159 | ST004 | 103596 | es-BANK | EW04779 | 060301-1 | Peponocephala electra | ... |
198 | PRA000036 | SAA002160 | ST004 | 103596 | es-BANK | EW00115 | M32625 | Peponocephala electra | ... |
199 | PRA000036 | SAA002161 | ST004 | 103596 | es-BANK | EW00122 | M32633 | Peponocephala electra | ... |
200 | PRA000036 | SAA002162 | ST004 | 103596 | es-BANK | EW00116 | M32626 | Peponocephala electra | ... |
201 | PRA000036 | SAA002163 | ST004 | 103596 | es-BANK | EW00117 | M32627 | Peponocephala electra | ... |
202 rows × 66 columns
Prepare to be able to plot the DataFrame read in Chap.2 on the map. First, combine measured data and samples and delete the N / A column. p>
df = pd.merge(data, sample, on="SampleID")
df = df.dropna(how='all', axis=1)
df
MeasuredID | SampleID | ChemicalID | ChemicalName | ExperimentID | MeasuredValue | AlternativeData | Unit | ... | |
---|---|---|---|---|---|---|---|---|---|
0 | 1026 | SAA000173 | CH0000034 | PCB4+PCB10 | EXA000001 | 0.010 | <1.00E-2 | ng/g lipid | ... |
1 | 1027 | SAA000173 | CH0000035 | PCB8 | EXA000001 | 0.010 | <1.00E-2 | ng/g lipid | ... |
2 | 1028 | SAA000173 | CH0000037 | PCB19 | EXA000001 | 0.010 | <1.00E-2 | ng/g lipid | ... |
3 | 1029 | SAA000173 | CH0000038 | PCB22 | EXA000001 | 0.010 | <1.00E-2 | ng/g lipid | ... |
4 | 1030 | SAA000173 | CH0000039 | PCB28 | EXA000001 | 32.000 | NaN | ng/g lipid | ... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7098 | 27705 | SAA002002 | CH0000094 | PCB208 | EXA000001 | 77.249 | NaN | ng/g lipid | ... |
7099 | 27706 | SAA002002 | CH0000088 | PCB194 | EXA000001 | 512.160 | NaN | ng/g lipid | ... |
7100 | 27707 | SAA002002 | CH0000092 | PCB205 | EXA000001 | 3.000 | <3.00E+0 | ng/g lipid | ... |
7101 | 27708 | SAA002002 | CH0000093 | PCB206 | EXA000001 | 81.947 | NaN | ng/g lipid | ... |
7102 | 27709 | SAA002002 | CH0000095 | PCB209 | EXA000001 | 127.064 | NaN | ng/g lipid | ... |
7103 rows × 39 columns
Next, extract the necessary data. This time, we will handle data on four types of chemical substances (ΣPCBs, ΣDDTs, ΣCHLs, ΣHCHs) with the unit [ng / g lipid]. p>
data_lipid = df[df["Unit"] == "ng/g lipid"]
data_lipid = data_lipid[(data_lipid["ChemicalName"] == "ΣPCBs") | (data_lipid["ChemicalName"] == "ΣDDTs") |
(data_lipid["ChemicalName"] == "ΣCHLs") | (data_lipid["ChemicalName"] == "ΣHCHs")]
This completes the extraction of the necessary data. For confirmation, obtain a list of species and chemical names included in this dataset. p>
lipid_species = data_lipid["ScientificName"].unique()
lipid_chemicals = data_lipid["ChemicalName"].unique()
lipid_species, lipid_chemicals
(array(['Peponocephala electra', 'Neophocaena phocaenoides',
'Sousa chinensis', 'Stenella coeruleoalba'], dtype=object),
array(['ΣPCBs', 'ΣDDTs', 'ΣCHLs', 'ΣHCHs'], dtype=object))
Also, here, DataFrame / data_lipid is output in CSV format so that the same data can be used in the second part. CSV output uses Pandas' to_csv method. p>
data_lipid.to_csv("data.csv")
Next, we will plot on the map, which is the main part of this time. But first, let's carefully check how to create a map.
Since the graph this time will be output on the map, the map will be output first. In Python, you can use cartopy to output a map on matplotlib. p>
ax = plt.axes(projection=ccrs.PlateCarree())
ax.coastlines()
plt.show()
Next, check the glag that is overlaid on the map. This time, the points in the two-dimensional space of latitude and longitude are plotted as measurement points, so the data is drawn as a scatter plot.
Before drawing, select the data for each species included in data_lipid. p>
df_0 = data_lipid[(data_lipid["ChemicalName"] == "ΣPCBs") & (data_lipid["ScientificName"] == lipid_species[0])] #Melon-headed whale ΣPCBs
df_1 = data_lipid[(data_lipid["ChemicalName"] == "ΣPCBs") & (data_lipid["ScientificName"] == lipid_species[1])] #Finless porpoise ΣPCBs
df_2 = data_lipid[(data_lipid["ChemicalName"] == "ΣPCBs") & (data_lipid["ScientificName"] == lipid_species[2])] #Sinaus Iro Dolphin ΣPCBs
df_3 = data_lipid[(data_lipid["ChemicalName"] == "ΣPCBs") & (data_lipid["ScientificName"] == lipid_species[3])] #Striped dolphin ΣPCBs
When the data is more divided, try scatter plotting. The graph drawing of matplotlib can be overlaid on top, so you can add images one by one. p>
fig = plt.figure()
ax = plt.axes()
#CollectionLongitudeFrom,Extract latitude / longitude information from CollectionLatitudeFrom
ax.scatter(x = np.array(df_0["CollectionLongitudeFrom"]), y = np.array(df_0["CollectionLatitudeFrom"]), c = "red", alpha=0.5)
ax.scatter(x = np.array(df_1["CollectionLongitudeFrom"]), y = np.array(df_1["CollectionLatitudeFrom"]), c = "blue", alpha=0.5)
ax.scatter(x = np.array(df_2["CollectionLongitudeFrom"]), y = np.array(df_2["CollectionLatitudeFrom"]), c = "yellow", alpha=0.5)
ax.scatter(x = np.array(df_3["CollectionLongitudeFrom"]), y = np.array(df_3["CollectionLatitudeFrom"]), c = "green", alpha=0.5)
plt.show()
Now that we have drawn the base figure, let's draw it by superimposing them. p>
fig = plt.figure()
ax = plt.axes(projection=ccrs.PlateCarree())
ax.coastlines()
ax.scatter(x = np.array(df_0["CollectionLongitudeFrom"]), y = np.array(df_0["CollectionLatitudeFrom"]), c = "red", alpha=0.5)
ax.scatter(x = np.array(df_1["CollectionLongitudeFrom"]), y = np.array(df_1["CollectionLatitudeFrom"]), c = "blue", alpha=0.5)
ax.scatter(x = np.array(df_2["CollectionLongitudeFrom"]), y = np.array(df_2["CollectionLatitudeFrom"]), c = "yellow", alpha=0.5)
ax.scatter(x = np.array(df_3["CollectionLongitudeFrom"]), y = np.array(df_3["CollectionLatitudeFrom"]), c = "green", alpha=0.5)
plt.show()
I was able to output for the time being with Sec.4-2. However, in the current state, the map range is set automatically. Since this is only automatically processed so that the entire range in which the data exists is included, the range changes depending on the data to be handled. Here, the map is fixed within an appropriate range. p>
By using the set_xlim and set_ylim methods of matplotlib, the range of the X-axis direction and Y-axis direction of the graph to be drawn can be determined respectively. p>
fig = plt.figure()
ax = plt.axes(projection=ccrs.PlateCarree())
ax.coastlines()
ax.scatter(x = np.array(df_0["CollectionLongitudeFrom"]), y = np.array(df_0["CollectionLatitudeFrom"]), c = "red", alpha=0.5)
ax.scatter(x = np.array(df_1["CollectionLongitudeFrom"]), y = np.array(df_1["CollectionLatitudeFrom"]), c = "blue", alpha=0.5)
ax.scatter(x = np.array(df_2["CollectionLongitudeFrom"]), y = np.array(df_2["CollectionLatitudeFrom"]), c = "yellow", alpha=0.5)
ax.scatter(x = np.array(df_3["CollectionLongitudeFrom"]), y = np.array(df_3["CollectionLatitudeFrom"]), c = "green", alpha=0.5)
ax.set_xlim(90,180) #Range in the X-axis (longitude) direction to draw
ax.set_ylim(15, 60) #Range in the Y-axis (latitude) direction to draw
ax.set_title("ΣPCBs")
plt.show()
As you can see in
Chap.3, this DataFrame contains data of 4 kinds of chemical substances, so let's output 4 of them side by side in a single sheet. </ p>
First, as we did in Chap.4, we sort out the data for each chemical substance, but since there are 16 combinations (4 types of biological and chemical substances each) in total, we list them here. Collectively, it takes the form of reading sequentially when drawing. At this time, it is necessary to generate each combination (for example, ΣPCBs of finless porpoise), so use the function of Python iterator 1 sup> to directly product. We will ask for 2 sup>. In other words, the direct product of the list of chemical substances (lipid_chemicals) and the list of species names (lipid_species) is obtained, and all 16 combination patterns are derived. p>
This only reflects the measurement points, so plot the measured values on the map as well. Let's reflect it in the radius of each scatter point. All you have to do is assign the data in the MeasuredValue column to the s parameter of the scatter method that draws the scatter plot. p>
If the value of MeasuredValue is substituted as it is into the radius of the scatter point, the circle of the graph of ΣDDTs will become large and all will overlap, so adjust so that the radius is small. Here, the size is uniformly set to 1/10. </ p> Now you can plot the measurement results on the map. However, since all the data are displayed in an overlapping manner, it is very difficult to see, and the time-series changes cannot be seen in this figure. In the second part, we will animate to solve those problems. p>
1 sup> An iterator is a mechanism that repeatedly accesses the next element in a type that collects data such as a list. p>
2 sup> Cartesian product is to create a new set by extracting one element from each set and making a set as shown in the figure below. With Cartesian product. p>
Recommended Posts
df_list = []
for k1, k2 in itertools.product(lipid_chemicals, lipid_species):
df_list.append(data_lipid[(data_lipid["ChemicalName"] == k1) & (data_lipid["ScientificName"] == k2)]) #Extract data for each combination
ax = [0]*4
fig = plt.figure(figsize=(16, 9))
for i in range(4):
ax[i] = fig.add_subplot(2, 2, i+1, projection=ccrs.PlateCarree()) #Added area to draw graph
ax[i].coastlines()
ax[i].scatter(x = np.array(df_list[4*i+0]["CollectionLongitudeFrom"]), y = np.array(df_list[4*i+0]["CollectionLatitudeFrom"]),
c = "red", alpha=0.5) #Melon-headed whale
ax[i].scatter(x = np.array(df_list[4*i+1]["CollectionLongitudeFrom"]), y = np.array(df_list[4*i+1]["CollectionLatitudeFrom"]),
c = "blue", alpha=0.5) #Finless porpoise
ax[i].scatter(x = np.array(df_list[4*i+2]["CollectionLongitudeFrom"]), y = np.array(df_list[4*i+2]["CollectionLatitudeFrom"]),
c = "yellow", alpha=0.5) #Cinaus dolphin
ax[i].scatter(x = np.array(df_list[4*i+3]["CollectionLongitudeFrom"]), y = np.array(df_list[4*i+3]["CollectionLatitudeFrom"]),
c = "green", alpha=0.5) #Striped dolphin
ax[i].set_xlim(90,180)
ax[i].set_ylim(15, 60)
ax[i].set_title(lipid_chemicals[i])
plt.show()
fig = plt.figure(figsize=(16, 9))
for i in range(4):
ax[i] = fig.add_subplot(2, 2, i+1, projection=ccrs.PlateCarree())
ax[i].coastlines()
ax[i].scatter(x = np.array(df_list[4*i+0]["CollectionLongitudeFrom"]), y = np.array(df_list[4*i+0]["CollectionLatitudeFrom"]), s=np.array(df_list[4*i+0]["MeasuredValue"]), c = "red", alpha=0.5)
ax[i].scatter(x = np.array(df_list[4*i+1]["CollectionLongitudeFrom"]), y = np.array(df_list[4*i+1]["CollectionLatitudeFrom"]), s=np.array(df_list[4*i+1]["MeasuredValue"]), c = "blue", alpha=0.5)
ax[i].scatter(x = np.array(df_list[4*i+2]["CollectionLongitudeFrom"]), y = np.array(df_list[4*i+2]["CollectionLatitudeFrom"]), s=np.array(df_list[4*i+2]["MeasuredValue"]), c = "yellow", alpha=0.5)
ax[i].scatter(x = np.array(df_list[4*i+3]["CollectionLongitudeFrom"]), y = np.array(df_list[4*i+3]["CollectionLatitudeFrom"]), s=np.array(df_list[4*i+3]["MeasuredValue"]), c = "green", alpha=0.5)
ax[i].set_xlim(90,180)
ax[i].set_ylim(15, 60)
plt.title(lipid_chemicals[i])
plt.show()
fig = plt.figure(figsize=(16, 9))
rate = [10,10,10,10] #Variable indicating scale
for i in range(4):
ax[i] = fig.add_subplot(2, 2, i+1, projection=ccrs.PlateCarree())
ax[i].coastlines()
ax[i].scatter(x = np.array(df_list[4*i+0]["CollectionLongitudeFrom"]), y = np.array(df_list[4*i+0]["CollectionLatitudeFrom"]), s=np.array(df_list[4*i+0]["MeasuredValue"])/rate[i], c = "red", alpha=0.3)
ax[i].scatter(x = np.array(df_list[4*i+1]["CollectionLongitudeFrom"]), y = np.array(df_list[4*i+1]["CollectionLatitudeFrom"]), s=np.array(df_list[4*i+1]["MeasuredValue"])/rate[i], c = "blue", alpha=0.3)
ax[i].scatter(x = np.array(df_list[4*i+2]["CollectionLongitudeFrom"]), y = np.array(df_list[4*i+2]["CollectionLatitudeFrom"]), s=np.array(df_list[4*i+2]["MeasuredValue"])/rate[i], c = "yellow", alpha=0.3)
ax[i].scatter(x = np.array(df_list[4*i+3]["CollectionLongitudeFrom"]), y = np.array(df_list[4*i+3]["CollectionLatitudeFrom"]), s=np.array(df_list[4*i+3]["MeasuredValue"])/rate[i], c = "green", alpha=0.3)
ax[i].set_xlim(90,180)
ax[i].set_ylim(15, 60)
plt.title(lipid_chemicals[i])
plt.show()
footnote