PhytoMine: I tried getting plant gene information with Python
I learned that PhytoMine lets you query Phytozome data from Python, so I gave it a try. Phytozome is familiar to plant researchers and is a convenient site for looking up genomic and gene information for many plant species.
PhytoMine is one of the databases ("mines") built on InterMine, a data warehouse system.
InterMine is an open source data warehouse system licensed under LGPL2.1. InterMine is used to create a database of biological data accessed by advanced web query tools. You can use InterMine to create a database from a single dataset or integrate multiple data sources. Support for some common biological formats is provided and there is a framework for adding other data. InterMine includes a user-friendly web interface that works "out of the box" and is easy to customize.
From Wikipedia "InterMine"
InterMine can be used from a variety of programming languages, including Python, through its client libraries. See "API and Client Libraries" for more information.
I tried PhytoMine from Python by following the InterMine Python tutorial. I installed the client library with pip:
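The client libraries are wrappers around InterMine's REST web service, so the same kind of query can in principle be sent by any HTTP client. The sketch below uses only the Python standard library to build a PathQuery XML document roughly equivalent to the query used later in this post and to encode it into a request URL. It rests on assumptions of mine, not on the tutorial: the `query/results` endpoint, its `format` and `size` parameters, and the model name `genomic` are guesses about PhytoMine's configuration, and the code only builds the URL without sending any request.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base URL of the PhytoMine web service, as used in this post
BASE_URL = "https://phytozome.jgi.doe.gov/phytomine/service"


def build_pathquery_xml(view, constraints, logic, model="genomic"):
    """Build a PathQuery-style XML string.

    Assumption: the model name "genomic" and this element/attribute
    layout match what PhytoMine expects.
    """
    query = ET.Element("query", {
        "model": model,
        "view": " ".join(view),       # space-separated output columns
        "constraintLogic": logic,     # same role as query.set_logic(...)
    })
    for code, (path, op, value) in constraints.items():
        ET.SubElement(query, "constraint", {
            "path": path, "op": op, "value": value, "code": code,
        })
    return ET.tostring(query, encoding="unicode")


def build_results_url(query_xml, size=20, fmt="tab"):
    """Encode the query into a GET URL for the (assumed) query/results endpoint."""
    params = {"query": query_xml, "format": fmt, "size": size}
    return f"{BASE_URL}/query/results?{urlencode(params)}"


query_xml = build_pathquery_xml(
    view=["Gene.primaryIdentifier", "Gene.briefDescription"],
    constraints={
        "A": ("Gene.briefDescription", "CONTAINS", "transcription factor"),
        "B": ("Gene.name", "CONTAINS", "Eucgr"),
        "C": ("Gene.name", "CONTAINS", "Potri"),
    },
    logic="A and (B or C)",
)
url = build_results_url(query_xml)
print(url)
```

Fetching `url` with any HTTP client would then be the actual request; in this post I stick to the official `intermine` package instead.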
$ pip install intermine
I built a query that constrains gene function and plant species, retrieved the list of matching genes, and collected the results into a pandas DataFrame. The source code is as follows.
import pandas as pd
from intermine.webservice import Service
size = 20  # Number of rows to retrieve
service = Service("https://phytozome.jgi.doe.gov/phytomine/service")  # Connect to the PhytoMine web service
query = service.new_query("Gene")  # Query the Gene class
query.add_constraint("briefDescription", "CONTAINS", "transcription factor")  # Condition A: gene function
query.add_constraint("name", "CONTAINS", "Eucgr")  # Condition B: Eucalyptus grandis gene names begin with "Eucgr"
query.add_constraint("name", "CONTAINS", "Potri")  # Condition C: poplar gene names begin with "Potri"
query.set_logic("A & (B | C)")  # A and (B or C): transcription factors from either species
dfs = []  # One single-row DataFrame per result row
for row in query.rows(size=size):
    dfs.append(pd.DataFrame(row.values(), index=row.keys()).T)  # Convert the result row into a one-row DataFrame
dfs = pd.concat(dfs)  # Combine all rows into a single DataFrame
dfs.to_csv("Tree_TFs_Top20.csv")  # Save the DataFrame as CSV
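By default InterMine combines all constraints with AND, so without `set_logic` the query would require a gene name to contain both "Eucgr" and "Potri", which is not what we want. The sketch below reproduces the meaning of `"A & (B | C)"` locally on a few plain dictionaries so the difference is easy to see. It does not call PhytoMine: the records and the helper functions are hypothetical, and the simple case-sensitive substring test is only a stand-in for the server-side `CONTAINS` operator.

```python
# Hypothetical records, not PhytoMine data: only the two fields the query constrains
records = [
    {"name": "Potri.example1", "briefDescription": "SRF-type transcription factor"},
    {"name": "Eucgr.example2", "briefDescription": "transcription factor GT-2"},
    {"name": "Other.example3", "briefDescription": "transcription factor"},
    {"name": "Potri.example4", "briefDescription": "protein kinase"},
]


def contains(record, field, text):
    """Case-sensitive substring test, a simplified stand-in for CONTAINS."""
    return text in record[field]


def conditions(record):
    """Evaluate conditions A, B and C from the query for one record."""
    a = contains(record, "briefDescription", "transcription factor")
    b = contains(record, "name", "Eucgr")
    c = contains(record, "name", "Potri")
    return a, b, c


def matches_custom_logic(record):
    a, b, c = conditions(record)
    return a and (b or c)  # the logic set with set_logic("A & (B | C)")


def matches_default_logic(record):
    a, b, c = conditions(record)
    return a and b and c  # default: every constraint combined with AND


selected = [r["name"] for r in records if matches_custom_logic(r)]
selected_default = [r["name"] for r in records if matches_default_logic(r)]
print(selected)          # ['Potri.example1', 'Eucgr.example2']
print(selected_default)  # []
```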
The first 20 results (the contents of Tree_TFs_Top20.csv) look like this:
| | Gene.briefDescription | Gene.cytoLocation | Gene.description | Gene.genomicOrder | Gene.id | Gene.length | Gene.name | Gene.primaryIdentifier | Gene.score | Gene.scoreType | Gene.secondaryIdentifier | Gene.symbol |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (1 of 102) PF00319 - SRF-type transcription fa... | None | None | None | 49560540 | 186 | Potri.010G098100 | Potri.010G098100 | None | None | PAC:26981244 | None |
| 0 | (1 of 102) PF00319 - SRF-type transcription fa... | None | None | None | 303626540 | 186 | Potri.010G098100 | Potri.010G098100 | None | None | PAC:37221527 | None |
| 0 | (1 of 10) PTHR31657:SF9 - ETHYLENE-RESPONSIVE ... | None | None | None | 48348276 | 2263 | Potri.007G090600 | Potri.007G090600 | None | None | PAC:27016559 | None |
| 0 | (1 of 10) PTHR31657:SF9 - ETHYLENE-RESPONSIVE ... | None | None | None | 48359640 | 1853 | Potri.003G139300 | Potri.003G139300 | None | None | PAC:26998891 | None |
| 0 | (1 of 10) PTHR31657:SF9 - ETHYLENE-RESPONSIVE ... | None | None | None | 48837989 | 1051 | Potri.005G168700 | Potri.005G168700 | None | None | PAC:27030760 | None |
| 0 | (1 of 10) PTHR31657:SF9 - ETHYLENE-RESPONSIVE ... | None | None | None | 49691741 | 1649 | Potri.017G055400 | Potri.017G055400 | None | None | PAC:26983926 | None |
| 0 | (1 of 10) PTHR31657:SF9 - ETHYLENE-RESPONSIVE ... | None | None | None | 50099858 | 2177 | Potri.005G077300 | Potri.005G077300 | None | None | PAC:27029242 | None |
| 0 | (1 of 10) PTHR31657:SF9 - ETHYLENE-RESPONSIVE ... | None | None | None | 50216626 | 2401 | Potri.013G135600 | Potri.013G135600 | None | None | PAC:26993814 | None |
| 0 | (1 of 10) PTHR31657:SF9 - ETHYLENE-RESPONSIVE ... | None | None | None | 50231866 | 2179 | Potri.019G102200 | Potri.019G102200 | None | None | PAC:27025339 | None |
| 0 | (1 of 10) PTHR31657:SF9 - ETHYLENE-RESPONSIVE ... | None | None | None | 303511172 | 2177 | Potri.005G077300 | Potri.005G077300 | None | None | PAC:37265642 | None |
| 0 | (1 of 10) PTHR31657:SF9 - ETHYLENE-RESPONSIVE ... | None | None | None | 303527050 | 1051 | Potri.005G168700 | Potri.005G168700 | None | None | PAC:37263387 | None |
| 0 | (1 of 10) PTHR31657:SF9 - ETHYLENE-RESPONSIVE ... | None | None | None | 303695561 | 2263 | Potri.007G090600 | Potri.007G090600 | None | None | PAC:37252859 | None |
| 0 | (1 of 10) PTHR31657:SF9 - ETHYLENE-RESPONSIVE ... | None | None | None | 303799992 | 2401 | Potri.013G135600 | Potri.013G135600 | None | None | PAC:37233326 | None |
| 0 | (1 of 10) PTHR31657:SF9 - ETHYLENE-RESPONSIVE ... | None | None | None | 303940612 | 2179 | Potri.019G102200 | Potri.019G102200 | None | None | PAC:37260937 | None |
| 0 | (1 of 10) PTHR31657:SF9 - ETHYLENE-RESPONSIVE ... | None | None | None | 304098097 | 1649 | Potri.017G055400 | Potri.017G055400 | None | None | PAC:37223899 | None |
| 0 | (1 of 10) PTHR31657:SF9 - ETHYLENE-RESPONSIVE ... | None | None | None | 304255554 | 1853 | Potri.003G139300 | Potri.003G139300 | None | None | PAC:37236557 | None |
| 0 | (1 of 11) K08064 - nuclear transcription facto... | None | None | None | 49458724 | 4801 | Potri.011G098400 | Potri.011G098400 | None | None | PAC:27000615 | None |
| 0 | (1 of 11) KOG4282 - Transcription factor GT-2 ... | None | None | None | 174786351 | 2903 | Eucgr.J01012 | Eucgr.J01012 | None | None | PAC:32033046 | None |
| 0 | (1 of 11) KOG4282 - Transcription factor GT-2 ... | None | None | None | 174819386 | 2316 | Eucgr.J02994 | Eucgr.J02994 | None | None | PAC:32035652 | None |
| 0 | (1 of 11) KOG4282 - Transcription factor GT-2 ... | None | None | None | 175094637 | 2197 | Eucgr.G03225 | Eucgr.G03225 | None | None | PAC:32071912 | None |
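In the table, several poplar genes appear twice with different `Gene.id` and `Gene.secondaryIdentifier` values (for example Potri.010G098100). One possible follow-up, sketched below with only the standard library, reads the saved CSV, keeps the first row for each `Gene.primaryIdentifier`, and counts genes per species from the name prefix. The column names come from the table above; keeping the first occurrence, the helper names, and the prefix-to-species mapping are my own assumptions for illustration.

```python
import csv
import io
from collections import Counter

# Assumed mapping from gene-name prefix to species (prefixes as used in the query)
PREFIX_TO_SPECIES = {"Eucgr": "Eucalyptus grandis", "Potri": "Populus trichocarpa"}


def load_unique_genes(csv_file):
    """Read the saved CSV and keep the first row for each Gene.primaryIdentifier."""
    seen = set()
    unique_rows = []
    for row in csv.DictReader(csv_file):
        gene_id = row["Gene.primaryIdentifier"]
        if gene_id not in seen:
            seen.add(gene_id)
            unique_rows.append(row)
    return unique_rows


def count_by_species(rows):
    """Count genes per species from the prefix before the first '.' in Gene.name."""
    counts = Counter()
    for row in rows:
        prefix = row["Gene.name"].split(".", 1)[0]
        counts[PREFIX_TO_SPECIES.get(prefix, "unknown")] += 1
    return counts


# Demo on a trimmed-down CSV with three rows copied from the table above
sample_csv = io.StringIO(
    ",Gene.name,Gene.primaryIdentifier\n"
    "0,Potri.010G098100,Potri.010G098100\n"
    "0,Potri.010G098100,Potri.010G098100\n"
    "0,Eucgr.J01012,Eucgr.J01012\n"
)
unique_rows = load_unique_genes(sample_csv)
counts = count_by_species(unique_rows)
print(len(unique_rows), dict(counts))

# With the real file written by the script:
# with open("Tree_TFs_Top20.csv", newline="") as f:
#     print(count_by_species(load_unique_genes(f)))
```

Applied to all 20 rows shown above, this would keep 12 unique genes: 9 from poplar and 3 from Eucalyptus grandis.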
Judging from PhytoMine's QueryBuilder page, many data types besides genes can be queried, so I'd like to try them out little by little.