(2017/2/22, CentOS x86_64)
I used OrthoFinder to run an ortholog analysis based on the genomic information of multiple species. OrthoFinder uses MCL (Markov Cluster Algorithm) to infer orthologs. According to the paper, OrthoFinder performs better than other methods (such as OrthoMCL) in benchmark tests using OrthoBench, thanks in part to its own normalization step for classifying orthologs.
http://www.stevekellylab.com/software/orthofinder https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4531804/
People now use the term ortholog under various definitions, but in OrthoFinder,
OrthoFinder does the four things above automatically. For item 3, it builds a phylogenetic tree of the species and a phylogenetic tree for each orthogroup (OG). If you want to build a species tree using only single-copy genes, you will have to do that yourself.
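As a starting point for the "do it yourself" step, here is a minimal sketch of picking out single-copy orthogroups. It assumes the orthogroups have already been loaded into a dictionary of the form `{OG id: {species: [genes]}}`; the function name and the example data are hypothetical and not part of OrthoFinder.

```python
def single_copy_orthogroups(orthogroups, species):
    """Return the IDs of orthogroups that have exactly one gene in every listed species."""
    selected = []
    for og_id, genes_by_species in orthogroups.items():
        # A species missing from the orthogroup counts as zero genes.
        if all(len(genes_by_species.get(sp, [])) == 1 for sp in species):
            selected.append(og_id)
    return selected


# Hypothetical example data, not real OrthoFinder output.
orthogroups = {
    "OG1": {"sp1": ["a1"], "sp2": ["b1"], "sp3": ["c1"]},
    "OG2": {"sp1": ["a2", "a3"], "sp2": ["b2"], "sp3": ["c2"]},
    "OG3": {"sp1": ["a4"], "sp2": ["b3"]},
}
print(single_copy_orthogroups(orthogroups, ["sp1", "sp2", "sp3"]))  # ['OG1']
```

The sequences of the selected orthogroups could then be aligned and concatenated with whatever tools you prefer to build the species tree.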
OrthoFinder depends on Python 2.7, so if you are using Python 3.x, please set up a virtual environment with pyenv, Anaconda, etc. To install, you need BLAST+, MCL, FastME, and DLCpar in addition to OrthoFinder itself.
$git clone https://github.com/davidemms/OrthoFinder.git
$tar xzf OrthoFinder-1.1.2.tar.gz
MCL, FastME
There are no particular points to note. Those who have root privileges can install them easily with sudo etc., and those who do not can download them from their respective websites and build them. Please refer to the OrthoFinder Manual when installing.
DLCpar
This one needs a little care.
You can install it in the same way as 2., but when building with setup.py, you need to install into the directory whose bin contains python (you can check with which python). Either simply cp the package to that directory and run setup.py there, or use the --prefix option to specify the install directory.
If you don't do this, the Python module dlcpar will not be visible to Python and OrthoFinder will not work.
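A quick way to confirm the installation is to check that the module can be imported by the same interpreter that will run OrthoFinder. The sketch below is only an illustration (the helper name `module_available` is hypothetical); it uses `importlib.import_module`, which exists in both Python 2.7 and Python 3.

```python
import importlib


def module_available(name):
    """Return True if the running interpreter can import the module `name`."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# Run this with the python reported by `which python`.
if module_available("dlcpar"):
    print("dlcpar is importable")
else:
    print("dlcpar is NOT importable; check where setup.py installed it")
```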
Specify the directory containing the Fasta files you want to analyze. When you unpack the OrthoFinder package, you will find the `ExampleData` directory directly underneath it, containing Fasta files, so it is a good idea to do a test run with it first.
$python orthofinder.py -f your_fasta_dir -t 5 # -f specifies the directory of Fasta files; -t specifies the number of threads to use.
You can also specify the number of parallel jobs for the OrthoFinder algorithm with the -a option. Set it with memory usage in mind so that the run does not crash, using the following as a guide.
- 0.02 GB per species for small genomes (e.g. bacteria)
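The guideline above can be turned into a rough estimate of a safe value for -a. The following is a sketch under stated assumptions, not OrthoFinder's own logic: it assumes each parallel job needs roughly (GB per species) x (number of species) of memory, and the function name and parameters are hypothetical.

```python
def max_algorithm_jobs(available_gb, n_species, gb_per_species, max_threads):
    """Estimate how many parallel jobs fit in memory, between 1 and max_threads."""
    per_job_gb = gb_per_species * n_species  # assumed memory cost of one job
    if per_job_gb <= 0:
        raise ValueError("n_species and gb_per_species must be positive")
    jobs = int(available_gb / per_job_gb)
    # Never suggest fewer than one job or more jobs than available threads.
    return max(1, min(jobs, max_threads))


# Example: 8 GB of free memory, 20 small genomes at 0.02 GB per species, 5 threads.
print(max_algorithm_jobs(8.0, 20, 0.02, 5))
```

The result is only as good as the per-species figure you pass in, so leave some headroom for other processes.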
When the analysis is finished, the Results_Date directory will be created directly under your_fasta_dir.
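If you run the analysis several times, it can be handy to locate the newest results directory from a script. This is a minimal sketch that assumes the directories all start with `Results_` as described here, and that "newest" can be judged by modification time (an assumption of this example); `latest_results_dir` is a hypothetical helper.

```python
import glob
import os


def latest_results_dir(fasta_dir):
    """Return the most recently modified Results_* directory under fasta_dir, or None."""
    pattern = os.path.join(fasta_dir, "Results_*")
    candidates = [path for path in glob.glob(pattern) if os.path.isdir(path)]
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


print(latest_results_dir("your_fasta_dir"))  # None if no results exist yet
```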
The following files are generated in this directory:
- Orthogroups.csv
- Tree directory
- Orthologue directory
Orthogroups.csv (item 1) contains the estimated orthogroups in the layout shown below. Species are separated by tabs and genes are separated by commas. Item 2 holds the same content in OrthoMCL format.
OG | Species1 | Species2 | Species3 |
---|---|---|---|
OG000001 | gene_s1_1, gene_s1_3 | gene_s2_1, gene_s2_2 | gene_s3_2 |
OG000002 | gene_s1_2, gene_s1_4 | gene_s2_3 | gene_s3_1, gene_s3_3 |
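The layout above (tab-separated species columns, comma-separated genes) can be read with Python's standard `csv` module. The sketch below assumes Python 3 and a header row whose first cell is the orthogroup column; the function name and the column names in the example are placeholders, not something OrthoFinder defines.

```python
import csv
import io


def parse_orthogroups(handle):
    """Parse a tab-separated orthogroup table into {OG id: {species: [genes]}}."""
    reader = csv.reader(handle, delimiter="\t")
    species = next(reader)[1:]  # first header cell is the orthogroup column
    orthogroups = {}
    for row in reader:
        if not row:
            continue
        cells = row[1:]
        cells += [""] * (len(species) - len(cells))  # pad rows with missing cells
        orthogroups[row[0]] = {
            sp: [gene.strip() for gene in cell.split(",") if gene.strip()]
            for sp, cell in zip(species, cells)
        }
    return orthogroups


# Example input mirroring the table above (placeholder species and gene names).
example = (
    "\tSpecies1\tSpecies2\tSpecies3\n"
    "OG000001\tgene_s1_1, gene_s1_3\tgene_s2_1, gene_s2_2\tgene_s3_2\n"
    "OG000002\tgene_s1_2, gene_s1_4\tgene_s2_3\tgene_s3_1, gene_s3_3\n"
)
table = parse_orthogroups(io.StringIO(example))
print(table["OG000002"]["Species3"])  # ['gene_s3_1', 'gene_s3_3']
```

For a real file, pass an open file handle instead of the `io.StringIO` object, for example `open("Orthogroups.csv", newline="")`.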
6. Statistics_Overall.csv contains information such as 1) the total number of genes used, 2) the estimated total number of OGs, and 3) the percentage of genes assigned to an OG. 7. Statistics_PerSpecies.csv has the same data for each species.
A tree file for each OG is created in the Tree directory, and the species tree is placed in the directory directly above it. In the Orthologue directory, a table of orthologous genes is created for each pair of species used.
Helpfully, OrthoFinder also lets you add species to a previous analysis. To do so, specify the WorkingDirectory directly under the Results_Date directory of the original analysis, together with the new Fasta files, as follows. For this WorkingDirectory, specify the one that contains SpeciesIDs.txt.
$python orthofinder.py -b previous_working_dir -f new_fasta_dir
You can also exclude species. Open SpeciesIDs.txt in the WorkingDirectory directly under the Results_Date directory of the original analysis with an editor, add # to the start of the line for each species you want to exclude to comment it out, and then run:
$python orthofinder.py -b previous_working_dir
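If there are many species to exclude, the commenting-out step can be scripted. The sketch below works on the lines of the species ID file and assumes each line has the form `index: filename` (for example `0: Species1.fa`); that format and the helper name are assumptions of this example, so check your own file before relying on it.

```python
def comment_out_species(lines, exclude):
    """Prefix '#' to each 'index: filename' line whose filename is in `exclude`."""
    result = []
    for line in lines:
        stripped = line.strip()
        # Leave blank lines and already commented lines untouched.
        if stripped and not stripped.startswith("#"):
            filename = stripped.split(":", 1)[-1].strip()
            if filename in exclude:
                line = "#" + line
        result.append(line)
    return result


# Hypothetical file contents.
lines = ["0: Species1.fa\n", "1: Species2.fa\n", "2: Species3.fa\n"]
print("".join(comment_out_species(lines, {"Species2.fa"})))
```

Write the returned lines back to the file (after making a backup copy) before rerunning with -b.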
Of course, you can add and exclude species at the same time. Prepare the Fasta files you want to add, edit SpeciesIDs.txt, and run the same command as when adding new Fasta files above.
It is also possible to run only individual steps, such as BLAST, independently. You can also build the phylogenetic trees using MAFFT and FastTree. See the OrthoFinder Manual (https://github.com/davidemms/OrthoFinder/blob/master/OrthoFinder-manual.pdf) for more information.