This article is for people who searched for "convert gbff to gff", read discussions such as the Biostars question "Converting Gbff To Gff3", and were left at a loss.
It is also for anyone who wants to know about a tool that converts life-science data between many formats.
When you obtain genomic data from NCBI, you can download a FASTA file (.fasta) and a GenBank file (.gbff). Together, the base sequence (.fasta) and the annotation file that adds gene information to that sequence (**.gbff**) let you interpret the genome.
However, if you want to use this data as a reference genome in IGV or similar viewers, IGV will not read the annotation unless it is a **.gff** (GFF3) or **.gtf** (GTF, a GFF2 variant) file. (Details of the file formats are omitted here.)
Since gbff (GenBank Flat File) and gff (General Feature Format) are, in short, both annotation files, shouldn't it be possible to convert one into the other? Searching for "convert gbff to gff" turned up past discussions like the one above, but no concrete solution.
After some digging, I managed to find a way to do the conversion, so I will describe it here.
The key was in the Biostars discussion mentioned at the beginning: a mysterious script called "bp_genbank2gff3.pl". It comes with BioPerl, but judging from the thread it seems to have bugs. I wondered whether a similar tool existed for Python, looked it up, and found one.
Q. With Ensembl you can download the annotation as a gff (gtf) file, so why go to all this trouble in the first place? A. The data was only available on NCBI, probably because the organism I wanted to work with is a minor one...
**1. Install bioconvert**
pip install bioconvert
This method installs Bioconvert and its Python dependencies. Note, however, that bioconvert may use (depending on the conversion you want to use) external dependencies not available on PyPI. You will need to install those third-party dependencies yourself. An alternative is to install bioconvert using conda as explained hereafter. (https://bioconvert.readthedocs.io/en/master/installation.html)
In other words, pip resolves dependencies on Python modules published on PyPI, but apparently not on third-party packages. **In short, installing with pip limits functionality.** Installing with conda would apparently resolve those dependencies as well, but since I only need the gbff → gff3 conversion and did not want to use conda, I skip it this time.
**2. If the installation fails partway (probably while installing mappy), also install the python3-devel package**
yum install python3-devel
**3. Install biocode**
pip install biocode
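As a quick sanity check after the three steps above, a small Python sketch like the following can confirm that the `bioconvert` and `biocode` packages are importable and that the `bioconvert` command is on your PATH. The function name `check_install` is hypothetical and exists only for this illustration; it uses nothing but the standard library (`importlib.util.find_spec`, `shutil.which`).

```python
"""Sanity check after installing bioconvert and biocode (a minimal sketch)."""
import importlib.util
import shutil


def check_install(modules=("bioconvert", "biocode"), commands=("bioconvert",)):
    """Return a dict mapping 'module:<name>' / 'command:<name>' to True if found."""
    status = {}
    for name in modules:
        # find_spec returns None when the top-level module cannot be found
        status[f"module:{name}"] = importlib.util.find_spec(name) is not None
    for cmd in commands:
        # shutil.which returns None when the executable is not on PATH
        status[f"command:{cmd}"] = shutil.which(cmd) is not None
    return status


if __name__ == "__main__":
    for key, ok in check_install().items():
        print(f"{key:25s} {'OK' if ok else 'missing'}")
```

If any entry prints `missing`, revisit the corresponding install step before running the conversion.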
bioconvert --help
As mentioned above, functionality is limited, so warnings are displayed saying that some methods are not available.
WARNING [bioconvert.core.base]: converter 'FASTQ2FASTA': method seqtk is not available
WARNING [bioconvert.core.base]: converter 'GENBANK2EMBL': method squizz is not available
WARNING [bioconvert.core.base]: converter 'GENBANK2FASTA': method squizz is not available
WARNING [bioconvert.core.base]: converter 'GZ2BZ2': method pigz_pbzip2 is not available
WARNING [bioconvert.core.base]: converter 'GZ2DSRC': method pigzdsrc is not available
genbank2gff3 genbank to-> gff3 (1 methods)
No warning is displayed for genbank2gff3, the converter we want to use this time, so you can rest assured. ~~Be prepared for these warnings to clutter the log every time you run the command~~
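To see at a glance which converters have lost a method, the warnings can be collected programmatically. This is a minimal sketch that assumes the warning lines keep exactly the format shown above (`converter '...': method ... is not available`); `unavailable_methods` is a hypothetical helper, not part of bioconvert.

```python
import re

# Matches lines such as:
# WARNING [bioconvert.core.base]: converter 'FASTQ2FASTA': method seqtk is not available
WARNING_RE = re.compile(
    r"converter '(?P<conv>[^']+)': method (?P<method>\S+) is not available"
)


def unavailable_methods(log_text):
    """Map each converter name to the list of methods reported as unavailable."""
    result = {}
    for line in log_text.splitlines():
        match = WARNING_RE.search(line)
        if match:
            result.setdefault(match.group("conv"), []).append(match.group("method"))
    return result
```

To use it, you could capture the output of `bioconvert --help` with `subprocess.run(["bioconvert", "--help"], capture_output=True, text=True)` and pass `result.stdout + result.stderr`, since I have not checked which stream the warnings are written to. If `"GENBANK2GFF3"` is absent from the returned dict, the converter used here is unaffected.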
bioconvert genbank2gff3 foo.gbff foo.gff3
This converts foo.gbff to foo.gff3.
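When you have several .gbff files, the same command can be run in a loop. The sketch below builds the `bioconvert genbank2gff3 foo.gbff foo.gff3` command shown above for each file and runs it with `subprocess`. `genbank2gff3_command` and `convert_all` are hypothetical names, and the sketch assumes the files are already decompressed (NCBI often serves them as `.gbff.gz`).

```python
import subprocess
from pathlib import Path


def genbank2gff3_command(gbff_path):
    """Build the bioconvert command line for one file: foo.gbff -> foo.gff3."""
    src = Path(gbff_path)
    dst = src.with_suffix(".gff3")  # replaces only the last suffix (.gbff)
    return ["bioconvert", "genbank2gff3", str(src), str(dst)]


def convert_all(directory):
    """Convert every *.gbff file in a directory, in sorted order."""
    for gbff in sorted(Path(directory).glob("*.gbff")):
        cmd = genbank2gff3_command(gbff)
        print("running:", " ".join(cmd))
        # check=True raises CalledProcessError if bioconvert exits non-zero
        subprocess.run(cmd, check=True)
```

For example, `convert_all("genomes")` would write a `.gff3` next to each `genomes/*.gbff`. Keeping the command construction in its own function makes it easy to print or check the commands without actually running bioconvert.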
During the conversion, a warning like the following is printed:
WARNING: The following feature was skipped:
type: assembly_gap
location: [96782:96838](+)
qualifiers:
Key: estimated_length, Value: ['56']
Key: gap_type, Value: ['within scaffold']
Key: linkage_evidence, Value: ['paired-ends']
Information that is not supported in the GFF3 output, such as this assembly_gap feature, is not carried over to the .gff3 file.
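Before converting, it can help to know which feature types a .gbff file actually contains, so you can judge what a warning like the one above will drop. The sketch below counts feature keys with plain Python, relying on the fixed column layout of the GenBank feature table; `count_feature_types` and `to_gff3_coords` are hypothetical helpers. The second function reflects my reading of the warning's location `[96782:96838](+)`: it appears to be Biopython-style 0-based, half-open notation, which agrees with `estimated_length` (96838 - 96782 = 56), so in GFF3's 1-based, inclusive coordinates the same gap would span 96783 to 96838.

```python
import re
from collections import Counter

# In a GenBank feature table, a feature key starts in column 6
# (five leading spaces) and its location starts in column 22.
# Qualifier lines (/gene=..., /note=...) start at column 22, so they don't match.
FEATURE_KEY_RE = re.compile(r"^ {5}(\S+)\s+\S")


def count_feature_types(gbff_text):
    """Count feature keys (source, gene, CDS, assembly_gap, ...) in all records."""
    counts = Counter()
    in_features = False
    for line in gbff_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
        elif line and not line[0].isspace():
            # Any other top-level keyword (ORIGIN, CONTIG, //, LOCUS) ends the table.
            in_features = False
        elif in_features:
            match = FEATURE_KEY_RE.match(line)
            if match:
                counts[match.group(1)] += 1
    return counts


def to_gff3_coords(start0, end0):
    """Convert a 0-based, half-open interval such as [96782:96838]
    (assumed Biopython-style) to GFF3's 1-based, inclusive (start, end)."""
    return start0 + 1, end0
```

For example, `count_feature_types(Path("foo.gbff").read_text())` (with `from pathlib import Path`) would show how many `assembly_gap` features the file has, and `to_gff3_coords(96782, 96838)` gives `(96783, 96838)`.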
For details, see the bioconvert README: https://github.com/bioconvert/bioconvert