4.9 Reverse-complementing SeqRecord objects
One of the new features in Biopython 1.57 was the SeqRecord object’s reverse_complement method. This tries to balance ease of use with worries about what to do with the annotation in the reverse complemented record.
For the sequence, this uses the Seq object’s reverse_complement method. Any features are transferred with their location and strand recalculated. Likewise, any per-letter annotation is copied but reversed (which makes sense for typical examples like quality scores). However, transferring most of the other annotation is problematic.
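To make the coordinate recalculation concrete, here is a minimal pure-Python sketch of the arithmetic involved. It is an illustration only, not Biopython's own code: the helper names `reverse_complement` and `flip_feature`, the toy sequence, and the quality values are all made up for this example, and features are assumed to be simple 0-based, half-open intervals with a strand of +1 or -1.

```python
def reverse_complement(seq):
    """Reverse complement of a plain DNA string (hypothetical helper)."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(complement)[::-1]


def flip_feature(start, end, strand, length):
    """Map a 0-based, half-open [start, end) feature onto the reverse complement.

    The interval is mirrored around the sequence and the strand sign is swapped.
    """
    return length - end, length - start, -strand


seq = "AAACCCGGGT"                                 # made-up sequence
quals = [10, 11, 12, 20, 21, 22, 30, 31, 32, 40]   # made-up per-letter scores
feature = (3, 6, +1)                               # covers "CCC" on the forward strand

rc_seq = reverse_complement(seq)
rc_quals = quals[::-1]                             # per-letter values are reversed, not complemented
rc_feature = flip_feature(*feature, len(seq))

print(rc_seq)
print(rc_quals)
print(rc_feature)

# The flipped feature still points at the same bases, now read from the other strand.
new_start, new_end, new_strand = rc_feature
print(reverse_complement(rc_seq[new_start:new_end]) == seq[feature[0]:feature[1]])
```

The final check shows why both the location and the strand have to change together: the mirrored interval on the new sequence corresponds to the original bases only when it is read on the opposite strand.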
For instance, if the record ID was an accession, that accession should not really apply to the reverse complemented sequence, and transferring the identifier could easily cause subtle data corruption in downstream analysis. Therefore, the SeqRecord’s id, name, description, annotations and database cross references are not transferred by default.
The SeqRecord object’s reverse_complement method takes a number of optional arguments corresponding to properties of the record. Setting one of these arguments to True copies the old value, while False drops the old value and uses the default value. Alternatively, you can supply the new value you want directly.
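As a small illustration of these three choices, the following sketch builds a record in memory and mixes them in one call. The record contents (sequence, identifiers, cross reference) are invented for this example, and it assumes a reasonably recent Biopython is installed.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# A small made-up record built in memory (not the GenBank file used in this section).
record = SeqRecord(
    Seq("ATGGCCATTGTAATGGGCCGC"),
    id="example_id",
    name="example_name",
    description="a made-up record",
    dbxrefs=["Project:0000"],
)

# True copies the old value, a string supplies a new value,
# and arguments left unspecified fall back to the default (the old value is dropped).
rc = record.reverse_complement(
    id=True,
    description="reverse complement of example_id",
    dbxrefs=True,
)

print(rc.id)                   # copied from the original record
print(rc.description)          # replaced with the new string
print(rc.dbxrefs)              # copied from the original record
print(rc.name == record.name)  # name was not requested, so it is not carried over
```

Copying the identifier with id=True is convenient in a toy example like this, but for real accessions it is usually safer to pass a new string so the reverse complemented record cannot be mistaken for the original.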
Consider this example record:
>>> from Bio import SeqIO
>>> record = SeqIO.read("NC_005816.gb", "genbank")
>>> print("%s %i %i %i %i" % (record.id, len(record), len(record.features), len(record.dbxrefs), len(record.annotations)))
NC_005816.1 9609 41 1 13
Here we take the reverse complement and specify a new identifier, but notice how most of the annotation is dropped (though the features are kept):
>>> rc = record.reverse_complement(id="TESTING")
>>> print("%s %i %i %i %i" % (rc.id, len(rc), len(rc.features), len(rc.dbxrefs), len(rc.annotations)))
TESTING 9609 41 0 0