An overview of how amino acids are affected when bases in a CDS are mutated
Procedure
- Get the CDS from the RefSeq NM ID by an appropriate method
- Translate the acquired CDS into an amino acid sequence
- Apply the mutation to the acquired CDS
- Translate the mutated CDS into an amino acid sequence
The data of the CCDS project is used to obtain the CCDS ID from the RefSeq ID.
The CCDS project provides a standard set of human and mouse consensus coding regions.
Get CCDS2Sequence.current.txt to find out the CCDS ID from the RefSeq ID
curl -LO ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/current_human/CCDS2Sequence.current.txt
The first column of this tab-delimited text is the CCDS ID, and the fifth column is the RefSeq nucleotide_ID.
#ccds original_member current_member source nucleotide_ID protein_ID status_in_CCDS sequence_status
CCDS2.2 1 0 NCBI NM_152486.2 NP_689699.2 Updated 0
CCDS2.2 0 1 NCBI NM_152486.3 NP_689699.2 Accepted 1
CCDS2.2 1 1 EBI,WTSI ENST00000342066.7 ENSP00000342313.3 Accepted 1
...(rest omitted)
Let's quickly find the CCDS ID for the RefSeq ID "NM_130786"
% grep "NM_130786." CCDS2Sequence.current.txt
CCDS12976.1 1 0 NCBI NM_130786.3 NP_570602.2 Updated 0
CCDS12976.1 0 1 NCBI NM_130786.4 NP_570602.2 Accepted 1
From this result, it can be seen that "NM_130786.4" was selected as the consensus for "NM_130786" and that "CCDS12976.1" is the corresponding CCDS ID.
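The same lookup can be sketched in Python, which avoids the loose matching of grep (the "." in the pattern matches any character). This is only a minimal sketch and not part of the original workflow: the function name find_ccds_ids is hypothetical, and treating current_member == 1 as "the currently selected record" is an assumption based on the column header and the rows shown above.

```python
import io


def find_ccds_ids(table, refseq_id):
    """Return (ccds_id, nucleotide_id) pairs for a RefSeq ID given without version.

    Assumption: rows whose current_member column is "1" are the current ones.
    """
    hits = []
    for line in table:
        if line.startswith("#") or not line.strip():
            continue  # skip the header line and blank lines
        fields = line.rstrip("\n").split("\t")
        ccds_id, current_member, nucleotide_id = fields[0], fields[2], fields[4]
        # Compare accessions without the ".N" version suffix
        if nucleotide_id.split(".")[0] == refseq_id and current_member == "1":
            hits.append((ccds_id, nucleotide_id))
    return hits


# Two rows copied from the grep result above, joined with tabs
sample = io.StringIO(
    "CCDS12976.1\t1\t0\tNCBI\tNM_130786.3\tNP_570602.2\tUpdated\t0\n"
    "CCDS12976.1\t0\t1\tNCBI\tNM_130786.4\tNP_570602.2\tAccepted\t1\n"
)
print(find_ccds_ids(sample, "NM_130786"))
```

With the real file, the open file object for CCDS2Sequence.current.txt can be passed in place of the in-memory sample.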
Get CCDS_nucleotide.current.fna.gz to get the CDS
curl -LO ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/current_human/CCDS_nucleotide.current.fna.gz
Since the CCDS_nucleotide.current.fna.gz file is in FASTA format, decompress it once and compress it again with the bgzip command so that the data can be referenced with samtools faidx.
Decompress the CCDS_nucleotide.current.fna.gz file with gunzip
$ gunzip CCDS_nucleotide.current.fna.gz
Compress with bgzip and create an index
$ bgzip -i CCDS_nucleotide.current.fna
Since bgzip supports multithreading, it runs faster when the number of threads is specified with the option -@ <number of threads>.
bgzip compresses the data block by block in the first place, so it is probably well suited to speeding up with threads.
The results of trying different thread counts are as follows (up to 8 threads, because the execution environment is a quad-core Intel Core i7):
$ time bgzip -i -@ 1 CCDS_nucleotide.current.fna
bgzip -i -@ 1 CCDS_nucleotide.current.fna 9.58s user 0.08s system 99% cpu 9.739 total
$ gunzip CCDS_nucleotide.current.fna.gz
$
$ time bgzip -i -@ 4 CCDS_nucleotide.current.fna
bgzip -i -@ 4 CCDS_nucleotide.current.fna 10.21s user 0.12s system 391% cpu 2.638 total
$ gunzip CCDS_nucleotide.current.fna.gz
$
$ time bgzip -i -@ 8 CCDS_nucleotide.current.fna
bgzip -i -@ 8 CCDS_nucleotide.current.fna 11.89s user 0.12s system 710% cpu 1.689 total
Looking at the wall-clock compression time, it drops from about 9.7 seconds with 1 thread to about 1.7 seconds with 8 threads (since the run is short to begin with, there is not much benefit in this example).
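As a small worked calculation, the speedup and the parallel efficiency (speedup divided by thread count) can be derived from the three "total" times printed above. The function name summarize is hypothetical and exists only for this illustration.

```python
def summarize(timings):
    """Return (threads, speedup, efficiency) tuples relative to the 1-thread run."""
    baseline = timings[1]
    rows = []
    for threads in sorted(timings):
        speedup = baseline / timings[threads]
        rows.append((threads, speedup, speedup / threads))
    return rows


# Wall-clock "total" seconds taken from the three time outputs above
wall_seconds = {1: 9.739, 4: 2.638, 8: 1.689}

for threads, speedup, efficiency in summarize(wall_seconds):
    print(f"{threads} threads: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

The efficiency falls as threads are added, which fits the note that 8 threads on this machine exceed the number of physical cores.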
Header information is needed to get the sequence from the FASTA file. Get the header that includes "CCDS12976.1". The FASTA index file (.fai, which samtools faidx creates for the compressed file) is used here.
$ grep 'CCDS12976.1' CCDS_nucleotide.current.fna.gz.fai | cut -f 1
CCDS12976.1|Hs109|chr19
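The header lookup can also be sketched in Python with an exact match on the part before the first "|", so that an ID such as "CCDS12976.10" would not be picked up by accident. This is a sketch under assumptions: find_fai_names is a hypothetical helper, and in the toy index lines below the second entry and all offset values are made up for illustration (only the name of the first entry and its length of 1488 bases follow from this article).

```python
def find_fai_names(fai_lines, ccds_id):
    """Return sequence names from .fai lines whose first "|" field equals ccds_id."""
    names = []
    for line in fai_lines:
        if not line.strip():
            continue
        # The first tab-separated column of a .fai file is the sequence name
        name = line.rstrip("\n").split("\t", 1)[0]
        if name.split("|", 1)[0] == ccds_id:
            names.append(name)
    return names


# Toy index lines; the second entry and the offsets are invented for illustration
toy_fai = [
    "CCDS12976.1|Hs109|chr19\t1488\t1000\t60\t61\n",
    "CCDS12976.10|Hs109|chr19\t300\t3000\t60\t61\n",
]
print(find_fai_names(toy_fai, "CCDS12976.1"))
```

With the real index, an open file object for CCDS_nucleotide.current.fna.gz.fai can be passed instead of the toy list.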
Now that we know the header information, use samtools faidx to get the CDS.
Because the header string contains |, enclose it in double quotation marks and search for "CCDS12976.1|Hs109|chr19".
$ samtools faidx CCDS_nucleotide.current.fna.gz "CCDS12976.1|Hs109|chr19"
>CCDS12976.1|Hs109|chr19
ATGTCCATGCTCGTGGTCTTTCTCTTGCTGTGGGGTGTCACCTGGGGCCCAGTGACAGAA
GCAGCCATATTTTATGAGACGCAGCCCAGCCTGTGGGCAGAGTCCGAATCACTGCTGAAA
...(Omission)
GAATCGGAGCTCAGCGACCCTGTGGAGCTCCTGGTGGCAGAAAGCTGA
Now that the sequence has been obtained, prepare a program that determines the effect on the amino acid sequence, treating a mutation as a simple edit of the CDS.
Prepare the tools you need, including one that replaces ref with alt at the specified position of the given sequence.
cds2protein.py
from math import ceil


def translate_cds(cds_sequence):
    """
    Return translated AA sequence from base sequence.
    """
    translation_table = {
        'AAA': 'K', 'AAC': 'N', 'AAG': 'K', 'AAT': 'N',
        'ACA': 'T', 'ACC': 'T', 'ACG': 'T', 'ACT': 'T',
        'AGA': 'R', 'AGC': 'S', 'AGG': 'R', 'AGT': 'S',
        'ATA': 'I', 'ATC': 'I', 'ATG': 'M', 'ATT': 'I',
        'CAA': 'Q', 'CAC': 'H', 'CAG': 'Q', 'CAT': 'H',
        'CCA': 'P', 'CCC': 'P', 'CCG': 'P', 'CCT': 'P',
        'CGA': 'R', 'CGC': 'R', 'CGG': 'R', 'CGT': 'R',
        'CTA': 'L', 'CTC': 'L', 'CTG': 'L', 'CTT': 'L',
        'GAA': 'E', 'GAC': 'D', 'GAG': 'E', 'GAT': 'D',
        'GCA': 'A', 'GCC': 'A', 'GCG': 'A', 'GCT': 'A',
        'GGA': 'G', 'GGC': 'G', 'GGG': 'G', 'GGT': 'G',
        'GTA': 'V', 'GTC': 'V', 'GTG': 'V', 'GTT': 'V',
        'TAA': '*', 'TAC': 'Y', 'TAG': '*', 'TAT': 'Y',
        'TCA': 'S', 'TCC': 'S', 'TCG': 'S', 'TCT': 'S',
        'TGA': '*', 'TGC': 'C', 'TGG': 'W', 'TGT': 'C',
        'TTA': 'L', 'TTC': 'F', 'TTG': 'L', 'TTT': 'F'
    }
    if len(cds_sequence) < 3:
        return ""
    cds_length = len(cds_sequence) // 3
    return "".join([translation_table[cds_sequence[i * 3:(i + 1) * 3].upper()] for i in range(cds_length)])

def retrieve_base_sequence_from_fasta(fasta):
    """
    Returns continuous sequence without header.
    """
    return "".join([item for item in fasta.splitlines() if not item.startswith(">")])

def compare_sequence(target1, target2, line_length=60):
    """
    Compare sequence and print difference with "X"
    """
    common_length = min(len(target1), len(target2))
    rows = ceil(common_length / line_length)
    difference = ''.join(["." if target1[i] == target2[i] else 'X' for i in range(common_length)])
    for i in range(rows):
        def print_line(sequence):
            print(f"{i * line_length + 1:0=4} {sequence[i * line_length:(i + 1) * line_length]}")
        print_line(target1)
        print_line(difference)
        print_line(target2)
        print()

def apply_snp(sequence, position, ref, alt):
    """
    Return modified sequence at specified position in HGVS.
    """
    if sequence[position - 1] != ref:
        raise ValueError("Base Sequence Mismatch at the specified position.")
    return sequence[0:position - 1] + alt + sequence[position:]

def apply_del(sequence, position1, position2):
    """
    Return modified sequence at specified position in HGVS.
    """
    return sequence[0:position1 - 1] + sequence[position2:]

def apply_ins(sequence, position, insertion):
    """
    Return modified sequence at specified position in HGVS.
    """
    return sequence[0:position] + insertion + sequence[position:]
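To make the coordinate handling concrete, here is a small worked example of how 1-based, inclusive HGVS-style positions map onto Python slices, using the same slicing expressions as apply_snp, apply_del and apply_ins. The 9-base sequence and the three variants are made up for illustration and do not come from any database.

```python
toy = "ATGGCCTGA"  # made-up sequence: ATG GCC TGA

# c.5C>T: the base at 1-based position 5 is toy[4]
assert toy[5 - 1] == "C"
snp = toy[:5 - 1] + "T" + toy[5:]

# c.4_6del: drop positions 4 to 6 inclusive, i.e. toy[3:6]
deletion = toy[:4 - 1] + toy[6:]

# c.3_4insAAA: insert between positions 3 and 4, i.e. at slice index 3
insertion = toy[:3] + "AAA" + toy[3:]

print(snp)        # ATGGTCTGA
print(deletion)   # ATGTGA
print(insertion)  # ATGAAAGCCTGA
```

The key point is that a 1-based position p corresponds to index p - 1, while the end of an inclusive range p1_p2 can be used directly as the exclusive slice bound.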
Let's use this module to compare the amino acid sequences before and after applying mutations to the CDS.
#!/usr/bin/env python
import cds2protein as c2p
# A1BG
fasta_data = '''
>CCDS12976.1|Hs109|chr19
ATGTCCATGCTCGTGGTCTTTCTCTTGCTGTGGGGTGTCACCTGGGGCCCAGTGACAGAA
GCAGCCATATTTTATGAGACGCAGCCCAGCCTGTGGGCAGAGTCCGAATCACTGCTGAAA
CCCTTGGCCAATGTGACGCTGACGTGCCAGGCCCACCTGGAGACTCCAGACTTCCAGCTG
TTCAAGAATGGGGTGGCCCAGGAGCCTGTGCACCTTGACTCACCTGCCATCAAGCACCAG
TTCCTGCTGACGGGTGACACCCAGGGCCGCTACCGCTGCCGCTCGGGCTTGTCCACAGGA
TGGACCCAGCTGAGCAAGCTCCTGGAGCTGACAGGGCCAAAGTCCTTGCCTGCTCCCTGG
CTCTCGATGGCGCCAGTGTCCTGGATCACCCCCGGCCTGAAAACAACAGCAGTGTGCCGA
GGTGTGCTGCGGGGTGTGACTTTTCTGCTGAGGCGGGAGGGCGACCATGAGTTTCTGGAG
GTGCCTGAGGCCCAGGAGGATGTGGAGGCCACCTTTCCAGTCCATCAGCCTGGCAACTAC
AGCTGCAGCTACCGGACCGATGGGGAAGGCGCCCTCTCTGAGCCCAGCGCTACTGTGACC
ATTGAGGAGCTCGCTGCACCACCACCGCCTGTGCTGATGCACCATGGAGAGTCCTCCCAG
GTCCTGCACCCTGGCAACAAGGTGACCCTCACCTGCGTGGCTCCCCTGAGTGGAGTGGAC
TTCCAGCTACGGCGCGGGGAGAAAGAGCTGCTGGTACCCAGGAGCAGCACCAGCCCAGAT
CGCATCTTCTTTCACCTGAACGCGGTGGCCCTGGGGGATGGAGGTCACTACACCTGCCGC
TACCGGCTGCATGACAACCAAAACGGCTGGTCCGGGGACAGCGCGCCGGTCGAGCTGATT
CTGAGCGATGAGACGCTGCCCGCGCCGGAGTTCTCCCCGGAGCCGGAGTCCGGCAGGGCC
TTGCGGCTGCGGTGCCTGGCGCCCCTGGAGGGCGCGCGCTTCGCCCTGGTGCGCGAGGAC
AGGGGCGGGCGCCGCGTGCACCGTTTCCAGAGCCCCGCTGGGACCGAGGCGCTCTTCGAG
CTGCACAACATTTCCGTGGCTGACTCCGCCAACTACAGCTGCGTCTACGTGGACCTGAAG
CCGCCTTTCGGGGGCTCCGCGCCCAGCGAGCGCTTGGAGCTGCACGTGGACGGACCCCCT
CCCAGGCCTCAGCTCCGGGCGACGTGGAGTGGGGCGGTCCTGGCGGGCCGAGATGCCGTC
CTGCGCTGCGAGGGACCCATCCCCGACGTCACCTTCGAGCTGCTGCGCGAGGGCGAGACG
AAGGCCGTGAAGACGGTCCGCACCCCCGGGGCCGCGGCGAACCTCGAGCTGATCTTCGTG
GGGCCCCAGCACGCCGGCAACTACAGGTGCCGCTACCGCTCCTGGGTGCCCCACACCTTC
GAATCGGAGCTCAGCGACCCTGTGGAGCTCCTGGTGGCAGAAAGCTGA
'''
seq2 = c2p.retrieve_base_sequence_from_fasta(fasta_data)
aa_before = c2p.translate_cds(seq2)
# Snp
print()
print("NM_130786.3:c.1481A>T, NM_130786.4:c.1481A>T, NP_570602.2:p.Glu494Val")
seq3 = c2p.apply_snp(seq2, 1481, 'A', 'T')
aa_after = c2p.translate_cds(seq3)
c2p.compare_sequence(aa_before, aa_after)
# Deletion
# rs770249611
print()
print("NM_130786.3:c.1375_1377del, NM_130786.4:c.1375_1377del, NP_570602.2:p.Phe459del")
seq3 = c2p.apply_del(seq2, 1375, 1377)
aa_after = c2p.translate_cds(seq3)
c2p.compare_sequence(aa_before, aa_after)
# rs149623806
print()
print("NM_130786.3:c.66_68del, NM_130786.4:c.66_68del, NP_570602.2:p.Ile23del")
seq3 = c2p.apply_del(seq2, 66, 68)
aa_after = c2p.translate_cds(seq3)
c2p.compare_sequence(aa_before, aa_after)
# rs1454635471
print(
"NM_130786.3:c.22_23CT[1], NM_130786.4:c.22_23CT[1], NR_015380.2:n.1111_1112GA[1], NR_015380.1:n.1111_1112GA[1], NP_570602.2:p.Leu9fs")
seq3 = c2p.apply_del(seq2, 22, 23)
aa_after = c2p.translate_cds(seq3)
c2p.compare_sequence(aa_before, aa_after)
# rs1290624409
print(
"NM_130786.4:c.47_56del, NP_570602.2:p.Gly16fs")
seq3 = c2p.apply_del(seq2, 47, 56)
aa_after = c2p.translate_cds(seq3)
c2p.compare_sequence(aa_before, aa_after)
# insertion
# rs1568555027
print(
"NM_130786.4:c.121_122insA, NP_570602.2:p.Pro41fs")
seq3 = c2p.apply_ins(seq2, 121, "A")
aa_after = c2p.translate_cds(seq3)
c2p.compare_sequence(aa_before, aa_after)
NM_130786.3:c.1481A>T, NM_130786.4:c.1481A>T, NP_570602.2:p.Glu494Val
0001 MSMLVVFLLLWGVTWGPVTEAAIFYETQPSLWAESESLLKPLANVTLTCQAHLETPDFQL
0001 ............................................................
0001 MSMLVVFLLLWGVTWGPVTEAAIFYETQPSLWAESESLLKPLANVTLTCQAHLETPDFQL
0061 FKNGVAQEPVHLDSPAIKHQFLLTGDTQGRYRCRSGLSTGWTQLSKLLELTGPKSLPAPW
0061 ............................................................
0061 FKNGVAQEPVHLDSPAIKHQFLLTGDTQGRYRCRSGLSTGWTQLSKLLELTGPKSLPAPW
0121 LSMAPVSWITPGLKTTAVCRGVLRGVTFLLRREGDHEFLEVPEAQEDVEATFPVHQPGNY
0121 ............................................................
0121 LSMAPVSWITPGLKTTAVCRGVLRGVTFLLRREGDHEFLEVPEAQEDVEATFPVHQPGNY
0181 SCSYRTDGEGALSEPSATVTIEELAAPPPPVLMHHGESSQVLHPGNKVTLTCVAPLSGVD
0181 ............................................................
0181 SCSYRTDGEGALSEPSATVTIEELAAPPPPVLMHHGESSQVLHPGNKVTLTCVAPLSGVD
0241 FQLRRGEKELLVPRSSTSPDRIFFHLNAVALGDGGHYTCRYRLHDNQNGWSGDSAPVELI
0241 ............................................................
0241 FQLRRGEKELLVPRSSTSPDRIFFHLNAVALGDGGHYTCRYRLHDNQNGWSGDSAPVELI
0301 LSDETLPAPEFSPEPESGRALRLRCLAPLEGARFALVREDRGGRRVHRFQSPAGTEALFE
0301 ............................................................
0301 LSDETLPAPEFSPEPESGRALRLRCLAPLEGARFALVREDRGGRRVHRFQSPAGTEALFE
0361 LHNISVADSANYSCVYVDLKPPFGGSAPSERLELHVDGPPPRPQLRATWSGAVLAGRDAV
0361 ............................................................
0361 LHNISVADSANYSCVYVDLKPPFGGSAPSERLELHVDGPPPRPQLRATWSGAVLAGRDAV
0421 LRCEGPIPDVTFELLREGETKAVKTVRTPGAAANLELIFVGPQHAGNYRCRYRSWVPHTF
0421 ............................................................
0421 LRCEGPIPDVTFELLREGETKAVKTVRTPGAAANLELIFVGPQHAGNYRCRYRSWVPHTF
0481 ESELSDPVELLVAES*
0481 .............X..
0481 ESELSDPVELLVAVS*
NM_130786.3:c.1375_1377del, NM_130786.4:c.1375_1377del, NP_570602.2:p.Phe459del
0001 MSMLVVFLLLWGVTWGPVTEAAIFYETQPSLWAESESLLKPLANVTLTCQAHLETPDFQL
0001 ............................................................
0001 MSMLVVFLLLWGVTWGPVTEAAIFYETQPSLWAESESLLKPLANVTLTCQAHLETPDFQL
0061 FKNGVAQEPVHLDSPAIKHQFLLTGDTQGRYRCRSGLSTGWTQLSKLLELTGPKSLPAPW
0061 ............................................................
0061 FKNGVAQEPVHLDSPAIKHQFLLTGDTQGRYRCRSGLSTGWTQLSKLLELTGPKSLPAPW
0121 LSMAPVSWITPGLKTTAVCRGVLRGVTFLLRREGDHEFLEVPEAQEDVEATFPVHQPGNY
0121 ............................................................
0121 LSMAPVSWITPGLKTTAVCRGVLRGVTFLLRREGDHEFLEVPEAQEDVEATFPVHQPGNY
0181 SCSYRTDGEGALSEPSATVTIEELAAPPPPVLMHHGESSQVLHPGNKVTLTCVAPLSGVD
0181 ............................................................
0181 SCSYRTDGEGALSEPSATVTIEELAAPPPPVLMHHGESSQVLHPGNKVTLTCVAPLSGVD
0241 FQLRRGEKELLVPRSSTSPDRIFFHLNAVALGDGGHYTCRYRLHDNQNGWSGDSAPVELI
0241 ............................................................
0241 FQLRRGEKELLVPRSSTSPDRIFFHLNAVALGDGGHYTCRYRLHDNQNGWSGDSAPVELI
0301 LSDETLPAPEFSPEPESGRALRLRCLAPLEGARFALVREDRGGRRVHRFQSPAGTEALFE
0301 ............................................................
0301 LSDETLPAPEFSPEPESGRALRLRCLAPLEGARFALVREDRGGRRVHRFQSPAGTEALFE
0361 LHNISVADSANYSCVYVDLKPPFGGSAPSERLELHVDGPPPRPQLRATWSGAVLAGRDAV
0361 ............................................................
0361 LHNISVADSANYSCVYVDLKPPFGGSAPSERLELHVDGPPPRPQLRATWSGAVLAGRDAV
0421 LRCEGPIPDVTFELLREGETKAVKTVRTPGAAANLELIFVGPQHAGNYRCRYRSWVPHTF
0421 ......................................XXXXXXXXXXXXXXXXXXXXXX
0421 LRCEGPIPDVTFELLREGETKAVKTVRTPGAAANLELIVGPQHAGNYRCRYRSWVPHTFE
0481 ESELSDPVELLVAES*
0481 XXXXXXXXX.XXXXX
0481 SELSDPVELLVAES*
NM_130786.3:c.66_68del, NM_130786.4:c.66_68del, NP_570602.2:p.Ile23del
0001 MSMLVVFLLLWGVTWGPVTEAAIFYETQPSLWAESESLLKPLANVTLTCQAHLETPDFQL
0001 ......................XXXXXXXXXXXXXXX.XXXXXXXXXXXXXXXXXXXXXX
0001 MSMLVVFLLLWGVTWGPVTEAAFYETQPSLWAESESLLKPLANVTLTCQAHLETPDFQLF
0061 FKNGVAQEPVHLDSPAIKHQFLLTGDTQGRYRCRSGLSTGWTQLSKLLELTGPKSLPAPW
0061 XXXXXXXXXXXXXXXXXXXXX.XXXXXXXXXXXXXXXXXXXXXXXX.XXXXXXXXXXXXX
0061 KNGVAQEPVHLDSPAIKHQFLLTGDTQGRYRCRSGLSTGWTQLSKLLELTGPKSLPAPWL
0121 LSMAPVSWITPGLKTTAVCRGVLRGVTFLLRREGDHEFLEVPEAQEDVEATFPVHQPGNY
0121 XXXXXXXXXXXXXX.XXXXXXXXXXXXX.X.XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
0121 SMAPVSWITPGLKTTAVCRGVLRGVTFLLRREGDHEFLEVPEAQEDVEATFPVHQPGNYS
0181 SCSYRTDGEGALSEPSATVTIEELAAPPPPVLMHHGESSQVLHPGNKVTLTCVAPLSGVD
0181 XXXXXXXXXXXXXXXXXXXXX.XX.X...XXXX.XXX.XXXXXXXXXXXXXXXXXXXXXX
0181 CSYRTDGEGALSEPSATVTIEELAAPPPPVLMHHGESSQVLHPGNKVTLTCVAPLSGVDF
0241 FQLRRGEKELLVPRSSTSPDRIFFHLNAVALGDGGHYTCRYRLHDNQNGWSGDSAPVELI
0241 XXX.XXXXX.XXXX.XXXXXXX.XXXXXXXXXX.XXXXXXXXXXXXXXXXXXXXXXXXXX
0241 QLRRGEKELLVPRSSTSPDRIFFHLNAVALGDGGHYTCRYRLHDNQNGWSGDSAPVELIL
0301 LSDETLPAPEFSPEPESGRALRLRCLAPLEGARFALVREDRGGRRVHRFQSPAGTEALFE
0301 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.X.XXXXXXXXXXXXXXXX
0301 SDETLPAPEFSPEPESGRALRLRCLAPLEGARFALVREDRGGRRVHRFQSPAGTEALFEL
0361 LHNISVADSANYSCVYVDLKPPFGGSAPSERLELHVDGPPPRPQLRATWSGAVLAGRDAV
0361 XXXXXXXXXXXXXXXXXXXX.XX.XXXXXXXXXXXXXX..XXXXXXXXXXXXXXXXXXXX
0361 HNISVADSANYSCVYVDLKPPFGGSAPSERLELHVDGPPPRPQLRATWSGAVLAGRDAVL
0421 LRCEGPIPDVTFELLREGETKAVKTVRTPGAAANLELIFVGPQHAGNYRCRYRSWVPHTF
0421 XXXXXXXXXXXXX.XXXXXXXXXXXXXXXX..XXXXXXXXXXXXXXXXXXXXXXXXXXXX
0421 RCEGPIPDVTFELLREGETKAVKTVRTPGAAANLELIFVGPQHAGNYRCRYRSWVPHTFE
0481 ESELSDPVELLVAES*
0481 XXXXXXXXX.XXXXX
0481 SELSDPVELLVAES*
NM_130786.3:c.22_23CT[1], NM_130786.4:c.22_23CT[1], NR_015380.2:n.1111_1112GA[1], NR_015380.1:n.1111_1112GA[1], NP_570602.2:p.Leu9fs
0001 MSMLVVFLLLWGVTWGPVTEAAIFYETQPSLWAESESLLKPLANVTLTCQAHLETPDFQL
0001 ........XXXXXXXXXXXXXX.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
0001 MSMLVVFLAVGCHLGPSDRSSHIL*DAAQPVGRVRITAETLGQCDADVPGPPGDSRLPAV
0061 FKNGVAQEPVHLDSPAIKHQFLLTGDTQGRYRCRSGLSTGWTQLSKLLELTGPKSLPAPW
0061 XXX.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
0061 QEWGGPGACAP*LTCHQAPVPADG*HPGPLPLPLGLVHRMDPAEQAPGADRAKVLACSLA
0121 LSMAPVSWITPGLKTTAVCRGVLRGVTFLLRREGDHEFLEVPEAQEDVEATFPVHQPGNY
0121 .XX.X.XXXXXXXXXXX.X.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
0121 LDGASVLDHPRPENNSSVPRCAAGCDFSAEAGGRP*VSGGA*GPGGCGGHLSSPSAWQLQ
0181 SCSYRTDGEGALSEPSATVTIEELAAPPPPVLMHHGESSQVLHPGNKVTLTCVAPLSGVD
0181 XXXXXXX.XXX.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.XXXXX
0181 LQLPDRWGRRPL*AQRYCDH*GARCTTTACADAPWRVLPGPAPWQQGDPHLRGSPEWSGL
0241 FQLRRGEKELLVPRSSTSPDRIFFHLNAVALGDGGHYTCRYRLHDNQNGWSGDSAPVELI
0241 XXXX...XXXXXXXXXXXXXXXXXXXXXXXX.XXXXXXXXXXXXXXXXXXX.XX.XXXXX
0241 PATARGERAAGTQEQHQPRSHLLSPERGGPGGWRSLHLPLPAA*QPKRLVRGQRAGRADS
0301 LSDETLPAPEFSPEPESGRALRLRCLAPLEGARFALVREDRGGRRVHRFQSPAGTEALFE
0301 XXXXXXX.XXXXXXXXXXXXXXXXXX..XXX.XXXXX.XXXXXX.XXXXXXXXXXX..XX
0301 ER*DAARAGVLPGAGVRQGLAAAVPGAPGGRALRPGARGQGRAPRAPFPEPRWDRGALRA
0361 LHNISVADSANYSCVYVDLKPPFGGSAPSERLELHVDGPPPRPQLRATWSGAVLAGRDAV
0361 XXXXXXXXXXXXXXXXXXXXXXX.XX.XXXXXXXXXXX.XXXXXXXXXXX.XXXXX.XXX
0361 AQHFRG*LRQLQLRLRGPEAAFRGLRAQRALGAARGRTPSQASAPGDVEWGGPGGPRCRP
0421 LRCEGPIPDVTFELLREGETKAVKTVRTPGAAANLELIFVGPQHAGNYRCRYRSWVPHTF
0421 XXXXXXXXXXXXXXX.XXXXXXXXXXXXX.XXXXXXXXXXX.XXXXXXXXXXXXXX..XX
0421 ALRGTHPRRHLRAAARGRDEGREDGPHPRGRGEPRADLRGAPARRQLQVPLPLLGAPHLR
0481 ESELSDPVELLVAES*
0481 XXXXXXXXXXXXXXX
0481 IGAQRPCGAPGGRKL
NM_130786.4:c.47_56del, NP_570602.2:p.Gly16fs
0001 MSMLVVFLLLWGVTWGPVTEAAIFYETQPSLWAESESLLKPLANVTLTCQAHLETPDFQL
0001 ...............XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
0001 MSMLVVFLLLWGVTWEKQPYFMRRSPACGQSPNHC*NPWPM*R*RARPTWRLQTSSCSRM
0061 FKNGVAQEPVHLDSPAIKHQFLLTGDTQGRYRCRSGLSTGWTQLSKLLELTGPKSLPAPW
0061 XXXXXXXXXX..X.XXXXXXXXXXXXXXXXX.XXXXXXXXXXXXXXXXXXXXXXXXXXX.
0061 GWPRSLCTLTHLPSSTSSC*RVTPRAATAAARACPQDGPS*ASSWS*QGQSPCLLPGSRW
0121 LSMAPVSWITPGLKTTAVCRGVLRGVTFLLRREGDHEFLEVPEAQEDVEATFPVHQPGNY
0121 XXXXXXXXXXXXXXXXX..X..XXXXXXXXXXXXXXXX.XXXXXXXXXXXXXXXXXXXXX
0121 RQCPGSPPA*KQQQCAEVCCGV*LFC*GGRATMSFWRCLRPRRMWRPPFQSISLATTAAA
0181 SCSYRTDGEGALSEPSATVTIEELAAPPPPVLMHHGESSQVLHPGNKVTLTCVAPLSGVD
0181 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.XXXXXXXXXXXXXXXXXXXXXX
0181 TGPMGKAPSLSPALL*PLRSSLHHHRLC*CTMESPPRSCTLATR*PSPAWLP*VEWTSSY
0241 FQLRRGEKELLVPRSSTSPDRIFFHLNAVALGDGGHYTCRYRLHDNQNGWSGDSAPVELI
0241 XXX.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.XXXXXXXXXX.XXXX.XXXXXX
0241 GAGRKSCWYPGAAPAQIASSFT*TRWPWGMEVTTPAATGCMTTKTAGPGTARRSS*F*AM
0301 LSDETLPAPEFSPEPESGRALRLRCLAPLEGARFALVREDRGGRRVHRFQSPAGTEALFE
0301 XXXXXXXX.XXXXXXXXXXXXXX.XXXXXXXXXX.XXXXXXXXXXXXXXXX.XXXXXXXX
0301 RRCPRRSSPRSRSPAGPCGCGAWRPWRARASPWCARTGAGAACTVSRAPLGPRRSSSCTT
0361 LHNISVADSANYSCVYVDLKPPFGGSAPSERLELHVDGPPPRPQLRATWSGAVLAGRDAV
0361 XXXXXXXXX.XXXXXXXX.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.XXXXXXXX.X
0361 FPWLTPPTTAASTWT*SRLSGAPRPASAWSCTWTDPLPGLSSGRRGVGRSWRAEMPSCAA
0421 LRCEGPIPDVTFELLREGETKAVKTVRTPGAAANLELIFVGPQHAGNYRCRYRSWVPHTF
0421 XXXXXXX.XXXXXXX.XXXXXXXXXXXXXXXXXXXXXXXXX.XXX.XXXXXXXXXXXXXX
0421 RDPSPTSPSSCCARARRRP*RRSAPPGPRRTSS*SSWGPSTPATTGAATAPGCPTPSNRS
0481 ESELSDPVELLVAES*
0481 XXX.XXXXXXXX
0481 SATLWSSWWQKA
NM_130786.4:c.121_122insA, NP_570602.2:p.Pro41fs
0001 MSMLVVFLLLWGVTWGPVTEAAIFYETQPSLWAESESLLKPLANVTLTCQAHLETPDFQL
0001 ........................................X.XXXXXXXXXXXXXXXXXX
0001 MSMLVVFLLLWGVTWGPVTEAAIFYETQPSLWAESESLLKHLGQCDADVPGPPGDSRLPA
0061 FKNGVAQEPVHLDSPAIKHQFLLTGDTQGRYRCRSGLSTGWTQLSKLLELTGPKSLPAPW
0061 XXXXXXXXXXXXXXXXXXXXXXXX.XXX.XXXXXX..XXXXXXXXXXXXXXXX.X.XXXX
0061 VQEWGGPGACAP*LTCHQAPVPADG*HPGPLPLPLGLVHRMDPAEQAPGADRAKVLACSL
0121 LSMAPVSWITPGLKTTAVCRGVLRGVTFLLRREGDHEFLEVPEAQEDVEATFPVHQPGNY
0121 XXXXXXXXXX.XXXXXXXXXXXXX.XX.XXXXX.XXXXXXXXXXXXXXXXXXXXXXXXXX
0121 ALDGASVLDHPRPENNSSVPRCAAGCDFSAEAGGRP*VSGGA*GPGGCGGHLSSPSAWQL
0181 SCSYRTDGEGALSEPSATVTIEELAAPPPPVLMHHGESSQVLHPGNKVTLTCVAPLSGVD
0181 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.XXXXXXXXXXXXXXXX
0181 QLQLPDRWGRRPL*AQRYCDH*GARCTTTACADAPWRVLPGPAPWQQGDPHLRGSPEWSG
0241 FQLRRGEKELLVPRSSTSPDRIFFHLNAVALGDGGHYTCRYRLHDNQNGWSGDSAPVELI
0241 XXXXXXXXXXXXXXXXXX.XXXXXXXXXXXX.XXXXXXXXXXXXXXXXXXXXXXXXXXXX
0241 LPATARGERAAGTQEQHQPRSHLLSPERGGPGGWRSLHLPLPAA*QPKRLVRGQRAGRAD
0301 LSDETLPAPEFSPEPESGRALRLRCLAPLEGARFALVREDRGGRRVHRFQSPAGTEALFE
0301 XXXXXXXXXXXX.XXXXXXX.XXXXXXXXX.XXXXXXXXXX.XXXXXX.XX.XXXXXXXX
0301 SER*DAARAGVLPGAGVRQGLAAAVPGAPGGRALRPGARGQGRAPRAPFPEPRWDRGALR
0361 LHNISVADSANYSCVYVDLKPPFGGSAPSERLELHVDGPPPRPQLRATWSGAVLAGRDAV
0361 XXXXXXXXXXXXXXXXXXXXXX.X.XXXXXX.XXXXXXX.XXXXXXXXXXXXXXX.XXXX
0361 AAQHFRG*LRQLQLRLRGPEAAFRGLRAQRALGAARGRTPSQASAPGDVEWGGPGGPRCR
0421 LRCEGPIPDVTFELLREGETKAVKTVRTPGAAANLELIFVGPQHAGNYRCRYRSWVPHTF
0421 XXXX.XX.XXXXXXXXX.XXXXXXXXXX.XXXXXXXXXXX.XXXXXXXXXXXXXXXXXXX
0421 PALRGTHPRRHLRAAARGRDEGREDGPHPRGRGEPRADLRGAPARRQLQVPLPLLGAPHL
0481 ESELSDPVELLVAES*
0481 XXXXXX.XXXXXXXXX
0481 RIGAQRPCGAPGGRKL
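The position-by-position comparison above marks almost every residue after an in-frame deletion as X, and it prints frameshifted sequences past the first stop codon (*). As a rough sketch of a complementary check, the two hypothetical helpers below report the first differing residue and cut a protein at its first stop. They are not part of cds2protein.py, and the inputs are fragments copied from the outputs above.

```python
def first_difference(before, after):
    """Return the 1-based index of the first differing residue, or None if identical."""
    for index, (a, b) in enumerate(zip(before, after), start=1):
        if a != b:
            return index
    if len(before) != len(after):
        # One sequence is a prefix of the other
        return min(len(before), len(after)) + 1
    return None


def truncate_at_stop(protein):
    """Return the protein up to and including the first '*', if there is one."""
    stop = protein.find("*")
    return protein if stop == -1 else protein[:stop + 1]


# Last row of the c.1481A>T comparison; the row starts at residue 481
offset = 480
print(offset + first_difference("ESELSDPVELLVAES*", "ESELSDPVELLVAVS*"))

# Start of the c.22_23CT[1] frameshift result
print(first_difference("MSMLVVFLLLWGVTWG", "MSMLVVFLAVGCHLGP"))
print(truncate_at_stop("MSMLVVFLAVGCHLGPSDRSSHIL*DAAQPVG"))
```

The first two values line up with the residue numbers in p.Glu494Val and p.Leu9fs shown in the printed headers.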
The amino acid sequence inferred by applying a mutation to the CDS is only a reference value. It can be used only when at least the following conditions are met:
- The mutation is confined to a single exon in the CDS
- The mutation in the exon does not affect the boundary between exon and intron
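The first condition could be checked roughly as follows, assuming the lengths of the coding parts of each exon are known. This is only a sketch: the function names are hypothetical, the exon lengths are made-up numbers, and real values would have to come from an annotation source not covered here. It says nothing about the second condition (effects on splicing).

```python
from bisect import bisect_left
from itertools import accumulate


def exon_index(exon_lengths, cds_position):
    """Return the 0-based index of the exon containing a 1-based CDS position."""
    # exon_ends[i] is the 1-based CDS position of the last base of exon i
    exon_ends = list(accumulate(exon_lengths))
    if not 1 <= cds_position <= exon_ends[-1]:
        raise ValueError("position is outside the CDS")
    return bisect_left(exon_ends, cds_position)


def within_single_exon(exon_lengths, start, end):
    """Return True if the 1-based inclusive range start..end lies in one exon."""
    return exon_index(exon_lengths, start) == exon_index(exon_lengths, end)


toy_exon_lengths = [100, 50, 200]  # made-up coding lengths of three exons
print(within_single_exon(toy_exon_lengths, 99, 100))   # both in the first exon
print(within_single_exon(toy_exon_lengths, 100, 101))  # spans the first boundary
```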
I don't know exactly under what conditions the above holds ...
That's all for this time.