This post explains how to use map and lambda in Python, using a FASTQ file as the example.
The test file below is in FASTQ format, the output of a DNA sequencer and a familiar format in bioinformatics. The line starting with @ is the header, the second line is the DNA base sequence, the third line is a + separator, and the fourth line holds a quality score for each base of the sequence on the second line. Each score is stored as the ASCII character whose code is the score plus 33.
test.fastq
@test1
GAGCACACGTCTNNANNCNAGTCANNANNNANNNNNNNNNNANNCNNNNNNTNNNNNNNNANNNNTGTCCATTGCNNNCACATCATTGTTTACTTGCGCNT
+
;<<:?@9<?############################################################################################
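The four-line layout described above can be sketched as a small parser. This is a minimal sketch for Python 3 (an assumption; the article itself uses Python 2), and it assumes every record is exactly four lines. The function name `parse_fastq` and the shortened record are hypothetical and exist only for illustration.

```python
# Python 3 sketch: split FASTQ text into four-line records.
# Assumes each record is exactly four lines (no wrapped sequences).

def parse_fastq(text):
    """Return a list of (header, sequence, quality) tuples."""
    lines = text.splitlines()
    records = []
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        # Drop the leading "@" from the header; the "+" separator is ignored.
        records.append((header[1:], sequence, quality))
    return records

# A shortened, made-up record (not the test file above).
example = "@read1\nGATTN\n+\n;<<:#\n"
records = parse_fastq(example)
print(records)  # [('read1', 'GATTN', ';<<:#')]
```

The quality string has one character per base, so the second and third tuple elements always have the same length in a well-formed record.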
I wanted to convert the quality scores back to their original values. While writing this in Python, I came across a very convenient combination of map and lambda, so I am making a note of it here. The environment is Python 2. Note that higher-order functions such as map behave differently between Python versions.
For example, to convert the quality character A to its score, use Python's built-in function **ord** (its inverse is **chr**) to convert the character to its ASCII code, then subtract 33 to recover the original value.
> python -c 'print ord("A")-33'
32
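The same conversion, and its inverse with chr, can be sketched in Python 3 (an assumption; the command above is Python 2, where print is a statement). The names `OFFSET`, `char_to_score`, and `score_to_char` are hypothetical helpers, not part of the original scripts.

```python
# Python 3 sketch of the ord/chr round trip.
OFFSET = 33  # the offset added to each quality score in the test file

def char_to_score(ch):
    """Quality character -> original quality score."""
    return ord(ch) - OFFSET

def score_to_char(score):
    """Original quality score -> quality character."""
    return chr(score + OFFSET)

print(char_to_score("A"))  # 32, matching the command above
print(score_to_char(32))   # A
print(char_to_score("#"))  # 2
```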
To apply this conversion to all 101 characters on the quality line of the test file, you can use a for statement.
convert_asci.py
asci_string = ";<<:?@9<?############################################################################################"
for baseq in asci_string:
    score = ord(baseq) - 33
    print score,
The script above is one way to write it.
convert_asci.py execution result
26 27 27 25 30 31 24 27 30 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
The larger this value, the better the quality, so you can see that bases whose quality character is "#" (a score of 2) are of very poor quality. This ASCII conversion program uses a for statement, however. As a program grows, such code tends to become vertically long, and execution also seems to be slow. So let's express it with map instead.
convert_asci.2.py
asci_string = ";<<:?@9<?############################################################################################"
def convert_func(x):
    score = ord(x) - 33
    return score
res_score = map(convert_func, asci_string)
print res_score
convert_asci.2.py execution result
[26, 27, 27, 25, 30, 31, 24, 27, 30, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
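The list output above is Python 2 behavior. In Python 3, map returns a lazy iterator, so getting the same list requires wrapping it in list(). The following is a minimal sketch for Python 3, using a shortened quality string chosen only for illustration.

```python
# Python 3 sketch: map() returns an iterator, not a list.
asci_string = ";<<:?@9<?###"  # shortened quality string for illustration

def convert_func(x):
    return ord(x) - 33

res_iter = map(convert_func, asci_string)
res_score = list(res_iter)  # materialize the iterator to get a list
print(res_score)            # [26, 27, 27, 25, 30, 31, 24, 27, 30, 2, 2, 2]
```

A map iterator can be consumed only once, so converting it to a list is the simplest choice when the scores are needed more than once.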
Even if you avoid the for statement in this way, the function definition still takes up several lines. I then learned that an anonymous function, written with **lambda**, lets you put the equivalent of the "convert_func" function directly inside the map call (I had no idea!). It looks like this.
convert_asci.3.py (modified script)
asci_string = ";<<:?@9<?############################################################################################"
res_score = map(lambda x:ord(x) - 33, asci_string)
print res_score
**I was informed that, in both for statements and map, a string is internally split into single characters and iterated over. Thank you; I have corrected the script.**
convert_asci.3.py (script before modification)
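The point about string iteration can be checked with a short sketch for Python 3 (an assumption; the article uses Python 2, where strings iterate the same way). The shortened string `quality` is hypothetical.

```python
# Python 3 sketch: a str is already iterable one character at a time,
# so wrapping it in list() before map() changes nothing.
quality = ";<<#"  # shortened example string

chars_from_loop = [ch for ch in quality]
chars_from_list = list(quality)
print(chars_from_loop == chars_from_list)  # True

scores_direct = list(map(ord, quality))
scores_via_list = list(map(ord, list(quality)))
print(scores_direct == scores_via_list)    # True
```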
asci_string = ";<<:?@9<?############################################################################################"
asci_list = list(asci_string)  # This conversion to a list was unnecessary.
res_score = map(lambda x:ord(x) - 33, asci_list)
print res_score
How about that? The whole conversion fits on one line. The result is the same as before, because map returns a list in Python 2. Anonymous functions are disposable functions; they are called anonymous because a function that is used only once does not need a name. The format for creating an anonymous function with a lambda expression is as follows.
lambda argument (x in the example): return value (ord(x) - 33 in the example)
In the example, x receives the characters of asci_string from map one at a time as its argument, the specified expression is evaluated, and the result is returned. This is very convenient!
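To make the format concrete, the sketch below compares a lambda with the equivalent named function. It is written for Python 3 (an assumption; the article uses Python 2). The name `convert` is hypothetical, and the lambda is bound to a name here only so it can be called directly for demonstration.

```python
# Python 3 sketch: a lambda is an expression that evaluates to a function.
convert = lambda x: ord(x) - 33  # bound to a name only for demonstration

def convert_func(x):
    return ord(x) - 33

print(convert("A"))                              # 32
print(convert("A") == convert_func("A"))         # True
print(list(map(lambda x: ord(x) - 33, ";<#")))   # [26, 27, 2]
```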