Following on from "Conditional branching of Python learned by chemoinformatics", this article explains iteration (loops), using lipidomics (the comprehensive analysis of lipids) as its theme. It focuses mainly on practical chemoinformatics examples, so if you want to review the basics first, please read the following article before this one.
Pharmaceutical researcher summarized Python control statements
for
`for` can be used to repeat a process a predetermined number of times.
Write `for variable in iterable (a list, etc.):`, then describe the process to repeat on the following lines, indented by four half-width spaces.
carbon_numbers = [16, 18, 20, 22, 24]
for Cn in carbon_numbers:
    print(Cn, 0, sep=':')
In the above example, `carbon_numbers` is a list of the numbers of carbon atoms (chain lengths) of fatty acids, and the loop generates the abbreviation of the saturated fatty acid for each one (16:0, 18:0, and so on).
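If you want to keep the results instead of just printing them, you can append each one to a list inside the loop. This is a minimal sketch; the list name `saturated_abbreviations` is an illustrative choice, and the `FA 16:0` format simply matches the abbreviations used later in this article.

```python
carbon_numbers = [16, 18, 20, 22, 24]

saturated_abbreviations = []  # empty list, filled in by the loop below
for Cn in carbon_numbers:
    # The f-string combines the carbon number and 0 double bonds into one label
    saturated_abbreviations.append(f'FA {Cn}:0')

print(saturated_abbreviations)
# ['FA 16:0', 'FA 18:0', 'FA 20:0', 'FA 22:0', 'FA 24:0']
```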
range
You can also use `range` as the iterable.
You can think of `range` as a list of consecutive integers within a specified range.
Cn = 22
for Un in range(4, 7):
    print(Cn, Un, sep=':')
for Un in range(7):
    print(Cn, Un, sep=':')
In the example above, `range(4, 7)` represents the consecutive integers from 4 to 6 (4, 5, 6); the end value 7 itself is not included.
Writing `range(7)` gives the consecutive integers from 0 to 6 (0, 1, 2, 3, 4, 5, 6).
Here, these loops generate the degree of unsaturation of the fatty acid (the number of double bonds in the carbon chain).
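`range` also accepts an optional third argument, the step size. The sketch below uses it to reproduce the even chain lengths from the first example, and then collects the C22 abbreviations in a list instead of printing them one by one (the variable names are illustrative).

```python
# range(start, stop, step): the third argument sets the step size.
# Even chain lengths from 16 up to 24; the stop value 25 itself is excluded.
even_carbon_numbers = list(range(16, 25, 2))
print(even_carbon_numbers)  # [16, 18, 20, 22, 24]

# The same loop as above, but collecting the C22 abbreviations in a list
Cn = 22
abbreviations = []
for Un in range(4, 7):  # Un takes the values 4, 5, 6
    abbreviations.append(f'{Cn}:{Un}')
print(abbreviations)  # ['22:4', '22:5', '22:6']
```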
enumerate
Using `enumerate`, you can retrieve the index of each element together with the element itself, as shown below.
fatty_acids = ['FA 16:0', 'FA 18:0', 'FA 18:1']
for fatty_acid in fatty_acids:
    print(fatty_acid)
for i, fatty_acid in enumerate(fatty_acids):
    print(f'{i}: {fatty_acid}')
In the above example, the index of each element of the list `fatty_acids` is stored in `i`, the value of that element is stored in `fatty_acid`, one element at a time, and both are output with `print`.
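By default the index from `enumerate` starts at 0. For a human-readable numbered list, the built-in `start` argument lets the count begin at 1 instead. A short sketch; the `numbered` list exists only so the result can be checked.

```python
fatty_acids = ['FA 16:0', 'FA 18:0', 'FA 18:1']

numbered = []
# start=1 makes the numbering begin at 1 instead of 0
for i, fatty_acid in enumerate(fatty_acids, start=1):
    numbered.append(f'{i}: {fatty_acid}')
    print(f'{i}: {fatty_acid}')
```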
You can also iterate over a dictionary using `for`.
An example is shown below.
fatty_acids = {'Palmitic acid': 'FA 16:0', 'Stearic acid': 'FA 18:0', 'Oleic acid': 'FA 18:1'}  # Fatty acid name (common name) and abbreviation
for key in fatty_acids.keys():  # Dictionary keys
    print(key)
for value in fatty_acids.values():  # Dictionary values
    print(value)
for key, value in fatty_acids.items():  # Dictionary keys and values
    print(f'{key} is {value}')
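The `items()` loop above can also be used to build a new dictionary. As a sketch, here is a reverse lookup from abbreviation to common name (the name `abbreviation_to_name` is illustrative). It assumes every abbreviation is unique; if two names shared one, the later entry would overwrite the earlier one.

```python
fatty_acids = {'Palmitic acid': 'FA 16:0', 'Stearic acid': 'FA 18:0', 'Oleic acid': 'FA 18:1'}

abbreviation_to_name = {}  # empty dictionary, filled in by the loop
for name, abbreviation in fatty_acids.items():
    abbreviation_to_name[abbreviation] = name  # swap key and value

print(abbreviation_to_name['FA 18:1'])  # Oleic acid
```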
exact_mass = {'C': 12, 'H': 1.00783, 'O': 15.99491}  # Element symbol and exact (monoisotopic) mass
formula_pa = {'C': 16, 'H': 32, 'O': 2}  # Element symbol and number of atoms (molecular formula of palmitic acid, C16H32O2)
em_pa = 0
for key, value in formula_pa.items():
    em_pa += exact_mass[key] * value  # Multiply the exact mass by the number of atoms and add it to the total
print(em_pa)
fatty_acids = {16: [0, 1], 18: [0, 1, 2, 3, 4], 20: [0, 3, 4, 5], 22: [0, 4, 5, 6]}  # Key: number of carbon atoms of the fatty acid; value: list of numbers of double bonds
for Cn, Uns in fatty_acids.items():
    for Un in Uns:
        print(Cn, Un, sep=':')
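Here, `a = a + n` can be written as `a += n`. The line `em_pa += exact_mass[key] * value` multiplies each element's exact mass by its number of atoms and adds the result to `em_pa`, the variable that accumulates the exact mass.
In the last example, one `for` loop is nested inside another, so every combination of carbon number and number of double bonds stored in the dictionary is output.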
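Combining the two ideas above, nested loops can compute the exact masses of several fatty acids at once. This is a sketch under the same simple mass table; the dictionary `formulas` and the molecular formulas of stearic acid (C18H36O2) and oleic acid (C18H34O2) are added here for illustration.

```python
exact_mass = {'C': 12, 'H': 1.00783, 'O': 15.99491}  # exact masses used above

# Molecular formulas (element symbol -> number of atoms)
formulas = {
    'FA 16:0': {'C': 16, 'H': 32, 'O': 2},  # palmitic acid
    'FA 18:0': {'C': 18, 'H': 36, 'O': 2},  # stearic acid
    'FA 18:1': {'C': 18, 'H': 34, 'O': 2},  # oleic acid
}

exact_masses = {}
for abbreviation, formula in formulas.items():  # outer loop: each fatty acid
    em = 0
    for element, count in formula.items():      # inner loop: each element
        em += exact_mass[element] * count
    exact_masses[abbreviation] = em

for abbreviation, em in exact_masses.items():
    print(abbreviation, round(em, 5))
```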
Of course, `for` can also be combined with `if`.
fatty_acids = ['FA 16:0', 'FA 18:0', 'FA 18:1', 'FA 18:2', 'FA 20:4', 'FA 22:6', 'FA 24:0']
saturated_fatty_acids = []  # Empty list (values are added in the loop below)
unsaturated_fatty_acids = []  # Empty list (values are added in the loop below)
for fatty_acid in fatty_acids:
    if fatty_acid[-1] == '0':
        saturated_fatty_acids.append(fatty_acid)  # Saturated fatty acids
    else:
        unsaturated_fatty_acids.append(fatty_acid)  # Unsaturated fatty acids
print(saturated_fatty_acids)
print(unsaturated_fatty_acids)
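The check `fatty_acid[-1] == '0'` looks only at the last character of the string. A slightly more robust variant splits the abbreviation at `:` and converts the number of double bonds to an integer. This sketch assumes every entry follows the simple `FA Cn:Un` format with exactly one colon and nothing after the number.

```python
fatty_acids = ['FA 16:0', 'FA 18:0', 'FA 18:1', 'FA 18:2', 'FA 20:4', 'FA 22:6', 'FA 24:0']

saturated_fatty_acids = []
unsaturated_fatty_acids = []
for fatty_acid in fatty_acids:
    # 'FA 18:2'.split(':') -> ['FA 18', '2']; the second part is the number of double bonds
    double_bonds = int(fatty_acid.split(':')[1])
    if double_bonds == 0:
        saturated_fatty_acids.append(fatty_acid)
    else:
        unsaturated_fatty_acids.append(fatty_acid)

print(saturated_fatty_acids)    # ['FA 16:0', 'FA 18:0', 'FA 24:0']
print(unsaturated_fatty_acids)  # ['FA 18:1', 'FA 18:2', 'FA 20:4', 'FA 22:6']
```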
Use `break`, as shown below, to exit the loop partway through.
An `else` block describes what happens when the loop finishes without being stopped by `break`. The `else` block is optional.
fatty_acids = ['FA 16:0', 'FA 18:0', '', 'FA 18:1']
for fatty_acid in fatty_acids:
    if fatty_acid == '':
        print("It's empty. Canceling processing.")
        break
    print(fatty_acid)  # Outputs up to 'FA 18:0'
else:
    print('The process is complete.')  # Not output in this example
In the above example, the third element of the list `fatty_acids` is empty, so the loop only outputs the values of the elements before it.
Because the loop is interrupted by `break`, the `else` block is not executed.
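A common use of `for`-`else` is a search: `break` as soon as the item is found, and let `else` handle the "not found" case. A minimal sketch; `target` and `message` are illustrative names.

```python
fatty_acids = ['FA 16:0', 'FA 18:0', 'FA 18:1', 'FA 20:4']
target = 'FA 22:6'  # the abbreviation to look for

for fatty_acid in fatty_acids:
    if fatty_acid == target:
        message = f'{target} was found.'
        break  # stop searching; the else block is skipped
else:
    # Runs only if the loop finished without break, i.e. target was not found
    message = f'{target} was not found.'

print(message)  # FA 22:6 was not found.
```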
On the other hand, `continue` skips the rest of the current iteration and moves on to the next element.
fatty_acids = ['FA 16:0', 'FA 18:0', '', 'FA 18:1']
for fatty_acid in fatty_acids:
    if fatty_acid == '':
        print("It's empty. Skipping.")
        continue
    print(fatty_acid)  # Skips the empty element and outputs up to 'FA 18:1'
In the above example, the empty element is skipped by `continue`, so nothing is output for it, but the loop runs to the end and 'FA 18:1' is also output.
If an `else` block is present, it is also executed, because `continue` does not end the loop.
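To confirm that `else` still runs after `continue`, here is a small sketch that records the result in a flag instead of printing (the names `processed` and `finished` are illustrative).

```python
fatty_acids = ['FA 16:0', 'FA 18:0', '', 'FA 18:1']

processed = []
finished = False
for fatty_acid in fatty_acids:
    if fatty_acid == '':
        continue  # skip only this element; the loop keeps going
    processed.append(fatty_acid)
else:
    # continue never ends the loop early, so this block runs
    finished = True

print(processed)  # ['FA 16:0', 'FA 18:0', 'FA 18:1']
print(finished)   # True
```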
As an application, let's find the number of carbon atoms and the number of double bonds in the carbon chain from a string written in SMILES notation.
smiles_fa = r'OC(CCCCCCC/C=C\C/C=C\CCCCC)=O'  # r'' keeps the backslashes as literal characters
Cn = 0
Un = 0
for i in range(len(smiles_fa)):
    if smiles_fa[i] == 'C':
        Cn += 1
    elif smiles_fa[i] == '=' and smiles_fa[i+1] == 'C':
        Un += 1
print(Cn, Un, sep=':')
Reading the string from the left, if a character is `C`, the variable `Cn`, which counts carbon atoms, is increased by 1. If a character is `=` and the next character is `C`, the variable `Un`, which counts double bonds, is increased by 1. The next character has to be checked because the carbonyl carbon also has a double bond (`=O`), which should not be counted.
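The same count can be written without indices. In this sketch, `str.count` counts the `C` characters directly, and `zip` pairs each character with the next one, so there is no index that could run past the end of the string. Like the original, it assumes the carbonyl group is written as `)=O` (so that `=C` only appears in C=C bonds), and that the SMILES contains no other atoms written with a capital C, such as `Cl`.

```python
smiles_fa = r'OC(CCCCCCC/C=C\C/C=C\CCCCC)=O'

Cn = smiles_fa.count('C')  # number of 'C' characters

Un = 0
# zip pairs each character with the one that follows it
for current, following in zip(smiles_fa, smiles_fa[1:]):
    if current == '=' and following == 'C':
        Un += 1

print(Cn, Un, sep=':')  # 18:2
```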
With `while`, the process repeats as long as the specified condition is met.
Write `while condition:`, then describe the process on the following lines, indented by four half-width spaces.
saturated_fatty_acids = ['FA 16:0', 'FA 18:0']
unsaturated_fatty_acids = ['FA 18:1', 'FA 18:2', 'FA 18:3', 'FA 20:4', 'FA 22:6']
fatty_acids = []
while len(saturated_fatty_acids) > 0:
    fatty_acids.append(saturated_fatty_acids[-1])
    saturated_fatty_acids.pop()
while len(unsaturated_fatty_acids) > 0:
    fatty_acids.append(unsaturated_fatty_acids[-1])
    unsaturated_fatty_acids.pop()
print(fatty_acids)
In the above example, elements are taken from the end of the original lists `saturated_fatty_acids` and `unsaturated_fatty_acids` and moved to the new, initially empty list `fatty_acids`. Here, `len(list) > 0` means that the list still contains at least one element. After each element is added to `fatty_acids`, it is removed from the original list with `pop`. Note that if you forget this removal, the condition never becomes false and the process repeats forever (an infinite loop).
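The same loop can be written more compactly. An empty list is treated as false, so `while list:` means the same as `while len(list) > 0:`, and `pop()` both removes the last element and returns it, so moving and removing happen in one call. A sketch of the equivalent version:

```python
saturated_fatty_acids = ['FA 16:0', 'FA 18:0']
unsaturated_fatty_acids = ['FA 18:1', 'FA 18:2', 'FA 18:3', 'FA 20:4', 'FA 22:6']
fatty_acids = []

while saturated_fatty_acids:  # loops while the list is not empty
    # pop() removes the last element and returns it
    fatty_acids.append(saturated_fatty_acids.pop())
while unsaturated_fatty_acids:
    fatty_acids.append(unsaturated_fatty_acids.pop())

print(fatty_acids)
# ['FA 18:0', 'FA 16:0', 'FA 22:6', 'FA 20:4', 'FA 18:3', 'FA 18:2', 'FA 18:1']
```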
Finally, as an application, let's count the number of carbon atoms in a SMILES string using `while`.
smiles_fa = 'OC(CCCCCCCCCCCCCCC)=O'
Cn = 0
while 'C' in smiles_fa:
    if smiles_fa[0] == 'C':
        Cn += 1
    smiles_fa = smiles_fa[1:]
print(Cn)
While the SMILES string still contains the character `C`, check its leftmost character; if it is `C`, increase the variable `Cn`, which counts the number of carbon atoms, by 1.
The leftmost character is removed on every pass, whether or not it is `C`.
This counts the number of `C`s in the SMILES string.
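Instead of shortening the string on every pass, you can also move an index along it, which leaves the original string unchanged. A sketch of that variant; the index variable `i` is an illustrative choice.

```python
smiles_fa = 'OC(CCCCCCCCCCCCCCC)=O'

Cn = 0
i = 0  # position of the character currently being checked
while i < len(smiles_fa):
    if smiles_fa[i] == 'C':
        Cn += 1
    i += 1  # forgetting this line would cause an infinite loop

print(Cn)  # 16
```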
As shown above, `for` is used when the number of repetitions is fixed, and `while` is used when it is not.
In this article, we have explained Python iteration, focusing on practical knowledge that can be used in chemoinformatics. Let's review the main points.
- `for` can be used when the number of repetitions is fixed. You can also use `break` or `continue` to stop or skip processing under certain conditions.
- `while` has no fixed number of repetitions and keeps repeating the process as long as a preset condition is met. Take care not to create an infinite loop.
Next, the following article explains Python functions.
Python functions learned from chemoinformatics
Surprisingly few!? "Minimum" knowledge required for programming in a pharmaceutical company