Introduction

Following on from "Conditional branching of Python learned by chemoinformatics", this article explains iteration (loops) in Python, using lipidomics (the comprehensive analysis of lipids) as the theme. It focuses on practical chemoinformatics examples, so if you want to check the basics first, please read the following article.

Pharmaceutical researcher summarized Python control statements

for statement

for repeats a block of code once for each element of an iterable, so the number of repetitions is known in advance. Write for variable in iterable: (the iterable can be a list, for example), then start a new line and write the body indented by four half-width spaces.

carbon_numbers = [16, 18, 20, 22, 24]

for Cn in carbon_numbers:
    print(Cn, 0, sep=':')

In the above example, carbon_numbers is a list of fatty acid chain lengths (numbers of carbon atoms), and the loop prints the abbreviation of the corresponding saturated fatty acid for each one (16:0, 18:0, and so on).

range

You can also use range as the iterable. range behaves like a sequence of consecutive integers in a specified range.

Cn = 22

for Un in range(4, 7):
    print(Cn, Un, sep=':')
    
for Un in range(7):
    print(Cn, Un, sep=':')

In the example above, range(4, 7) yields the consecutive integers 4, 5, 6 (the end value 7 is not included). Likewise, range(7) yields the consecutive integers from 0 to 6 (0, 1, 2, 3, 4, 5, 6). Here they are used as the degree of unsaturation of the fatty acid (the number of double bonds in the carbon chain).
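To check which integers a range produces, it can be converted to a list. The short sketch below does this, and also uses the optional third argument of range (the step), which is not covered in the text above; the even chain lengths are chosen only for illustration.

```python
# range is lazy; list() materialises it so the values can be inspected
print(list(range(4, 7)))  # [4, 5, 6]
print(list(range(7)))     # [0, 1, 2, 3, 4, 5, 6]

# Assumption for illustration: even chain lengths from 16 to 24,
# generated with a step of 2 (the stop value 25 itself is excluded)
carbon_numbers = list(range(16, 25, 2))
print(carbon_numbers)     # [16, 18, 20, 22, 24]
```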

enumerate

You can retrieve the index and the element of an iterable together by using enumerate, as shown below.

fatty_acids = ['FA 16:0', 'FA 18:0', 'FA 18:1']

for fatty_acid in fatty_acids:
    print(fatty_acid)

for i, fatty_acid in enumerate(fatty_acids):
    print(f'{i}: {fatty_acid}')

In the above example, the index of each element in the list fatty_acids is stored in i, the element itself is stored in fatty_acid, and both are output by print.
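If the numbering should start at 1 rather than 0, enumerate accepts a start argument. The following is a minimal sketch; the list numbered is a hypothetical name added here only so that the result can be inspected afterwards.

```python
fatty_acids = ['FA 16:0', 'FA 18:0', 'FA 18:1']

numbered = []  # hypothetical helper list, used only to collect the output lines
for i, fatty_acid in enumerate(fatty_acids, start=1):
    # start=1 makes the counter begin at 1 instead of the default 0
    line = f'{i}: {fatty_acid}'
    numbered.append(line)
    print(line)
```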

dictionary

You can also iterate over a dictionary using for. An example is shown below.

# Fatty acid name (common name) and abbreviation
fatty_acids = {'Palmitic acid': 'FA 16:0', 'Stearic acid': 'FA 18:0', 'Oleic acid': 'FA 18:1'}

for key in fatty_acids.keys():  # Dictionary keys
    print(key)

for value in fatty_acids.values():  # Dictionary values
    print(value)

for key, value in fatty_acids.items():  # Dictionary keys and values
    print(f'{key} is {value}')

    
exact_mass = {'C': 12, 'H': 1.00783, 'O': 15.99491}  # Element symbol and exact (monoisotopic) mass
formula_pa = {'C': 16, 'H': 32, 'O': 2}  # Element symbol and number of atoms (molecular formula of palmitic acid)
em_pa = 0

for key, value in formula_pa.items():
    em_pa += exact_mass[key] * value  # Multiply the exact mass by the number of atoms and add it to the total
print(em_pa)


fatty_acids = {16: [0, 1], 18: [0, 1, 2, 3, 4], 20: [0, 3, 4, 5], 22: [0, 4, 5, 6]} #A dictionary with the number of carbon atoms of fatty acid as the key and the number of double bonds as the value

for Cn, Uns in fatty_acids.items():
    for Un in Uns:
        print(Cn, Un, sep=':')

Here, a = a + n can be written as a += n. In the exact mass example, the line em_pa += exact_mass[key] * value multiplies the exact mass of each element by its number of atoms and adds the result to the variable em_pa, which accumulates the exact mass of the molecule.
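The same accumulation can be nested inside a second loop to process several molecules at once. The sketch below is only an illustration under that assumption: the dictionaries formulas and masses are hypothetical names, and stearic acid (C18H36O2) is added as a second entry.

```python
exact_mass = {'C': 12, 'H': 1.00783, 'O': 15.99491}  # exact (monoisotopic) masses

# Hypothetical dictionary: fatty acid name -> molecular formula
formulas = {
    'Palmitic acid': {'C': 16, 'H': 32, 'O': 2},
    'Stearic acid': {'C': 18, 'H': 36, 'O': 2},
}

masses = {}
for name, formula in formulas.items():
    total = 0  # reset the running total for each molecule
    for element, count in formula.items():
        total += exact_mass[element] * count
    masses[name] = total
    print(f'{name}: {total:.5f}')
```

Resetting total inside the outer loop matters here; if it were initialised only once, the mass of each molecule would include all the previous ones.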

Combination with conditional branching

Of course, for can also be combined with if.

fatty_acids = ['FA 16:0', 'FA 18:0', 'FA 18:1', 'FA 18:2', 'FA 20:4', 'FA 22:6', 'FA 24:0']

saturated_fatty_acids = [] #Empty list (enter values in subsequent processing)
unsaturated_fatty_acids = [] #Empty list (enter values in subsequent processing)

for fatty_acid in fatty_acids:
    if fatty_acid[-1] == '0':
        saturated_fatty_acids.append(fatty_acid) #Saturated fatty acids
    else:
        unsaturated_fatty_acids.append(fatty_acid) #Unsaturated fatty acids
        
print(saturated_fatty_acids)
print(unsaturated_fatty_acids)
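Checking only the last character works for the abbreviations used above, but it would misclassify an abbreviation whose number of double bonds itself ends in 0. As a hedged alternative, the sketch below splits the abbreviation and compares the number of double bonds as an integer. The entry 'FA 30:10' is a made-up value used purely to show the pitfall.

```python
# 'FA 30:10' is a made-up entry, included only to demonstrate the pitfall
fatty_acids = ['FA 16:0', 'FA 18:1', 'FA 22:6', 'FA 24:0', 'FA 30:10']

saturated_fatty_acids = []
unsaturated_fatty_acids = []

for fatty_acid in fatty_acids:
    # 'FA 18:1' -> '18:1' -> ['18', '1']
    Cn, Un = fatty_acid.split(' ')[1].split(':')
    if int(Un) == 0:
        saturated_fatty_acids.append(fatty_acid)    # no double bonds
    else:
        unsaturated_fatty_acids.append(fatty_acid)  # one or more double bonds

print(saturated_fatty_acids)
print(unsaturated_fatty_acids)
```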

break and continue

Use break as shown below to stop the iteration partway through. An else clause attached to the for statement describes what happens when the loop ends without reaching break. The else clause is optional.

fatty_acids = ['FA 16:0', 'FA 18:0', '', 'FA 18:1']

for fatty_acid in fatty_acids:
    if fatty_acid == '':
        print("It's empty. Cancels processing.")
        break
    print(fatty_acid)  # Outputs up to 'FA 18:0'
else:
    print('The process is complete.')  # Not output here

In the above example, the third element of the list fatty_acids is empty, so the loop outputs the elements before it and then stops. The else clause is not executed because the loop was interrupted by break.

On the other hand, continue skips the rest of the current iteration and moves on to the next element.

fatty_acids = ['FA 16:0', 'FA 18:0', '', 'FA 18:1']

for fatty_acid in fatty_acids:
    if fatty_acid == '':
        print("It's empty. Skip")
        continue
    print(fatty_acid)  # Skips the empty element and outputs up to 'FA 18:1'

In the above example, the empty element is skipped by continue, so its value is not output, but the loop runs to the end and 'FA 18:1' is also output. If an else clause is added, it is executed as well, because the loop finishes without break.
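The behaviour of else together with continue can be confirmed with a small sketch like the following. The names printed and finished are hypothetical and exist only to record what happened.

```python
fatty_acids = ['FA 16:0', 'FA 18:0', '', 'FA 18:1']

printed = []      # hypothetical list recording which elements were output
finished = False  # hypothetical flag recording whether the else clause ran

for fatty_acid in fatty_acids:
    if fatty_acid == '':
        continue  # skip only this element; the loop itself keeps going
    printed.append(fatty_acid)
    print(fatty_acid)
else:
    # Reached because the loop ended without break
    finished = True
    print('The process is complete.')
```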

Application: SMILES notation

Here, as an application, we will consider finding the number of carbon atoms and the number of carbon chain double bonds from a character string written in SMILES notation.

smiles_fa = r'OC(CCCCCCC/C=C\C/C=C\CCCCC)=O'  # Raw string, so the backslashes are kept as written

Cn = 0
Un = 0

for i in range(len(smiles_fa)):
    if smiles_fa[i] == 'C':
        Cn += 1
    elif smiles_fa[i] == '=' and smiles_fa[i+1] == 'C':
        Un += 1

print(Cn, Un, sep=':')

Looking at the string from the left, if the character is C, the variable Cn, which counts carbon atoms, is increased by 1. If the character is = and the next character is C, the variable Un, which counts double bonds, is increased by 1. The check on the next character is needed because the = of the carbonyl group (=O) must not be counted as a carbon chain double bond.
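The same count can be written without index numbers by pairing each character with the one that follows it. This is only a sketch of one possible alternative, using zip; the trailing space is padding added here so that the last character still gets a partner.

```python
smiles_fa = r'OC(CCCCCCC/C=C\C/C=C\CCCCC)=O'

Cn = 0
Un = 0

# Pair every character with its successor; the padding space pairs with the final character
for char, next_char in zip(smiles_fa, smiles_fa[1:] + ' '):
    if char == 'C':
        Cn += 1
    elif char == '=' and next_char == 'C':
        Un += 1

print(Cn, Un, sep=':')  # 18:2
```

Note that both versions count only the uppercase character C, so they are suitable for simple fatty acid strings like this one. A SMILES string containing chlorine (Cl) or aromatic carbon (lowercase c) would need additional handling.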

while statement

while repeats the body for as long as the specified condition is met. Write while condition:, then start a new line and write the body indented by four half-width spaces.

saturated_fatty_acids = ['FA 16:0', 'FA 18:0']
unsaturated_fatty_acids = ['FA 18:1', 'FA 18:2', 'FA 18:3', 'FA 20:4', 'FA 22:6']

fatty_acids = []

while len(saturated_fatty_acids) > 0:
    fatty_acids.append(saturated_fatty_acids[-1])
    saturated_fatty_acids.pop()

while len(unsaturated_fatty_acids) > 0:
    fatty_acids.append(unsaturated_fatty_acids[-1])
    unsaturated_fatty_acids.pop()

print(fatty_acids)

In the above example, elements are taken from the back of the original lists saturated_fatty_acids and unsaturated_fatty_acids and moved to the new, initially empty list fatty_acids. Here, len(list) > 0 means that the list still contains at least one element. After the element is appended to fatty_acids, it is removed from the original list with pop. Note that if you forget this removal, the condition never becomes false and the process is repeated forever (an infinite loop).
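Because pop returns the element it removes, and an empty list is treated as false in a condition, the move can also be written more compactly. The following is a minimal sketch of that variant for one of the two lists.

```python
saturated_fatty_acids = ['FA 16:0', 'FA 18:0']
fatty_acids = []

# An empty list counts as False, so the loop stops once every element has been moved
while saturated_fatty_acids:
    # pop() removes the last element and returns it in a single step
    fatty_acids.append(saturated_fatty_acids.pop())

print(fatty_acids)            # ['FA 18:0', 'FA 16:0']
print(saturated_fatty_acids)  # []
```

Since the removal and the append happen in the same statement, there is no separate deletion step that could be forgotten.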

Application: SMILES notation

Finally, as an application, let's consider counting the number of carbon atoms using while in SMILES notation.

smiles_fa = 'OC(CCCCCCCCCCCCCCC)=O'

Cn = 0

while 'C' in smiles_fa:
    if smiles_fa[0] == 'C':
        Cn += 1
    smiles_fa = smiles_fa[1:]
print(Cn)

As long as the SMILES string still contains the character C, the loop checks whether the leftmost character is C and, if so, increases the variable Cn, which counts carbon atoms, by 1. The leftmost character is then removed on every pass, whether or not it is C. In this way the number of Cs in the SMILES string is counted.
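As a cross-check, the result of the while loop can be compared with the built-in string method count. In the sketch below, remaining and expected are hypothetical names; working on a copy keeps the original smiles_fa intact.

```python
smiles_fa = 'OC(CCCCCCCCCCCCCCC)=O'

expected = smiles_fa.count('C')  # built-in count, used only as a reference value

Cn = 0
remaining = smiles_fa  # copy of the string that is shortened step by step
while 'C' in remaining:
    if remaining[0] == 'C':
        Cn += 1
    remaining = remaining[1:]

print(Cn == expected)  # True
print(remaining)       # ')=O'
```

The leftover ')=O' shows that the loop stops as soon as no C remains, without examining the rest of the string.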

As shown above, for is suited to cases where the number of repetitions is fixed in advance, and while to cases where it is not.

Summary

Here, we have explained Python iterations, focusing on practical knowledge that can be used in chemoinformatics. Let's review the main points again.

- for can be used when the number of repetitions is fixed. You can also use break or continue to stop or skip processing under certain conditions.
- while has no fixed number of repetitions and can be used to repeat the process for as long as a given condition is met. Care must be taken not to create an "infinite loop".

Next, the following article explains Python functions.

Python functions learned from chemoinformatics

Reference materials / links

Surprisingly few!? "Minimum" knowledge required for programming in a pharmaceutical company

Python Iteration Learning with Cheminformatics
