I created a Python script that converts a MOL file to SMILES. Tools like this are everywhere, but this one reads a MOL file (a MOL-format string) from standard input and writes the SMILES to standard output. You get the result without going through an intermediate file, so the script is easier to combine with other commands.
The source is as follows.
Mol2SMILESConvertor.py
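```python
import sys
from rdkit import Chem


def main():
    # Read the whole MOL block (a MOL-format string) from standard input
    mol_block = sys.stdin.read()
    mol = Chem.MolFromMolBlock(mol_block)
    if mol is None:
        # RDKit returns None when the MOL block cannot be parsed
        sys.exit("Error: could not parse a MOL block from standard input")
    # Write the SMILES to standard output
    print(Chem.MolToSmiles(mol))


if __name__ == "__main__":
    main()
```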
Here is how to use it, with the following MOL file as an example.
test.mol
```
2,3,6-PCB
     RDKit          2D

 15 16  0  0  0  0  0  0  0  0999 V2000
    1.2990   -0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.2990    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    1.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.2990    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.2990   -0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000   -1.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000   -3.0008    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.2978   -3.7529    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.3380   -3.1546    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
   -1.2955   -5.2529    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.3337   -5.8546    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
    0.0048   -6.0009    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.3026   -5.2488    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.3002   -3.7488    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.3385   -3.1472    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0
  2  3  1  0
  3  4  2  0
  4  5  1  0
  5  6  2  0
  6  1  1  0
  6  7  1  0
  7  8  2  0
  8  9  1  0
  8 10  1  0
 10 11  1  0
 10 12  2  0
 12 13  1  0
 13 14  2  0
 14  7  1  0
 14 15  1  0
M  END
$$$$
```
If you `cat` the MOL file above and pipe it into the script, you get the SMILES as follows. It's that easy.
```
$ cat test.mol | python bin/Mol2SMILESConvertor.py
Clc1ccc(Cl)c(-c2ccccc2)c1Cl
```
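Because the script talks only through standard input and output, another Python program can also call it without writing a temporary file. The following is a minimal sketch using the standard-library `subprocess.run`. The function name `mol_to_smiles_via_script` is hypothetical, and the default path `bin/Mol2SMILESConvertor.py` assumes you run it from the same directory as the shell example above.

```python
import subprocess
import sys


def mol_to_smiles_via_script(mol_block, script="bin/Mol2SMILESConvertor.py"):
    """Send a MOL block to the converter's stdin and return its stdout, stripped.

    Hypothetical helper: assumes `script` is the stdin-to-stdout converter.
    """
    result = subprocess.run(
        [sys.executable, script],  # run the script with the current interpreter
        input=mol_block,           # MOL-format string fed to the script's stdin
        capture_output=True,       # collect stdout/stderr instead of printing them
        text=True,                 # exchange str rather than bytes
        check=True,                # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.strip()
```

For example, `mol_to_smiles_via_script(open("test.mol").read())` should return the same SMILES that the shell pipeline prints. If the converter fails to parse its input and exits with an error, `check=True` turns that into a Python exception.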
It would be even more convenient to read an SDF file (an SDF-format string) containing multiple compounds and output all of their SMILES at once.
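One way to handle the multi-compound case is sketched below. It assumes the standard SDF convention that records are separated by a line containing only `$$$$`. The sketch splits the input on those lines and passes each record to `Chem.MolFromMolBlock`, just as the single-molecule script does. The function names `split_sdf_records` and `sdf_to_smiles` are hypothetical. RDKit also ships its own SDF readers (`Chem.SDMolSupplier`, `Chem.ForwardSDMolSupplier`), which you may prefer in practice. The manual split here only keeps the example close to the original script.

```python
def split_sdf_records(sdf_text):
    """Split an SDF string into individual MOL blocks at each '$$$$' line."""
    records, current = [], []
    for line in sdf_text.splitlines(keepends=True):
        if line.strip() == "$$$$":
            # End of one record: the lines collected so far form one MOL block
            records.append("".join(current))
            current = []
        else:
            current.append(line)
    # Keep a final record that lacks a closing '$$$$', unless it is blank
    if "".join(current).strip():
        records.append("".join(current))
    return records


def sdf_to_smiles(sdf_text):
    """Return one SMILES string (or None if parsing failed) per SDF record."""
    # Imported here so split_sdf_records can be used even without RDKit installed
    from rdkit import Chem

    smiles_list = []
    for mol_block in split_sdf_records(sdf_text):
        mol = Chem.MolFromMolBlock(mol_block)  # None if the block is invalid
        smiles_list.append(Chem.MolToSmiles(mol) if mol is not None else None)
    return smiles_list
```

To use this from standard input like the original script, print each entry of `sdf_to_smiles(sys.stdin.read())`. Records that RDKit cannot parse come back as `None`, so you can report or skip them instead of aborting the whole file.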