If it is necessary to retain the properties of each atom instead of the entire molecule, it is not possible in the SMILES format, so it is embedded in the SDF and stored. However, since SDF basically defines records in molecular units, embedding atomic properties has to be a brute force method, such as including the atomic index in the property name.
I searched variously to see if there was a good way to store the properties of atoms, and found that the functions of RDKit described in https://www.rdkit.org/docs/RDKit_Book.html#atom-properties-and-sdf-files. I found that it can be done smoothly by using, so make a note of it.
SDMolSupplier since RDKit 2019.03.1 recognizes some molecular properties as an atomic property list and can convert them to atomic properties.
That is, properties whose names start with atom.prop, atom.iprop, atom.dprop, or atom.bprop are converted to string, int, double, or bool atomic properties, respectively.
The details are as per the URL, but seeing is believing. Let's try it.
The following are the "PartialSigmaCharge", "PiElectronegativity", "AtomHybridizationVSEPR", "AtomDegree", "AtomHybridization", "EffectiveAtomPolarizability", "SigmaElectronegativity", "AtomValance", "StabilizationPlusCharge" and their atomic values calculated by CDK. It is a python program that embeds in SDF as an atomic property list, saves and loads it, and loads it as an atomic property with the GetDoubleProp method of the Atom class (CDK program is omitted). There were also Double type and int type properties this time, but since it was troublesome, I saved them all as Dobule type (atom.dprop).
from rdkit import Chem
from rdkit.Chem import rdmolfiles
from rdkit.Chem import rdmolops
from rdkit.Chem.rdchem import Mol
def execute_javacdk_atom_feature(mol, atom_properties):
mol_file = "tmp.mol"
with open(mol_file, "w") as f:
txt = Chem.MolToMolBlock(mol)
f.writelines(txt)
#Run the CDK program
import subprocess
cmd = "java -classpath lib\cdk-2.2.jar;lib AtomicCircleDescriptorCalculator {0} -1".format(mol_file)
proc = subprocess.run(cmd, shell=True)
if proc.returncode != 0:
print("error")
return
import pandas as pd
df = pd.read_csv("result.csv")
df = df[atom_properties]
#While reading the CDK result, set the atom property to each atom with the SetDoubProp function.
for row in df.iterrows():
atom = mol.GetAtomWithIdx(row[0])
for atom_property in atom_properties:
atom.SetDoubleProp(atom_property, row[1][atom_property])
#Save atomic property list as molecular property
for atom_property in atom_properties:
Chem.CreateAtomDoublePropertyList(mol, atom_property)
return mol
def main():
import argparse
from rdkit import Chem
#Atomic properties you want to fit in SDF
atom_properties = ["longestMaxTopDistInMolecule", "highestMaxTopDistInMatrixRow", "diffSPAN3", "relSPAN4", "PartialSigmaCharge", "PiElectronegativity",
"AtomHybridizationVSEPR", "AtomDegree", "AtomHybridization", "EffectiveAtomPolarizability", "SigmaElectronegativity", "AtomValance",
"StabilizationPlusCharge"]
parser = argparse.ArgumentParser()
parser.add_argument("-input", type=str, required=True)
parser.add_argument("-output", type=str, required=True)
args = parser.parse_args()
#For reading sdf
sdf_sup = Chem.SDMolSupplier(args.input)
#For writing sdf
sdf_wtr = Chem.SDWriter(args.output)
#Embed the calculation result of CDK in the atom in the molecule while reading sdf and output
for i, mol in enumerate(sdf_sup):
#Embed the calculation result of CDK in the atom in the molecule
execute_javacdk_atom_feature(mol, atom_properties)
#Output molecules to SDF file
sdf_wtr.write(mol)
sdf_wtr.close()
#Read the output SDF file and try to output the atomic properties
sdf_sup_output = Chem.SDMolSupplier(args.output)
for i, mol in enumerate(sdf_sup_output):
print(mol.GetProp("_Name"))
for atom in mol.GetAtoms():
print(atom.GetIdx())
for atom_property in atom_properties:
print("{0}:{1}".format(atom_property, atom.GetDoubleProp(atom_property)))
if __name__ == "__main__":
main()
Atomic properties can be read like this.
13_cis_retinoic_acid
0
longestMaxTopDistInMolecule:13.0
highestMaxTopDistInMatrixRow:13.0
diffSPAN3:0.0
relSPAN4:1.0
PartialSigmaCharge:-0.14444603089353975
PiElectronegativity:0.0
AtomHybridizationVSEPR:3.0
AtomDegree:1.0
AtomHybridization:3.0
EffectiveAtomPolarizability:4.399859863281246
SigmaElectronegativity:12.344630439100552
AtomValance:6.0
StabilizationPlusCharge:0.0
1
longestMaxTopDistInMolecule:13.0
highestMaxTopDistInMatrixRow:13.0
diffSPAN3:0.0
relSPAN4:1.0
PartialSigmaCharge:-0.2572541465287616
PiElectronegativity:8.434574556331611
AtomHybridizationVSEPR:2.0
AtomDegree:1.0
AtomHybridization:2.0
EffectiveAtomPolarizability:4.007609863281251
SigmaElectronegativity:13.562163443445453
AtomValance:6.0
StabilizationPlusCharge:0.0
How it is saved in SDF is like this. The values for all atoms are stored in the property separated by spaces.
13_cis_retinoic_acid
RDKit 3D
22 22 0 0 1 0 0 0 0 0999 V2000
10.6160 5.4820 -0.2300 O 0 0 0 0 0 0 0 0 0 0 0 0
9.9350 3.3080 -0.3420 O 0 0 0 0 0 0 0 0 0 0 0 0
2.1650 -3.7930 -0.4060 C 0 0 0 0 0 0 0 0 0 0 0 0
1.9360 -5.2870 -0.2160 C 0 0 0 0 0 0 0 0 0 0 0 0
3.1970 -6.0950 -0.4140 C 0 0 0 0 0 0 0 0 0 0 0 0
3.4270 -3.2930 0.2730 C 0 0 0 0 0 0 0 0 0 0 0 0
4.2930 -5.6270 0.5190 C 0 0 0 0 0 0 0 0 0 0 0 0
4.3720 -4.1470 0.7160 C 0 0 0 0 0 0 0 0 0 0 0 0
0.9180 -3.1060 0.1200 C 0 0 0 0 0 0 0 0 0 0 0 0
2.2620 -3.5100 -1.8950 C 0 0 0 0 0 0 0 0 0 0 0 0
3.4900 -1.8530 0.4430 C 0 0 0 0 0 0 0 0 0 0 0 0
5.5940 -3.7230 1.4610 C 0 0 0 0 0 0 0 0 0 0 0 0
4.5660 -1.0700 0.3160 C 0 0 0 0 0 0 0 0 0 0 0 0
4.5800 0.3580 0.4930 C 0 0 0 0 0 0 0 0 0 0 0 0
3.3000 1.0310 0.8560 C 0 0 0 0 0 0 0 0 0 0 0 0
5.7270 1.0260 0.3240 C 0 0 0 0 0 0 0 0 0 0 0 0
5.8760 2.4440 0.4680 C 0 0 0 0 0 0 0 0 0 0 0 0
7.0420 3.0690 0.2850 C 0 0 0 0 0 0 0 0 0 0 0 0
7.2550 4.4870 0.4140 C 0 0 0 0 0 0 0 0 0 0 0 0
6.0830 5.3300 0.7850 C 0 0 0 0 0 0 0 0 0 0 0 0
8.4270 5.0940 0.2220 C 0 0 0 0 0 0 0 0 0 0 0 0
9.7040 4.4870 -0.1440 C 0 0 0 0 0 0 0 0 0 0 0 0
1 22 1 0
2 22 2 0
3 4 1 0
3 6 1 0
3 9 1 0
3 10 1 0
4 5 1 0
5 7 1 0
6 8 2 0
6 11 1 0
7 8 1 0
8 12 1 0
11 13 2 0
13 14 1 0
14 15 1 0
14 16 2 0
16 17 1 0
17 18 2 0
18 19 1 0
19 20 1 0
19 21 2 0
21 22 1 0
M END
> <ID> (1)
13_cis_retinoic_acid
> <atom.dprop.longestMaxTopDistInMolecule> (1)
13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13
> <atom.dprop.highestMaxTopDistInMatrixRow> (1)
13 13 11 12 13 10 12 11 12 12 9 12 8 7 8 7 8 9 10 11 11 12
> <atom.dprop.diffSPAN3> (1)
0 0 2 1 0 3 1 2 1 1 4 1 5 6 5 6 5 4 3 2 2 1
> <atom.dprop.relSPAN4> (1)
1 1 0.84615384615384615 0.92307692307692324 1 0.76923076923076927 0.92307692307692324 0.84615384615384615 0.92307692307692324 0.92307692307692324 0.69230769230769229 0.92307692307692324
0.61538461538461542 0.53846153846153844 0.61538461538461542 0.53846153846153844 0.61538461538461542 0.69230769230769229 0.76923076923076927 0.84615384615384615 0.84615384615384615
0.92307692307692324
> <atom.dprop.PartialSigmaCharge> (1)
-0.14444603089353975 -0.25725414652876161 0.018323058588212145 0.0044130542545423954 0.0047171590351277118 -0.030785271462194001 0.021723335595041502 -0.047931809716993255
0.0044017486353659045 0.0044017486353659045 -0.0053179497186989309 0.026077408846979573 -0.0046644578993843356 -0.021862432534977119 0.031157982917592638 -0.0043178495737188505
-0.00066866837339847867 -0.0041286011669637492 -0.016275053354415626 0.031355763573928032 0.067212842574592244 0.32386816856629769
> <atom.dprop.PiElectronegativity> (1)
0 8.4345745563316115 0 0 0 5.7377113992064643 0 5.6043578182067719 0 0 5.9378835230737863 0 5.943053057762822 5.807558163952943 0 5.9457956219007961 5.9746983272066947 5.9472932607203139
5.8514526889811611 0 6.5217619200580135 8.7517623225150878
> <atom.dprop.AtomHybridizationVSEPR> (1)
3 2 3 3 3 2 3 2 3 3 2 3 2 2 3 2 2 2 2 3 2 2
> <atom.dprop.AtomDegree> (1)
1 1 4 2 2 3 2 3 1 1 2 1 2 3 1 2 2 2 3 1 2 3
> <atom.dprop.AtomHybridization> (1)
3 2 3 3 3 2 3 2 3 3 2 3 2 2 3 2 2 2 2 3 2 2
> <atom.dprop.EffectiveAtomPolarizability> (1)
4.3998598632812458 4.0076098632812514 11.022853027343745 8.7883015136718754 7.9534945068359395 11.401456054687504 8.4101140136718744 9.9484780273437483 7.4461765136718752
7.4461765136718752 9.9611621093749996 7.0679890136718742 9.4839492187500003 9.7312109375000002 6.959355468750001 8.9463281250000009 8.6171093749999947 8.5909453125000006 8.8427539062500067
6.515126953124998 7.6181894531249963 6.1732197265625013
> <atom.dprop.SigmaElectronegativity> (1)
12.344630439100552 13.562163443445453 8.1489517017565554 8.0200518228093163 8.0227611391275193 8.5055780121018785 8.1802713269458405 8.3472735246209044 8.0199577205445411
8.0199577205445411 8.7413311783943062 8.2201852554190094 8.7471753452915539 8.586988358849446 8.2666333630758384 8.7502658294135927 8.7840132211143533 8.7518543450176072 8.6368154001064532
8.268288364628825 9.416522812299613 11.9692325165343
> <atom.dprop.AtomValance> (1)
6 6 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
> <atom.dprop.StabilizationPlusCharge> (1)
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
The elegant.
When doing Graph Convoluation with this, I wanted to embed the atomic property defined by myself in the node attribute, but it seems that it can be implemented quite smartly by using this method.