A cheminformatics post. This is the story of how I used the descriptor calculation library Mordred (1.0.0) and finally found inf as a calculation result.
I've been working with descriptor calculation results for a while now. Apparently a result of "inf" can occur, but I had never seen one. I casually checked about 1,000 compounds and could not find it. So this time I decided to go all in and calculate tens of thousands of compounds to find it.
Mordred calculations raise errors as a matter of course, so I added an except clause every time a new kind of error occurred. It felt like I kept the script running for a very long time.
The environment below was created with Anaconda on Windows 10 Pro (x64).
# Name Version Build Channel
python 3.6.8 h9f7ef89_7
rdkit 2017.09.2.0 py36he334aed_1 rdkit
mordred 1.0.0 py36_0 mordred-descriptor
I don't think anything other than the packages above affects the result.
This is the code.
from rdkit import Chem
from mordred import Calculator, descriptors
from mordred import error as err
from datetime import datetime

# all descriptor instances known to Mordred (3D descriptors included)
descs = Calculator(descriptors, ignore_3D=False).descriptors

# ------------------------------------------------------
# functions
# ------------------------------------------------------
# get compounds
def get_mols(file):
    return Chem.SDMolSupplier(file)

# write text (one CSV row per call)
def output_text(filename, mode, values):
    with open(filename + '.csv', mode) as f:
        f.write(','.join(values) + '\n')

# calculation (one descriptor for one molecule; errors become labels)
def calculate_desc(calc, mol):
    value = None
    try:
        value = calc(mol)
    except ZeroDivisionError:
        value = 'errZero'
    except IndexError:
        value = 'errIndex'
    except ValueError:
        value = 'errValue'
    except NameError:
        value = 'errNone'
    except err.Missing3DCoordinate:
        value = 'err3D'
    except err.MultipleFragments:
        value = 'errMulti'
    return str(value)

# print log (timestamp and the value passed in)
def printlog(value):
    print(str(datetime.now()) + ',' + str(value))

# ------------------------------------------------------
# main
# ------------------------------------------------------
# get compounds
filename = 'CHEMBL503873'
mols = get_mols(filename + '.sdf')

# get calculators
headers = list()
calcs = list()
headers.append('Name')
for i in range(1824):
    calcs.append(descs[i])
    headers.append(str(calcs[i]))

# output
output_text(filename, 'w', headers)
printlog(0)
for i, mol in enumerate(mols):
    values = list()
    if mol is not None:
        values.append(mol.GetProp('_Name'))
        for calc in calcs:
            values.append(calculate_desc(calc, mol))
        output_text(filename, 'a', values)
    if i % 100 == 0:
        printlog(i)
The name is already hard-coded in the script above, but this is the compound I found.
https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL503873/
CHEMBL503873 C70H108O24
CO[C@H]1C[C@H](COC[C@@H]2[C@@H](C)O[C@H](C[C@@H]2OC)O[C@H]3CC[C@]4(C)[C@H]5C[C@@H](OC(=O)\C=C\c6ccccc6)[C@]7(C)[C@@](O)(CC[C@]7(O)[C@]5(O)CC=C4C3)C(=O)C)O[C@@H](C)[C@H]1COC[C@H]8C[C@H](OC)[C@H](COC[C@H]9C[C@@H](OC)[C@@H](O[C@H]%10O[C@@H](CO)[C@H](O)[C@@H](O)[C@@H]%10O)[C@H](C)O9)[C@@H](C)O8
Just to be sure, here is a check using the script from this earlier post.
■ Playing with Python (calculating descriptors one at a time with Mordred) https://qiita.com/siinai/items/026aad1f05c9f6d51199
(py36) D:\py>python 71-01.py
GRAVH
-------------------------------------
inf
Yup. I certainly found inf.
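Checking a large output CSV by eye is tedious, so here is a minimal sketch of how the inf cells could be picked out automatically. It assumes the CSV layout written by the script above (a Name column followed by one column per descriptor, with error labels such as errZero in place of numbers). The function name find_inf_cells and the sample rows are made up for illustration and are not real calculation results.

```python
import csv
import io
import math


def find_inf_cells(csv_text):
    """Return (row name, column header) pairs whose value parses to +/-inf."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader)
    hits = []
    for row in reader:
        # skip the first column, which holds the compound name
        for header, cell in zip(headers[1:], row[1:]):
            try:
                number = float(cell)
            except ValueError:
                continue  # error labels such as 'errZero' are not numbers
            if math.isinf(number):
                hits.append((row[0], header))
    return hits


# made-up sample in the same layout as the script's output
sample = "Name,GRAVH,nAtom\nmolA,inf,10\nmolB,errZero,12\nmolC,1.5,-inf\n"
print(find_inf_cells(sample))
```

For a real run, the text of the generated CSV file would be read from disk and passed in instead of the sample string.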
I posted a slightly serious article for the first time in a long time. After all, it is fun to get hands-on. It just takes time, though. Even though I'm using a CPU with 4 cores / 8 threads, the CPU load is only about 30%. Well, of course it is. It would be nice to set up proper parallel processing and split the work by machine or by compound. I want to try that later.
For the time being, I will paste the calculation results.
Well... with this calculation alone, I feel like I could have kept the CPU at 100% load for more than a day... or maybe not... It's tough. I want a GPU, but I also want a CPU. A 16-core one.
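Splitting the work by compound could start from something like the sketch below. It only shows the partitioning step: a list of compounds is cut into nearly equal chunks, and each chunk could then be handed to a separate process or machine. The helper name split_into_chunks and the placeholder IDs are assumptions for illustration; this is not something I have run against the real data.

```python
def split_into_chunks(items, n_chunks):
    """Split items into n_chunks contiguous, nearly equal-sized lists."""
    size, extra = divmod(len(items), n_chunks)
    chunks = []
    start = 0
    for k in range(n_chunks):
        # the first `extra` chunks get one additional item
        end = start + size + (1 if k < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


# placeholder IDs standing in for the compounds of an SDF file
ids = ["mol%d" % n for n in range(10)]
for k, chunk in enumerate(split_into_chunks(ids, 4)):
    print(k, chunk)
```

Each worker would then write its own CSV file, so no two processes append to the same file at once.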
CHEMBL10786 CHEMBL263256 CHEMBL503873 CHEMBL501567 CHEMBL500702 CHEMBL501093 CHEMBL501094 CHEMBL505931 CHEMBL444732 CHEMBL444155 CHEMBL445174 CHEMBL445253 CHEMBL444510 CHEMBL501306 CHEMBL502034 CHEMBL499522 CHEMBL500203 CHEMBL498862 CHEMBL503717 CHEMBL503722 CHEMBL504025 CHEMBL504038 CHEMBL502642 CHEMBL500358 CHEMBL500619 CHEMBL500622 CHEMBL500058 CHEMBL500182 CHEMBL500184 CHEMBL504187 CHEMBL525749 CHEMBL525930 CHEMBL526006 CHEMBL526343 CHEMBL526355 CHEMBL526373 CHEMBL499978 CHEMBL499980 CHEMBL500099 CHEMBL500244 CHEMBL508221 CHEMBL500219 CHEMBL500223 CHEMBL506996 CHEMBL507128 CHEMBL525750 CHEMBL503778 CHEMBL503489 CHEMBL503495 CHEMBL507216 CHEMBL502664 CHEMBL502666 CHEMBL503666 CHEMBL503894 CHEMBL525940 CHEMBL525945 CHEMBL526501 CHEMBL500441 CHEMBL500451 CHEMBL502457 CHEMBL525219 CHEMBL525221 CHEMBL527042 CHEMBL525450 CHEMBL526129 CHEMBL526130 CHEMBL508387 CHEMBL508391 CHEMBL498956 CHEMBL503974 CHEMBL503979 CHEMBL507601 CHEMBL504097 CHEMBL524833 CHEMBL525962 CHEMBL525424 CHEMBL525951 CHEMBL526360 CHEMBL525216 CHEMBL525217 CHEMBL509192 CHEMBL501147 CHEMBL501266 CHEMBL503261 CHEMBL526689 CHEMBL526690 CHEMBL498967 CHEMBL501641 CHEMBL500002 CHEMBL500011 CHEMBL524521 CHEMBL506061 CHEMBL504078 CHEMBL508019 CHEMBL500187 CHEMBL500103 CHEMBL445002 CHEMBL525762 CHEMBL525763 CHEMBL525398 CHEMBL525399 CHEMBL526113 CHEMBL526115 CHEMBL526119 CHEMBL526121 CHEMBL526181 CHEMBL502415 CHEMBL502420 CHEMBL502978 CHEMBL505143 CHEMBL501291 CHEMBL502603 CHEMBL503695 CHEMBL504000 CHEMBL504159 CHEMBL526190 CHEMBL526301 CHEMBL501788 CHEMBL506306 CHEMBL500524 CHEMBL499537 CHEMBL501823 CHEMBL504080 CHEMBL504417 CHEMBL507534 CHEMBL502988 CHEMBL500373 CHEMBL500375 CHEMBL505276 CHEMBL500264 CHEMBL526336 CHEMBL525083 CHEMBL525086 CHEMBL525089 CHEMBL503245 CHEMBL503306 CHEMBL501970 CHEMBL503617 CHEMBL503852 CHEMBL503858 CHEMBL502077 CHEMBL501569 CHEMBL504902 CHEMBL526516 CHEMBL526681 CHEMBL526682 CHEMBL525441 CHEMBL501317 CHEMBL501323 CHEMBL502678 CHEMBL503342 CHEMBL507824 CHEMBL499931 
CHEMBL499957 CHEMBL500483 CHEMBL500788 CHEMBL525771 CHEMBL503047 CHEMBL503286 CHEMBL504214 CHEMBL504401 CHEMBL525073 CHEMBL525624 CHEMBL526743 CHEMBL526874 CHEMBL526876 CHEMBL524358 CHEMBL524487 CHEMBL524488 CHEMBL527050 CHEMBL524494 CHEMBL524498 CHEMBL525068 CHEMBL525069 CHEMBL525407 CHEMBL525409 CHEMBL527084 CHEMBL591794 CHEMBL592148 CHEMBL592149 CHEMBL1208990 CHEMBL524531 CHEMBL524539 CHEMBL593680 CHEMBL589995 CHEMBL589997 CHEMBL525394 CHEMBL526678 CHEMBL526890 CHEMBL525224 CHEMBL525386 CHEMBL526131 CHEMBL596000 CHEMBL526544 CHEMBL526545 CHEMBL527072 CHEMBL527074 CHEMBL525419 CHEMBL525991 CHEMBL530121 CHEMBL526741 CHEMBL595999 CHEMBL526703 CHEMBL526853 CHEMBL526916 CHEMBL526922 CHEMBL525076 CHEMBL524356 CHEMBL524357 CHEMBL525237 CHEMBL525242 CHEMBL525402 CHEMBL530345 CHEMBL605624 CHEMBL608706 CHEMBL605628 CHEMBL595776 CHEMBL591446 CHEMBL607837 CHEMBL1097890 CHEMBL589278 CHEMBL589762 CHEMBL602303 CHEMBL605828 CHEMBL609471 CHEMBL604989 CHEMBL608415 CHEMBL1097888 CHEMBL1213233 CHEMBL611968 CHEMBL1099238 CHEMBL132931 CHEMBL135376 CHEMBL136703 CHEMBL194552 CHEMBL207341 CHEMBL214100 CHEMBL216830