This is where the **SA score**, an index that can be calculated with RDKit, comes in handy [^1].
- Average: 3.5
- Median: 3.1
These values should be a helpful guide when narrowing down candidate compounds by SA score.
- As of 2020-04-07, pharmaceuticals currently manufactured and sold in Japan
  - Drugs with a D number assigned in KEGG DRUG (2826 entries, including duplicates)
- Desalt, because I want to look at the ease of synthesis of the main fragment
  - I borrowed @yamasakih's desalt.py [^2]
- Calculate the SA score

[^2]: Commentary; The story of creating a compound database that can be accessed from Jupyter Notebook for Drug Discovery Raid Battle 2018 with razi's Docker-compose - Qiita
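The desalting and scoring steps can be sketched roughly as follows. This is not the author's code: instead of desalt.py it uses RDKit's own `LargestFragmentChooser` as a stand-in, and the helper name `desalted_sa_score` and the input SMILES are made up for illustration. It assumes an RDKit installation that ships the `Contrib/SA_Score` directory.

```python
import os
import sys

from rdkit import Chem
from rdkit.Chem import RDConfig
from rdkit.Chem.MolStandardize import rdMolStandardize

# sascorer lives in RDKit's Contrib directory, not in the main package.
sys.path.append(os.path.join(RDConfig.RDContribDir, "SA_Score"))
import sascorer  # noqa: E402


def desalted_sa_score(smiles: str) -> float:
    """Keep the largest fragment of a SMILES string and return its SA score.

    Hypothetical helper; the original post used desalt.py for the desalting step.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # Stand-in for desalt.py: drop counter-ions by keeping the largest fragment.
    main_fragment = rdMolStandardize.LargestFragmentChooser().choose(mol)
    return sascorer.calculateScore(main_fragment)


# Toy input: baclofen written with an HCl counter-ion purely for illustration.
score = desalted_sa_score("NCC(CC(=O)O)c1ccc(Cl)cc1.Cl")
print(f"SA score: {score:.2f}")
```

Keeping the largest fragment is only one way to desalt; a rule-based salt list may behave differently for drugs whose counter-ion is larger than the active component.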
This time I focus on compounds with two or more carbon atoms. Combination drugs, non-medicinal products, blood products, antibody drugs, crude drugs, and the like are excluded because they do not fit the purpose. This narrows the set down to 1641 compounds. Drugs classified under more than one therapeutic category appear several times, so removing those duplicates leaves 1436 compounds. The histogram shown above is based on these 1436 compounds.
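The two narrowing steps (carbon-count filter, then de-duplication) might look like the sketch below. It assumes each record carries a compound ID, a category, and a molecular formula string; the IDs, the tuple layout, and the `count_carbons` helper are hypothetical, and the original post may well have counted carbons with RDKit instead.

```python
import re


def count_carbons(formula: str) -> int:
    """Count carbon atoms in a molecular formula such as 'C9H8O4'."""
    # A 'C' followed by a lowercase letter is another element (Cl, Ca, Cu, ...),
    # so the negative lookahead skips it.
    return sum(int(n) if n else 1 for n in re.findall(r"C(?![a-z])(\d*)", formula))


# Toy rows: (compound ID, category, formula). IDs are placeholders, not real D numbers.
rows = [
    ("X0001", "Nerve / sensory organs", "C9H8O4"),
    ("X0001", "Each organ", "C9H8O4"),    # same compound under a second category
    ("X0002", "metabolism", "CH4N2O"),    # only one carbon, so it is dropped
    ("X0003", "Non-treatment", "C2H6O"),
]

# Step 1: keep compounds with two or more carbon atoms (duplicates still present).
filtered = [row for row in rows if count_carbons(row[2]) >= 2]

# Step 2: keep the first occurrence of each compound ID.
unique = {}
for compound_id, category, formula in filtered:
    unique.setdefault(compound_id, (compound_id, category, formula))

print(len(filtered), len(unique))
```

The list after step 1 corresponds to the 1641-compound set used for the per-category comparison, and the dictionary after step 2 corresponds to the 1436-compound set used for the overall histogram.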
| mean | std | min | 25% | 50% | 75% | max | 
|---|---|---|---|---|---|---|
| 3.472627 | 1.262086 | 1.054917 | 2.547814 | 3.14387 | 4.190424 | 9.129873 | 
The table above summarizes the overall distribution. The data also include the therapeutic (drug efficacy) classification, so let's see whether the categories differ. The calculations here use the 1641 compounds, without removing duplicates.
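Summary tables of this shape are what pandas' `describe()` returns. A minimal sketch, assuming a DataFrame with hypothetical column names `category` and `sa_score` and toy values rather than the real data:

```python
import pandas as pd

# Toy data; the real table would hold one row per compound and category.
df = pd.DataFrame(
    {
        "category": ["A", "A", "A", "B", "B", "B"],
        "sa_score": [2.0, 3.0, 4.0, 3.0, 5.0, 7.0],
    }
)

# Overall statistics: count, mean, std, min, quartiles, max.
overall = df["sa_score"].describe()

# The same statistics per category, one row per group.
by_category = df.groupby("category")["sa_score"].describe()

print(overall)
print(by_category)
```

Because a drug listed under several categories contributes one row to each of them, the per-category counts add up to the size of the set with duplicates, not the de-duplicated one.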
The "Nerve / sensory organs" category is further divided into central nervous system drugs, peripheral nervous system drugs, sensory organ drugs, and others. Memantine and baclofen belong to this category.
| count | mean | std | min | 25% | 50% | 75% | max | 
|---|---|---|---|---|---|---|---|
| 401 | 3.074675 | 1.066451 | 1.407299 | 2.368182 | 2.79175 | 3.496406 | 8.224301 | 

The "Each organ" category is further divided into cardiovascular drugs, respiratory drugs, digestive drugs, hormone preparations, urogenital and anal drugs, dermatological drugs, dental and oral drugs, and others. Examples include olmesartan and esomeprazole.
| count | mean | std | min | 25% | 50% | 75% | max | 
|---|---|---|---|---|---|---|---|
| 567 | 3.436648 | 1.211883 | 1.176561 | 2.556412 | 3.073396 | 4.338626 | 9.129873 | 

The "metabolism" category is further divided into vitamins, nutrients and tonics, blood and body fluid drugs, dialysis drugs, and others. Examples include prasugrel and canagliflozin.
| count | mean | std | min | 25% | 50% | 75% | max | 
|---|---|---|---|---|---|---|---|
| 196 | 3.562633 | 1.263028 | 1.58004 | 2.774178 | 3.307104 | 4.199926 | 9.121023 | 

The "Tissue cells" category is further divided into cell activation drugs, antitumor drugs, radiopharmaceuticals, allergy drugs, and others. Examples include irinotecan and cetirizine.
| count | mean | std | min | 25% | 50% | 75% | max | 
|---|---|---|---|---|---|---|---|
| 191 | 3.608295 | 1.438333 | 1.694618 | 2.641493 | 3.066941 | 4.15542 | 7.705978 | 

Not applicable: no compounds fall under this category (crude drugs and similar products were excluded above).
The "Pathogenic organisms" category is further divided into antibiotic preparations, chemotherapeutic agents, biologics, antiparasitic drugs, and others. Examples include laninamivir and rifampicin.
| count | mean | std | min | 25% | 50% | 75% | max | 
|---|---|---|---|---|---|---|---|
| 201 | 4.159825 | 1.384663 | 1.762741 | 3.202318 | 3.992249 | 4.690629 | 8.214511 | 

The "Non-treatment" category is further divided into dispensing agents, diagnostic drugs, public health drugs, in-vitro diagnostic drugs, and others. Examples include adenosine and edrophonium.
| count | mean | std | min | 25% | 50% | 75% | max | 
|---|---|---|---|---|---|---|---|
| 70 | 3.336593 | 1.027391 | 1.054917 | 2.544872 | 3.402247 | 3.976367 | 5.783386 | 

The "drug" group (narcotics) is further divided into alkaloid narcotics, non-alkaloid narcotics, and others.
| count | mean | std | min | 25% | 50% | 75% | max | 
|---|---|---|---|---|---|---|---|
| 15 | 3.747792 | 1.351973 | 1.977279 | 2.541722 | 3.994829 | 5.00452 | 5.273602 | 

Welch's t-test was applied to every pair of groups.
| p-value | Nerve / sensory organs | Each organ | metabolism | Tissue cells | Pathogenic organisms | Non-treatment | drug | 
|---|---|---|---|---|---|---|---|
| Nerve / sensory organs | - | 0.000 | 0.000 | 0.000 | 0.000 | 0.053 | 0.076 | 
| Each organ | 0.000 | - | 0.225 | 0.140 | 0.000 | 0.453 | 0.392 | 
| metabolism | 0.000 | 0.225 | - | 0.740 | 0.000 | 0.140 | 0.615 | 
| Tissue cells | 0.000 | 0.140 | 0.740 | - | 0.000 | 0.093 | 0.707 | 
| Pathogenic organisms | 0.000 | 0.000 | 0.000 | 0.000 | - | 0.000 | 0.272 | 
| Non-treatment | 0.053 | 0.453 | 0.140 | 0.093 | 0.000 | - | 0.281 | 
| drug | 0.076 | 0.392 | 0.615 | 0.707 | 0.272 | 0.281 | - | 
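Since the summary tables list count, mean, and std for each group, the p-values can be roughly re-derived without the raw scores. The sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom from those three numbers. The function name `welch_from_summary` is made up for illustration, and the p-value uses a normal approximation (an assumption made here to stay within the standard library), which is only reasonable when both groups are large.

```python
import math
from statistics import NormalDist


def welch_from_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Welch's t-test from summary statistics.

    Returns (t, degrees of freedom, approximate two-sided p-value).
    The p-value uses a normal approximation, not the exact t distribution.
    """
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    dof = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    p_approx = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, dof, p_approx


# (count, mean, std) taken from the per-category tables above
metabolism = (196, 3.562633, 1.263028)
tissue_cells = (191, 3.608295, 1.438333)

t_stat, dof, p_approx = welch_from_summary(*metabolism, *tissue_cells)
print(f"t = {t_stat:.3f}, df = {dof:.1f}, p ~ {p_approx:.3f}")
```

For this pair the approximate p-value lands close to the 0.740 in the table. For the 15-compound "drug" group the normal approximation is too optimistic, so an exact t distribution (for example `scipy.stats.ttest_ind` with `equal_var=False` on the raw scores) is the better choice there.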
Since a lower SA score means easier synthesis, the mean values suggest that drugs for the nervous system and sensory organs are easier to synthesize than the others, while antibiotics, chemotherapeutic drugs, and antiallergic drugs are harder to synthesize.
- It was good practice for Pandas
- I would also like to compare groups by target type [^3]
- Please leave a comment if you spot any mistakes

[^3]: KEGG BRITE: Target-based drug classification