I made an AI that judges whether a compound is an alcohol!

Purpose

I would like to study materials informatics. This time, I will use RDKit, a cheminformatics toolkit that can convert organic compounds into numeric vectors (fingerprints), to create an AI that determines whether an organic compound given as data is an alcohol.
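The idea of "converting a compound into a vector" can be illustrated with a toy hashed fingerprint in plain Python. This is a simplified stand-in, not RDKit's algorithm: the hypothetical `toy_hashed_fingerprint` hashes substrings of the SMILES text into a fixed number of bits, whereas RDKit's Morgan fingerprint hashes circular atom environments of the parsed molecule. What the two share is the property that matters for machine learning: every compound, whatever its size, becomes a vector of the same length.

```python
import zlib


def toy_hashed_fingerprint(smiles, n_bits=16, max_len=3):
    """Toy hashed fingerprint: one bit per SMILES substring (NOT RDKit's Morgan algorithm)."""
    bits = [0] * n_bits
    for length in range(1, max_len + 1):
        for start in range(len(smiles) - length + 1):
            fragment = smiles[start:start + length]
            # crc32 is deterministic across runs, unlike Python's built-in str hash
            bits[zlib.crc32(fragment.encode()) % n_bits] = 1
    return bits


print(toy_hashed_fingerprint("CCO"))
```

Because the toy works on the text, two SMILES strings for the same molecule (for example `CCCO` and `C(C)CO`) can get different vectors; a fingerprint computed from the molecular graph, as RDKit does after `Chem.MolFromSmiles`, does not have that problem.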

Operating environment

Python: 3.6.5
scikit-learn: 0.20.3
rdkit: 2019.03.1.0

Source code

# Import the required libraries
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import numpy as np

# Prepare the data.
# Each compound is labeled 1 if it is an alcohol and 0 otherwise.
# Structures are written in SMILES notation.
smiles = ['CO', 'C(=O)O', 'CCO', 'C=O', 'CCCO', 'CCC', 'C(C)CO', 'C(=O)', 'CC(=O)', 'CC(=O)', 'C', 'CC(=O)C', 'CCCCO', 'C(C)CO']
ans = [1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1]

# Convert each molecule into a 1024-bit Morgan fingerprint with radius 2
mols = [Chem.MolFromSmiles(smile) for smile in smiles]
finger_print = [AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024) for mol in mols]

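# Split the data into training and test sets
# (each fingerprint is turned into a plain list of 0/1 values first)
X = np.array([list(fp) for fp in finger_print])
y = np.array(ans)
# random_state makes the split reproducible; stratify keeps the alcohol/non-alcohol ratio in both sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)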

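# Build a model with the random forest algorithm
# (n_estimators is set explicitly: the default is 10 in scikit-learn 0.20 and 100 from 0.22 on)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)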

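# Check the accuracy of the model on the training and test data
# (print is needed: in a notebook cell only the last expression is displayed)
print("train accuracy:", forest.score(X_train, y_train))
print("test accuracy:", forest.score(X_test, y_test))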
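With only 14 molecules, the test score can take very few values. The sketch below mirrors, as I understand scikit-learn's behavior, how `train_test_split` turns a fractional `test_size` into counts (the test count is rounded up and the training set gets the rest); `split_sizes` is a hypothetical helper, not part of scikit-learn.

```python
import math


def split_sizes(n_samples, test_size):
    """Hypothetical helper: train/test counts for a fractional test_size."""
    # The test count is rounded up; the training set gets the remainder
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


n_train, n_test = split_sizes(14, 0.3)
print(n_train, n_test)  # 9 5

# score() returns mean accuracy, so with n_test molecules it is always k / n_test
possible_test_scores = [k / n_test for k in range(n_test + 1)]
print(possible_test_scores)  # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
```

A test accuracy of 0.8, for example, means exactly one of the five test molecules was misclassified, so results on a dataset this small should be read with caution.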

The flow is as follows.

1. Prepare the data
2. Vectorize the data using RDKit
3. Split the data using a scikit-learn function
4. Create a model using one of scikit-learn's algorithm classes
5. Check the accuracy of the created model
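Step 3 deserves a check with this particular dataset, because some SMILES describe the same molecule: `'CC(=O)'` and `'C(C)CO'` each appear twice, `'C(C)CO'` is another way of writing `'CCCO'` (propan-1-ol), and `'C(=O)'` is the same molecule as `'C=O'` (formaldehyde). Identical molecules get identical fingerprints, so if copies land on both sides of the split, the test score partly measures memorization. Below is a minimal NumPy sketch to count such overlaps; `count_shared_rows` is a hypothetical helper, and it assumes `X_train` and `X_test` are 2-D arrays of 0/1 fingerprint bits.

```python
import numpy as np


def count_shared_rows(X_train, X_test):
    """Hypothetical helper: number of test rows that also occur verbatim in the training set."""
    # Rows are converted to tuples so they can be stored in a set and compared exactly
    train_rows = {tuple(row) for row in np.asarray(X_train).tolist()}
    return sum(tuple(row) in train_rows for row in np.asarray(X_test).tolist())


# Small example: the first test row duplicates a training row, the second does not
X_train = np.array([[1, 0, 1], [0, 1, 0]])
X_test = np.array([[1, 0, 1], [1, 1, 1]])
print(count_shared_rows(X_train, X_test))  # 1
```

Applied to the real split, a nonzero count means some test molecules were effectively seen during training. Removing duplicates before building the dataset, for example by canonicalizing each SMILES with `Chem.MolToSmiles(Chem.MolFromSmiles(s))` and keeping one copy, avoids the issue.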

The Jupyter notebook for this post has been uploaded to GitHub.

References

RDKit Official Page
