Let's learn DeepSEA with Selene

Selene is a PyTorch-based deep learning library specializing in biological sequence data. Much like NiftyNet for medical imaging, it lets you run deep learning just by writing a config file, without writing code. Because sampling can be performed under identical conditions, it is easy to compare model performance. Models built with Selene can also be shared via Kipoi, which was covered in a previous article.

GitHub: https://github.com/FunctionLab/selene
Paper: https://www.nature.com/articles/s41592-019-0360-8
Documentation: https://selene.flatironinstitute.org/

By the way, note that there is an unrelated Python web-testing library with exactly the same name.

Installation

Python 3.6 or higher and PyTorch are required. My environment was Ubuntu 16.04, CUDA 10, Anaconda, Python 3.6, and PyTorch 1.2.0.

pip install selene-sdk

That completes the installation. The version installed here was 0.4.8.

Running an example

Basically, you just write the settings in a config file (YAML format) and run; how to write the config file is documented here. Four items are set: operation, model, sampler, and parameters. The data is split reproducibly into training, validation, and test sets.
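Selene makes the split reproducible by holding out whole chromosomes for validation and testing (the `test_holdout` and `validation_holdout` fields in the config). The idea can be sketched in plain Python; the region list here is made up purely for illustration:

```python
# Partition genomic regions into train/validation/test by chromosome,
# mirroring Selene's holdout-based sampling (illustrative data only).
test_holdout = {"chr8", "chr9"}
validation_holdout = {"chr6", "chr7"}

regions = [("chr6", 100, 1100), ("chr8", 200, 1200), ("chr1", 0, 1000)]

split = {"train": [], "validate": [], "test": []}
for chrom, start, end in regions:
    if chrom in test_holdout:
        split["test"].append((chrom, start, end))
    elif chrom in validation_holdout:
        split["validate"].append((chrom, start, end))
    else:
        split["train"].append((chrom, start, end))

print(split["test"])  # regions on chr8/chr9 never leak into training
```

Because the assignment depends only on the chromosome name, the same regions always land in the same partition, which is what makes model comparisons fair.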

First, git clone.

git clone https://github.com/FunctionLab/selene.git

The git clone took a while. There are example config files in the config_examples folder for reference.

This time I will try the quickstart_training tutorial.

cd selene/tutorials/quickstart_training
wget https://zenodo.org/record/1443558/files/selene_quickstart.tar.gz
tar -zxvf selene_quickstart.tar.gz
mv selene_quickstart_tutorial/* .

Downloading the data takes time. If you cannot download selene_quickstart.tar.gz, this method can be used instead.

The config uses the DeeperDeepSEA model; I changed the sampler to RandomPositionsSampler. The file is simple_train.yml:

ops: [train, evaluate]
model: {
    path: ./deeperdeepsea.py,
    class: DeeperDeepSEA,
    class_args: {
        sequence_length: 1000,
        n_targets: 1,
    },
    non_strand_specific: mean
}
sampler: !obj:selene_sdk.samplers.RandomPositionsSampler {
    reference_sequence: !obj:selene_sdk.sequences.Genome {
        input_path: ./male.hg19.fasta
    },
    features: !obj:selene_sdk.utils.load_features_list {
        input_path: ./distinct_features.txt
    },
    target_path: ./sorted_GM12878_CTCF.bed.gz,
    seed: 100,
    sequence_length: 1000,
    center_bin_to_predict: 200,
    test_holdout: [chr8, chr9],
    validation_holdout: [chr6, chr7],
    feature_thresholds: 0.5,
    mode: train,
    save_datasets: [validate, test]
}
train_model: !obj:selene_sdk.TrainModel {
    batch_size: 64,
    max_steps: 8000,
    report_stats_every_n_steps: 1000,
    n_validation_samples: 32000,
    n_test_samples: 120000,
    cpu_n_threads: 10,
    use_cuda: True,
    data_parallel: False
}
random_seed: 1445
output_dir: ./training_outputs
create_subdirectory: False
load_test_set: False

To run it, just execute the following Python code:

from selene_sdk.utils import load_path, parse_configs_and_run
parse_configs_and_run(load_path("./simple_train.yml"), lr=0.01)

Training used about 2 GB of GPU memory and finished in a few minutes. The results of the accuracy evaluation on the test data are below.

(ROC curves: roc_curves.jpg)

(Precision-recall curves: precision_recall_curves.jpg)

The model is saved in best_model.pth.tar.
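To reuse the trained weights outside Selene, the checkpoint can be loaded with plain PyTorch. This is a minimal sketch assuming the checkpoint stores the weights under a "state_dict" key (as Selene's TrainModel does in the versions I have seen); `TinyModel` is a hypothetical stand-in for the DeeperDeepSEA class defined in deeperdeepsea.py, and the save step here only exists to make the sketch self-contained:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the DeeperDeepSEA network from
# deeperdeepsea.py; substitute the real class for your own model.
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(4, 8, kernel_size=8)

# Create a dummy checkpoint so the sketch runs on its own; normally
# best_model.pth.tar is produced by Selene during training.
torch.save({"state_dict": TinyModel().state_dict()}, "best_model.pth.tar")

model = TinyModel()
loaded = torch.load("best_model.pth.tar", map_location="cpu")
model.load_state_dict(loaded["state_dict"])  # weights assumed under this key
model.eval()  # switch to inference mode before predicting
```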

Once a DeepSEA-style model has been trained, the effect of a mutation can be predicted by feeding in sequence data containing that mutation. This makes it possible to simulate GWAS-like results from the predictions.
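Selene ships prediction utilities for this kind of analysis (in the selene_sdk.predict module); the core idea of in silico mutagenesis is simply to enumerate every single-base substitution and score each variant with the model. A minimal sketch of the enumeration step, with a toy sequence:

```python
# Enumerate every single-nucleotide variant of a sequence; feeding each
# variant to the trained model scores that mutation's predicted effect.
BASES = "ACGT"

def single_nucleotide_variants(sequence):
    """Yield (position, ref_base, alt_base, mutated_sequence) tuples."""
    for i, ref in enumerate(sequence):
        for alt in BASES:
            if alt != ref:
                yield i, ref, alt, sequence[:i] + alt + sequence[i + 1:]

variants = list(single_nucleotide_variants("ACGT"))
print(len(variants))  # 3 alternatives per position -> 12 variants
```

For a real 1000 bp input this produces 3000 variant sequences, each of which would be one-hot encoded and passed through the model to compare against the reference prediction.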

Impressions

Selene also supports Kipoi, and I found it a very useful library. This time I only tried the sample, so next time I would like to configure a model myself.
