Deep learning models are also being applied in genomics. For example, DeepBind predicts the binding affinity of a nucleotide sequence for a DNA/RNA-binding protein, and DeepSEA predicts epigenetic modifications from a nucleotide sequence.
Kipoi is a platform (referred to as a repository in the paper) for using various deep learning models in genomics. It supports models written in Keras, TensorFlow, PyTorch, and scikit-learn. You can load a trained model and easily run it on your own sequence data to get predictions.
Home page: http://kipoi.org/ Paper: https://www.nature.com/articles/s41587-019-0140-0
The environment is Ubuntu 16.04 with Anaconda (Python 3.6).
pip install kipoi
Installation is now complete. The version I installed was 0.6.25.
When I tried to run it afterwards, I sometimes got the error `AttributeError: 'Table' object has no attribute 'purge'`. This seems to be caused by a newer version of tinydb. In that case, you can fix it by reinstalling an older version as follows.
pip uninstall tinydb
pip install tinydb==3.13.0
The tutorial is published on GitHub (https://github.com/kipoi/examples), so you can try it yourself. This time I will work through the first part of the tutorial.
First, download the set of tutorials.
git clone https://github.com/kipoi/examples.git
cd examples
Load the kipoi model list.
kipoi ls
You can now use it. A folder containing the models (~/.kipoi/models/) has been created in your home directory. Next, create a new virtual environment.
kipoi env create shared/envs/kipoi-py3-keras2
This creates a conda virtual environment called "kipoi-shared__envs__kipoi-py3-keras2". Activate that environment.
conda activate kipoi-shared__envs__kipoi-py3-keras2
(The documentation suggested `source activate`, but that didn't work for me; it worked when I switched to `conda activate`.)
Let's try some test code.
kipoi test Basset --source=kipoi
kipoi test DeepSEA/predict --source=kipoi
kipoi test DeepBind/Homo_sapiens/TF/D00328.018_ChIP-seq_CTCF/ --source=kipoi
I was able to confirm that it works.
example1
Let's try the prediction in example 1.
cd 1-predict
There is sample data in the input folder, so decompress it.
zcat input/hg19.chr22.fa.gz > input/hg19.chr22.fa
Now make the prediction. The DeepBind model used here takes 100 bases as input and outputs a prediction of binding affinity for CTCF. fasta_file provides the nucleotide sequence information, and intervals_file specifies the locations (start and end positions) of the sequences to extract.
kipoi predict DeepBind/Homo_sapiens/TF/D00328.018_ChIP-seq_CTCF \
--dataloader_args='{"intervals_file": "input/enhancer-regions.hg19.chr22.bed.gz",
"fasta_file": "input/hg19.chr22.fa"}' \
-o preds.tsv
I was able to predict.
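For reference, each line of the intervals_file is a BED record giving chromosome, start, and end. A minimal parsing sketch (the record below is illustrative, modeled on the ranges in the output, not copied from the actual file):

```python
# Parse a single BED record (tab-separated: chrom, start, end, ...).
# This line is illustrative; the real records are in
# input/enhancer-regions.hg19.chr22.bed.gz (gzip-compressed).
bed_line = "chr22\t17274192\t17274293"

chrom, start, end = bed_line.split("\t")[:3]
start, end = int(start), int(end)
width = end - start  # number of bases covered by the interval

print(chrom, start, end, width)
```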
Let's take a look at the contents of the output (preds.tsv).
$ less preds.tsv
metadata/ranges/chr metadata/ranges/end metadata/ranges/id metadata/ranges/start metadata/ranges/strand preds
chr22 17274293 0 17274192 * -4.108346
chr22 17274548 1 17274447 * -2.6903393
chr22 17275618 2 17275517 * -5.2259955
chr22 17275863 3 17275762 * -5.2259955
chr22 17287134 4 17287033 * -4.2063684
chr22 17288718 5 17288617 * -5.2259955
chr22 17570303 6 17570202 * -4.93376
chr22 17597591 7 17597490 * -4.4880404
chr22 17597800 8 17597699 * -4.825454
chr22 17598104 9 17598003 * -5.190316
...
The rightmost column is the predicted value.
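To work with the predictions downstream, the TSV can be loaded with pandas. A minimal sketch (the inline sample reproduces a few rows of the output above for self-containment; in practice you would read preds.tsv directly):

```python
import io

import pandas as pd

# A few rows in the same tab-separated format as preds.tsv.
sample = (
    "metadata/ranges/chr\tmetadata/ranges/end\tmetadata/ranges/id\t"
    "metadata/ranges/start\tmetadata/ranges/strand\tpreds\n"
    "chr22\t17274293\t0\t17274192\t*\t-4.108346\n"
    "chr22\t17274548\t1\t17274447\t*\t-2.6903393\n"
    "chr22\t17275618\t2\t17275517\t*\t-5.2259955\n"
)

df = pd.read_csv(io.StringIO(sample), sep="\t")
# In practice: df = pd.read_csv("preds.tsv", sep="\t")

# Sort by predicted binding score, highest first.
top = df.sort_values("preds", ascending=False)
print(top[["metadata/ranges/chr", "metadata/ranges/start",
           "metadata/ranges/end", "preds"]].head())
```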
A Snakefile is used for the rest of the tutorial, and it seems you need to install snakemake to run it (snakemake is a useful workflow automation tool).
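As an illustration of how the steps above could be wired together with snakemake, here is a minimal Snakefile sketch (the rule names are my own and this is not the tutorial's actual Snakefile):

```
# Hypothetical Snakefile: decompress the genome, then run kipoi predict.
rule all:
    input: "preds.tsv"

rule unzip_fasta:
    input: "input/hg19.chr22.fa.gz"
    output: "input/hg19.chr22.fa"
    shell: "zcat {input} > {output}"

rule predict:
    input:
        fasta="input/hg19.chr22.fa",
        bed="input/enhancer-regions.hg19.chr22.bed.gz"
    output: "preds.tsv"
    shell:
        "kipoi predict DeepBind/Homo_sapiens/TF/D00328.018_ChIP-seq_CTCF "
        "--dataloader_args='{{\"intervals_file\": \"{input.bed}\", "
        "\"fasta_file\": \"{input.fasta}\"}}' -o {output}"
```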
So far we've done everything with the CLI, but there are also APIs for Python and R. After activating the virtual environment, you can run the following in Python (excerpt from the manual).
import kipoi
kipoi.list_models() # list available models
model = kipoi.get_model("Basset") # load the model
model = kipoi.get_model( # load the model from a past commit
"https://github.com/kipoi/models/tree/<commit>/<model>",
source='github-permalink'
)
# main attributes
model.model # wrapped model (say keras.models.Model)
model.default_dataloader # dataloader
model.info # description, authors, paper link, ...
# main methods
model.predict_on_batch(x) # implemented by all the models regardless of the framework
model.pipeline.predict(dict(fasta_file="hg19.fa", intervals_file="intervals.bed"))
# runs: raw files -[dataloader]-> numpy arrays -[model]-> predictions
I found it very interesting as a platform for genomics analysis. The virtual environments created with kipoi are managed just like ordinary Anaconda environments. Not only can you download and use models, you can also upload and share your own trained models. I want to keep using it.