Deep learning models are also being applied in genomics. For example, DeepBind predicts the binding affinity of a nucleotide sequence for a DNA/RNA-binding protein, and DeepSEA predicts epigenetic modifications from a nucleotide sequence.
Kipoi is a platform (referred to as a repository in the paper) for using various deep learning models in genomics. It supports models written in Keras, TensorFlow, PyTorch, and scikit-learn. You can load a trained model and easily run it on your own sequence data to get predictions.
Home page: http://kipoi.org/ Paper: https://www.nature.com/articles/s41587-019-0140-0
The environment is Ubuntu 16.04 with Anaconda (Python 3.6).
pip install kipoi
Installation is now complete. The version I installed was 0.6.25.
When I tried to run it afterwards, I sometimes got the error `AttributeError: 'Table' object has no attribute 'purge'`. This seems to be caused by a newer version of tinydb. In that case, you can fix it by reinstalling an older version as follows.
pip uninstall tinydb
pip install tinydb==3.13.0
The tutorial is published on GitHub (https://github.com/kipoi/examples), so you can try it yourself. This time I will work through the first part of the tutorial.
First, download the set of tutorials.
git clone https://github.com/kipoi/examples.git
cd examples
Load the kipoi model list.
kipoi ls
You can now use it. A folder containing the models (~/.kipoi/models/) has been created in your home directory. Next, create a new virtual environment.
kipoi env create shared/envs/kipoi-py3-keras2
This creates a conda virtual environment called "kipoi-shared__envs__kipoi-py3-keras2". Activate that environment.
conda activate kipoi-shared__envs__kipoi-py3-keras2
(The documentation suggested `source activate`, but that didn't work for me; it worked when I switched to `conda activate`.)
Let's try some test code.
kipoi test Basset --source=kipoi
kipoi test DeepSEA/predict --source=kipoi
kipoi test DeepBind/Homo_sapiens/TF/D00328.018_ChIP-seq_CTCF/ --source=kipoi
I was able to confirm that it works.
example1
Let's try the prediction in example 1.
cd 1-predict
There is sample data in the input folder, so decompress it.
zcat input/hg19.chr22.fa.gz > input/hg19.chr22.fa
Now make the prediction. The DeepBind model used here takes 100 bases as input and outputs a prediction of binding affinity for CTCF. fasta_file provides the nucleotide sequence information, and intervals_file specifies the locations (start and end positions) of the sequences to extract.
kipoi predict DeepBind/Homo_sapiens/TF/D00328.018_ChIP-seq_CTCF \
--dataloader_args='{"intervals_file": "input/enhancer-regions.hg19.chr22.bed.gz",
"fasta_file": "input/hg19.chr22.fa"}' \
-o preds.tsv
I was able to predict.
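For reference, each line of the intervals_file is a BED record giving chromosome, start, and end. A minimal parsing sketch (the record below is illustrative, modeled on the ranges in the output, not copied from the actual file):

```python
# Parse a single BED record (tab-separated: chrom, start, end, ...).
# This line is illustrative; the real records are in
# input/enhancer-regions.hg19.chr22.bed.gz (gzip-compressed).
bed_line = "chr22\t17274192\t17274293"

chrom, start, end = bed_line.split("\t")[:3]
start, end = int(start), int(end)
width = end - start  # number of bases covered by the interval

print(chrom, start, end, width)
```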
Let's take a look at the contents of the output (preds.tsv).
$ less preds.tsv
metadata/ranges/chr metadata/ranges/end metadata/ranges/id metadata/ranges/start metadata/ranges/strand preds
chr22 17274293 0 17274192 * -4.108346
chr22 17274548 1 17274447 * -2.6903393
chr22 17275618 2 17275517 * -5.2259955
chr22 17275863 3 17275762 * -5.2259955
chr22 17287134 4 17287033 * -4.2063684
chr22 17288718 5 17288617 * -5.2259955
chr22 17570303 6 17570202 * -4.93376
chr22 17597591 7 17597490 * -4.4880404
chr22 17597800 8 17597699 * -4.825454
chr22 17598104 9 17598003 * -5.190316
...
The rightmost column is the predicted value.
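To work with the predictions downstream, the TSV can be loaded with pandas. A minimal sketch (the inline sample reproduces a few rows of the output above for self-containment; in practice you would read preds.tsv directly):

```python
import io

import pandas as pd

# A few rows in the same tab-separated format as preds.tsv.
sample = (
    "metadata/ranges/chr\tmetadata/ranges/end\tmetadata/ranges/id\t"
    "metadata/ranges/start\tmetadata/ranges/strand\tpreds\n"
    "chr22\t17274293\t0\t17274192\t*\t-4.108346\n"
    "chr22\t17274548\t1\t17274447\t*\t-2.6903393\n"
    "chr22\t17275618\t2\t17275517\t*\t-5.2259955\n"
)

df = pd.read_csv(io.StringIO(sample), sep="\t")
# In practice: df = pd.read_csv("preds.tsv", sep="\t")

# Sort by predicted binding score, highest first.
top = df.sort_values("preds", ascending=False)
print(top[["metadata/ranges/chr", "metadata/ranges/start",
           "metadata/ranges/end", "preds"]].head())
```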
A Snakefile is used for the rest of the tutorial, and it seems you need to install snakemake to run it (snakemake is a useful workflow automation tool).
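As an illustration of how the steps above could be wired together with snakemake, here is a minimal Snakefile sketch (the rule names are my own and this is not the tutorial's actual Snakefile):

```
# Hypothetical Snakefile: decompress the genome, then run kipoi predict.
rule all:
    input: "preds.tsv"

rule unzip_fasta:
    input: "input/hg19.chr22.fa.gz"
    output: "input/hg19.chr22.fa"
    shell: "zcat {input} > {output}"

rule predict:
    input:
        fasta="input/hg19.chr22.fa",
        bed="input/enhancer-regions.hg19.chr22.bed.gz"
    output: "preds.tsv"
    shell:
        "kipoi predict DeepBind/Homo_sapiens/TF/D00328.018_ChIP-seq_CTCF "
        "--dataloader_args='{{\"intervals_file\": \"{input.bed}\", "
        "\"fasta_file\": \"{input.fasta}\"}}' -o {output}"
```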
So far we've done everything with the CLI, but there are also APIs for Python and R. After activating the virtual environment, you can run the following in Python (excerpt from the manual).
import kipoi
kipoi.list_models() # list available models
model = kipoi.get_model("Basset") # load the model
model = kipoi.get_model( # load the model from a past commit
"https://github.com/kipoi/models/tree/<commit>/<model>",
source='github-permalink'
)
# main attributes
model.model # wrapped model (say keras.models.Model)
model.default_dataloader # dataloader
model.info # description, authors, paper link, ...
# main methods
model.predict_on_batch(x) # implemented by all the models regardless of the framework
model.pipeline.predict(dict(fasta_file="hg19.fa", intervals_file="intervals.bed"))
# runs: raw files -[dataloader]-> numpy arrays -[model]-> predictions
I found it very interesting as a platform for genomics analysis. The virtual environments created with kipoi are managed just like ordinary Anaconda environments. Not only can you download and use models, you can also upload and share your own trained models. I want to keep using it.