Introduction

I used genism's Poincarē embedding to embed a network of protein-protein interactions in Poincaré balls.

Hyperbolic space Hyperbolic space is one of the non-Euclidean spaces, which is a curved space with a negative curvature. It has the property that the space expands toward the periphery, and is said to be suitable for embedding networks with a hierarchical structure.
Poincare ball Poincaré ball is one of the models of hyperbolic space. On the Poincare ball, the distance between the two points is$d(u,v) = \mathrm{arccosh}\left(1+2\frac{||u-v||^{2}}{(1-||u||^{2})(1-||v||^{2})}\right).$ It is expressed as. We define the loss function based on the distance on the Poincare ball as follows: $\Theta'\leftarrow \mathrm{arg min}\mathcal{L}(\Theta)\;\;\;\mathrm{s.t.}\forall\theta_{i}\in\Theta:||\theta_{i}||<1.$ Poincarē embedding coordinates so that this loss function is minimized.

Network of interprotein bonds

Prote-protein interaction network
Various proteins function in the body, but many of them function by binding to each other rather than alone. A protein-protein interaction network is a network constructed by connecting proteins that bind to each other.
Features of protein-protein interaction network Proteins bind to each other, but the number of binding partners is not uniform. Most proteins bind only to specific proteins. Very few proteins bind to many proteins. As a result, the protein-protein binding network has a structure in which a small number of proteins serve as a hub for the network and connect the majority of terminal proteins. The elements in the network are called nodes, the branches connecting the nodes are called edges, and the number of edges for each node is called degree. In the network of interprotein bonds, the degree is unevenly distributed. Networks whose degree distribution follows the power rule are called scale-free networks or small world networks. Scale-free networks are often observed in naturally occurring networks. The network of interprotein bonds is also known to be scale-free.
Data set Get information about protein-protein binding from the database. I used NURSA protein-protein interactions dataset. Download the Gene attribute Edge List in this. This file lists the pair of proteins that bind. From this information, we build a network of binding proteins and embed it in Poincareball.

gensim Poincarē embedding

Poincarē embedding is implemented in python's natural language processing library genism, so use it. If you give a list of word pairs, it will embed a network of words in Poincareball.

# library import
import pandas as pd
import numpy as np
from gensim.models.poincare import PoincareModel
from genism.viz.poincare import poincare_2d_visualization
from plotly.offline import iplot
# Data set loading
# Download and unzip gene_attribute_edges.txt.gz.
# The pairs of proteins that bind to the source and target sections are listed.
dataset = pd.read_csv('gene_attribute_edges.txt', delimiter = '\t', usecols = [0,3], skiprows = [1])
# Create a list of bound protein pairs
relations = []
for i in range(len(dataset)):
    relations.append((dataset.source[i], dataset.target[i]))
# Apply Poincare model to protein pairs
# We will visualize it in two dimensions later, so set size = 2.
model = PoincareModel(relations, size = 2)
model.train(epochs = 1000)
# Visualization
# If all the bonds between proteins are connected by lines, it will be too complicated to understand the figure, so the number of lines is limited to 5 with num_nodes.
map = poincare_2d_visualization(model = model, tree = relations, figure_title = 'PPI network', num_nodes = 5)
iplot(map)

Each point represents a protein, and the proteins that bind to it are connected by a line. As the value of epochs is gradually increased, learning progresses, and as the loss function decreases, proteins with less binding are placed in the periphery and proteins that serve as hubs are located in the center, resulting in a hierarchical structure.

Conclusion

I used gensim's Poincarē embedding to embed a network of interprotein bonds in Poincaré balls. It was found that as learning progresses, proteins with less binding are placed in the peripheral part, while proteins that serve as hubs remain in the central part and a hierarchical structure is acquired. Poincarē embedding seems to be a good way to embed a scale-free network.

reference

Maximillian Nickel and Douwe Kiela, “Poincaré Embeddings for Learning Hierarchical Representations” genism Nuclear Receptor Signaling Atlas Malovannaya, A et al., "Analysis of the human endogenous coregulator complexome"

I tried to embed a protein-protein interaction network in hyperbolic space with Poincarē embeding of gensim

Introduction

Network of interprotein bonds

gensim Poincarē embedding

Conclusion

reference