I used genism's Poincarē embedding to embed a network of protein-protein interactions in Poincaré balls.
Hyperbolic space Hyperbolic space is one of the non-Euclidean spaces, which is a curved space with a negative curvature. It has the property that the space expands toward the periphery, and is said to be suitable for embedding networks with a hierarchical structure.
Poincare ball
Poincaré ball is one of the models of hyperbolic space. On the Poincare ball, the distance between the two points is
Poincarē embedding is implemented in python's natural language processing library genism, so use it. If you give a list of word pairs, it will embed a network of words in Poincareball.
# library import
import pandas as pd
import numpy as np
from gensim.models.poincare import PoincareModel
from genism.viz.poincare import poincare_2d_visualization
from plotly.offline import iplot
# Data set loading
# Download and unzip gene_attribute_edges.txt.gz.
# The pairs of proteins that bind to the source and target sections are listed.
dataset = pd.read_csv('gene_attribute_edges.txt', delimiter = '\t', usecols = [0,3], skiprows = [1])
# Create a list of bound protein pairs
relations = []
for i in range(len(dataset)):
relations.append((dataset.source[i], dataset.target[i]))
# Apply Poincare model to protein pairs
# We will visualize it in two dimensions later, so set size = 2.
model = PoincareModel(relations, size = 2)
model.train(epochs = 1000)
# Visualization
# If all the bonds between proteins are connected by lines, it will be too complicated to understand the figure, so the number of lines is limited to 5 with num_nodes.
map = poincare_2d_visualization(model = model, tree = relations, figure_title = 'PPI network', num_nodes = 5)
iplot(map)
Each point represents a protein, and the proteins that bind to it are connected by a line. As the value of epochs is gradually increased, learning progresses, and as the loss function decreases, proteins with less binding are placed in the periphery and proteins that serve as hubs are located in the center, resulting in a hierarchical structure.
I used gensim's Poincarē embedding to embed a network of interprotein bonds in Poincaré balls. It was found that as learning progresses, proteins with less binding are placed in the peripheral part, while proteins that serve as hubs remain in the central part and a hierarchical structure is acquired. Poincarē embedding seems to be a good way to embed a scale-free network.
Maximillian Nickel and Douwe Kiela, “Poincaré Embeddings for Learning Hierarchical Representations” genism Nuclear Receptor Signaling Atlas Malovannaya, A et al., "Analysis of the human endogenous coregulator complexome"
Recommended Posts