This post shows how to draw an independence graph with graphviz.
A correlation between two variables is observed for one of two reasons:
- There is a causal relationship between them
- A common factor has a causal relationship with both of them
Partial correlation is the correlation coefficient obtained after removing the latter effect, and an independence graph connects the factors that have a high partial correlation with each other. See the following for details.
Derivation of the meaning and formula of the partial correlation coefficient https://mathtrain.jp/partialcor
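To make the "common factor" case concrete, here is a minimal sketch on synthetic data. The variables `x`, `y`, `z` and the noise levels are assumptions chosen for illustration; the formula is the standard three-variable partial correlation, which removes the influence of `z` from the correlation between `x` and `y`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)            # common factor (assumed for this toy example)
x = z + 0.5 * rng.normal(size=n)  # x depends only on z
y = z + 0.5 * rng.normal(size=n)  # y depends only on z

r = np.corrcoef([x, y, z])
r_xy, r_xz, r_yz = r[0, 1], r[0, 2], r[1, 2]

# Partial correlation of x and y given z
r_xy_given_z = (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

print(f"correlation:         {r_xy:.3f}")
print(f"partial correlation: {r_xy_given_z:.3f}")
```

Here `x` and `y` are strongly correlated even though neither causes the other, and the partial correlation given `z` should come out close to zero.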
I have not verified this yet, but installation should probably work with one of the following commands.
terminal
pip install graphviz
terminal
conda install -c conda-forge python-graphviz
You define a node with node() and an edge with edge(), as shown below. When render() is executed, the Graphviz source code is first written to a file, and the graph is then exported as an image such as PNG or PDF based on it. With cleanup=True, the source file is deleted after the image file has been exported. The examples below export PNG.
python
from graphviz import Graph
from PIL import Image  # needed for Image.open below; display() is provided by Jupyter
g = Graph(format='png')
g.node('1')
g.node('2')
g.node('3')
g.edge('1', '2')
g.edge('2', '3')
g.edge('3', '1')
g.render(filename='../test', format='png', cleanup=True, directory=None)
display(Image.open('../test.png'))
python
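from graphviz import Digraph  # directed version of Graph
from PIL import Image  # needed for Image.open below; display() is provided by Jupyter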
dg = Digraph(format='png')
dg.node('1')
dg.node('2')
dg.node('3')
dg.edge('1', '2') # 1 -> 2
dg.edge('2', '3') # 2 -> 3
dg.edge('3', '1') # 3 -> 1
dg.render(filename='../test', format='png', cleanup=True, directory=None)
display(Image.open('../test.png'))
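To see the Graphviz source that render() writes before producing the image, you can print the `source` attribute of the graph object. The following is a small sketch that contrasts the undirected and directed cases. It only builds the source text, so it should not need the Graphviz executables.

```python
from graphviz import Graph, Digraph

g = Graph()
g.edge('1', '2')    # undirected edge

dg = Digraph()
dg.edge('1', '2')   # directed edge, 1 -> 2

print(g.source)     # undirected edges are written with "--"
print(dg.source)    # directed edges are written with "->"
```

The only difference between the two sources is the graph keyword and the edge operator, which is why the Graph and Digraph examples above look almost identical.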
This time I use the iris dataset as sample data.
python
import numpy as np
import pandas as pd
from sklearn import datasets
import seaborn as sns
iris = datasets.load_iris()
df = pd.DataFrame(np.hstack([iris.data, iris.target.reshape(-1, 1)]),
                  columns=iris.feature_names + ['label'])
sns.pairplot(df, hue='label')
python
import matplotlib.pyplot as plt
cm = pd.DataFrame(np.corrcoef(df.T), columns=df.columns, index=df.columns)
sns.heatmap(cm, annot=True, square=True, vmin=-1, vmax=1, fmt=".2f", cmap="RdBu")
plt.savefig("cor.png")  # correlation matrix; separate name so the partial correlation plot does not overwrite it
plt.show()
I borrowed the code below from the Hatena Blog "Hashikure Engineer Mocking notes".
There seems to be a more careful approach that tests each correlation and does not subtract the ones that are not significant, but here everything is subtracted uniformly.
python
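from scipy import linalg

def cor2pcor(R):
    """Convert a correlation matrix R into a partial correlation matrix."""
    inv_cor = linalg.inv(R)  # inverse of the correlation matrix
    # Scale factor 1 / sqrt(p_ii) for each variable
    scale = 1 / np.sqrt(np.diag(inv_cor))
    # pcor_ij = -p_ij / sqrt(p_ii * p_jj)
    pcor = -inv_cor * np.outer(scale, scale)
    np.fill_diagonal(pcor, 1)
    return pcor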
pcor = pd.DataFrame(cor2pcor(cm), columns=cm.columns, index=cm.index)
sns.heatmap(pcor, annot=True, square=True, vmin=-1, vmax=1, fmt=".2f", cmap="RdBu")
plt.savefig("pcor.png")
plt.show()
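As a sanity check on the formula used by cor2pcor, the sketch below compares it with the definition of partial correlation as the correlation between regression residuals. The helper `residual_pcor`, the mixing matrix `M`, and the toy data are assumptions made for illustration. `np.linalg.inv` stands in for `scipy.linalg.inv` so that the block depends only on NumPy.

```python
import numpy as np

def cor2pcor(R):
    """Partial correlations from a correlation matrix via its inverse."""
    P = np.linalg.inv(R)
    scale = 1 / np.sqrt(np.diag(P))
    pcor = -P * np.outer(scale, scale)
    np.fill_diagonal(pcor, 1)
    return pcor

def residual_pcor(X, i, j):
    """Correlate columns i and j after regressing both on all other columns."""
    others = [k for k in range(X.shape[1]) if k not in (i, j)]
    A = np.column_stack([np.ones(len(X)), X[:, others]])  # intercept + other variables
    res_i = X[:, i] - A @ np.linalg.lstsq(A, X[:, i], rcond=None)[0]
    res_j = X[:, j] - A @ np.linalg.lstsq(A, X[:, j], rcond=None)[0]
    return np.corrcoef(res_i, res_j)[0, 1]

# Correlated toy data: independent noise mixed by a fixed matrix
rng = np.random.default_rng(1)
M = np.array([[1.0, 0.5, 0.2, 0.0],
              [0.0, 1.0, 0.5, 0.2],
              [0.0, 0.0, 1.0, 0.5],
              [0.0, 0.0, 0.0, 1.0]])
X = rng.normal(size=(500, 4)) @ M

pcor = cor2pcor(np.corrcoef(X.T))
print(pcor[0, 1], residual_pcor(X, 0, 1))  # the two values should agree
```

The two approaches should give the same number up to floating-point error, which is why inverting the correlation matrix once is enough to obtain every pairwise partial correlation.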
Draw an undirected graph by connecting the pairs of variables whose correlation coefficient is larger in absolute value than a suitably chosen threshold.
python
from graphviz import Graph
from PIL import Image
def draw_graph(cm, threshold):
    # Index pairs whose absolute coefficient exceeds the threshold
    edges = np.where(np.abs(cm) > threshold)
    # Keep i > j only, so each pair appears once and the diagonal is skipped
    edges = [[cm.index[i], cm.index[j]] for i, j in zip(edges[0], edges[1]) if i > j]
    g = Graph(format='png')
    for k in range(cm.shape[0]):
        g.node(cm.index[k])
    for i, j in edges:
        g.edge(j, i)
    g.render(filename='../test', format='png', cleanup=True, directory=None)
    display(Image.open('../test.png'))
threshold = 0.3
draw_graph(cm, threshold)
draw_graph(pcor, threshold)
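The edge selection inside draw_graph can also be checked on its own, without Graphviz. The following is a minimal sketch on a hand-made 3x3 matrix. The function name `edges_above` and the matrix `toy` are hypothetical and not part of the original code.

```python
import numpy as np
import pandas as pd

def edges_above(cm, threshold):
    """Return (a, b, value) for each variable pair with |value| > threshold."""
    values = cm.to_numpy()
    # Upper triangle without the diagonal: each pair is visited exactly once
    rows, cols = np.triu_indices_from(values, k=1)
    return [(cm.index[i], cm.columns[j], values[i, j])
            for i, j in zip(rows, cols)
            if abs(values[i, j]) > threshold]

toy = pd.DataFrame([[1.0, 0.8, 0.1],
                    [0.8, 1.0, -0.5],
                    [0.1, -0.5, 1.0]],
                   index=list("abc"), columns=list("abc"))

for a, b, value in edges_above(toy, 0.3):
    print(a, b, value)
```

With a threshold of 0.3, only the pairs a-b and b-c are kept. The negative coefficient between b and c still produces an edge because the comparison uses the absolute value.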
Since the correlation coefficients are low, it is a little difficult to draw conclusions from this alone, but if it is correct, the length and width of the sepal correlate only with the length and width of the petal, and not directly with the type of iris. Drawing a graph makes the picture easier to understand than looking at the correlation matrix.
Give it a try.