Introduction

Programming languages such as Python and R offer abundant statistical analysis libraries and give free access to advanced statistical methods, but editing source code and working from the command line can be cumbersome. Power BI visuals for statistical analysis are sometimes published on AppSource, but they may not implement the features you want. You can also use Power BI's Python or R visuals, but because the plots are drawn by code, it takes a lot of time to create and adjust visuals that are easy to read. Here, using principal component analysis (PCA) as an example, we load data in a Power BI query, run the statistical analysis with Python inside the query, and visualize the result on a Power BI dashboard.

Sample data

Following the article "Start multivariate analysis and principal component analysis with Pokemon! Linking R and Tableau" (https://qiita.com/bashiiiwa/items/d783150ff4299dda27f1), we use pokemon.csv from The Complete Pokemon Dataset published on Kaggle.

Edit Power BI query

Load the downloaded pokemon.csv.
Delete the columns that are not needed for the analysis so that the table consists of the Name column followed by the data columns (data column 1, data column 2, ...).
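The expected table layout can be tried out in plain Python before building the query. This is only a sketch: the toy table and its values are invented, `to_pca_layout` is a hypothetical helper (not part of Power BI or pandas), and dropping rows with missing values is an assumption made here, not a step from the original article.

```python
import pandas as pd

def to_pca_layout(df, name_col, data_cols):
    """Return a copy with the name column first, followed by the numeric data columns."""
    data_cols = list(data_cols)
    out = df[[name_col] + data_cols].copy()
    # PCA needs numeric input, so fail early if a data column is not numeric.
    non_numeric = [c for c in data_cols if not pd.api.types.is_numeric_dtype(out[c])]
    if non_numeric:
        raise ValueError(f"non-numeric data columns: {non_numeric}")
    # Assumption: rows with missing values are simply dropped.
    return out.dropna(subset=data_cols).reset_index(drop=True)

# Toy table standing in for the loaded CSV (values are invented).
raw = pd.DataFrame({
    "#": [1, 2, 3],
    "Name": ["A", "B", "C"],
    "Type 1": ["Grass", "Fire", "Water"],
    "Attack": [49, 52, 48],
    "Sp. Atk": [65, 60, None],
})
prepared = to_pca_layout(raw, "Name", ["Attack", "Sp. Atk"])
print(prepared)
```

In Power BI itself, the same result is obtained by removing columns in the Power Query editor; the sketch only makes the target shape explicit.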

Add a Python script step. The script assumes that the first column is Name and that the second and subsequent columns are data, and it performs principal component analysis with the scikit-learn library. The Python code for the principal component analysis is based on Meaningful Principal Component Analysis (https://qiita.com/NoriakiOshita/items/460247bb57c22973a5f0).

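# 'dataset' holds the input data for this script

import pandas as pd
from sklearn.decomposition import PCA

# Drop the first column (Name); the remaining columns are the numeric data
dataset2 = dataset.drop(dataset.columns[0], axis=1)
X = dataset2.values

# Fit PCA with all components and project the data onto them
pca = PCA()
pca.fit(X)
pca_point = pca.transform(X)

# Append the first two principal component scores to the original table
dataset['PC1'] = pca_point[:, 0]
dataset['PC2'] = pca_point[:, 1]

# Explained variance ratio of each principal component
# (column names are passed as a list; a set is unordered and not accepted by recent pandas)
evr = pd.DataFrame(data=pca.explained_variance_ratio_, columns=['explained_variance_ratio'], dtype='float')
evr['PC No.'] = evr.index + 1

# Eigenvectors: one row per component, one column per input variable
components = pd.DataFrame(data=pca.components_, columns=dataset2.columns, dtype='float')
components['PC No.'] = components.index + 1

# Remove the temporary table so that it is not listed as an output table
del dataset2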

evr holds the explained variance ratio (contribution rate) of each principal component, so check its values. (It shows how much of the total variance each component accounts for: 0.46 for the first principal component and 0.19 for the second.)
components holds the eigenvectors. (Each row shows how strongly every data column is weighted in that component, which serves as a reference when interpreting the principal components.)
The scores for principal component 1 (PC1) and principal component 2 (PC2) are added to dataset, and this table is the one loaded into the report.
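To see what evr looks like without opening Power BI, the same steps can be run on a stand-in table. This is a minimal sketch on random data: the column names and values are made up, and the `cumulative` column is an addition for illustration that the query script above does not produce.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Synthetic stand-in for the Power BI 'dataset' table (random values, not Pokemon data).
rng = np.random.default_rng(0)
dataset = pd.DataFrame(rng.normal(size=(50, 4)),
                       columns=["HP", "Attack", "Defense", "Speed"])
dataset.insert(0, "Name", [f"item{i}" for i in range(50)])

# Same layout rule as in the query: first column is the name, the rest is data.
X = dataset.drop(columns="Name").values
pca = PCA().fit(X)

evr = pd.DataFrame({"explained_variance_ratio": pca.explained_variance_ratio_})
evr["PC No."] = evr.index + 1
# Running total, useful for judging how many components are worth plotting.
evr["cumulative"] = evr["explained_variance_ratio"].cumsum()
print(evr)
```

Because all components are kept, the ratios are sorted in descending order and sum to 1.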

Creating a dashboard

Create a scatter plot with PC1 on the X-axis and PC2 on the Y-axis. (Turn on the category data labels.)

Create a tooltip page so that you can see the value of each field when you hover over a data point, and place a multi-row card on it. (In the page's format settings, turn on Tooltip and set the page size to Tooltip.)
Enable the tooltip on the scatter plot: in the visual's format settings, turn on Tooltip, set the type to Report page, and select the tooltip page you created. Hover the mouse cursor over a data point to see that character's attribute data.

Result of analysis

Let's compare the result with the article "Pokemon Data Analysis.py". In the eigenvector of the first principal component (PC1), Sp. Atk (special attack power) and Attack (attack power) have the largest weights, in that order. If you map Sp. Atk to the size of the circles, it does indeed tend to increase toward the right, that is, with higher PC1.
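Reading the largest weights off an eigenvector can also be done in code. The sketch below uses synthetic data that was deliberately built so that "Sp. Atk" and "Attack" dominate the first component; it is not the Pokemon dataset and does not reproduce the figures quoted in this article. The absolute value is taken because the sign of an eigenvector is arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
base = rng.normal(size=200)

# Synthetic columns: two share a common factor with different strength, one is unrelated.
data = pd.DataFrame({
    "Sp. Atk": 3.0 * base + rng.normal(scale=0.5, size=200),
    "Attack": 2.0 * base + rng.normal(scale=0.5, size=200),
    "Speed": rng.normal(scale=0.5, size=200),
})

pca = PCA().fit(data.values)
components = pd.DataFrame(pca.components_, columns=data.columns)

# Row 0 is the PC1 eigenvector; rank the input columns by the size of their weight.
pc1_ranking = components.iloc[0].abs().sort_values(ascending=False)
print(pc1_ranking)
```

Applied to the components table produced by the query, the same ranking indicates which field is worth mapping to the bubble size.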

Principal component analysis with Power BI + Python
