Programming languages such as Python and R have abundant statistical analysis libraries, so you can use advanced statistical methods for free. The drawback is that editing source code and working from the command line are cumbersome. Some Power BI visuals for statistical analysis are published on AppSource, but they may not implement the features you need. You can also use Power BI's Python or R visuals, but because those plots are rendered by the programming language, creating and tuning easy-to-read visuals takes a lot of time. Here, using principal component analysis (PCA) as an example, we read data with a Power BI query, run the statistical analysis with Python inside the query, and visualize the results on a Power BI dashboard.
Following the article "Start multivariate analysis and principal component analysis with Pokemon! Linking R and Tableau" (https://qiita.com/bashiiiwa/items/d783150ff4299dda27f1), we use pokemon.csv from The Complete Pokemon Dataset published on Kaggle.
Load the downloaded pokemon.csv.
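Delete the columns that are not needed for the analysis, so that the table consists of the Name column followed by the data columns (data column 1, data column 2, ...). Then, in the Power Query Editor, run the following Python script on the query (Transform > Run Python script).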
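In the article, this column cleanup is done in the Power Query Editor. If you want to check the expected layout outside Power BI, here is a minimal pandas sketch. It assumes a hypothetical three-row excerpt with illustrative column names and values, not the real file. It keeps the name column first and drops the non-numeric columns. The script removes column 0 before running PCA, so it needs exactly this layout.

```python
import pandas as pd

# Hypothetical excerpt of the Pokemon table (illustrative values; the real
# file has many more rows and columns, and its column names may differ).
raw = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle"],
    "Type 1": ["Grass", "Fire", "Water"],
    "HP": [45, 39, 44],
    "Attack": [49, 52, 48],
    "Sp. Atk": [65, 60, 50],
})

def prepare_for_pca(df: pd.DataFrame, name_col: str = "Name") -> pd.DataFrame:
    """Return the name column first, followed only by the numeric columns."""
    numeric = df.select_dtypes(include="number")
    return pd.concat([df[[name_col]], numeric], axis=1)

dataset = prepare_for_pca(raw)
print(list(dataset.columns))  # ['Name', 'HP', 'Attack', 'Sp. Atk']
```

`prepare_for_pca` is a hypothetical helper written for this illustration. Selecting by dtype avoids listing every column to delete by hand.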
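# 'dataset' holds the input data for this script
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
# Drop the first column (Name) so that only the numeric data columns remain
dataset2 = dataset.drop(dataset.columns[0], axis=1)
X = dataset2.values
# Fit PCA with all components and compute each row's principal component scores
pca = PCA()
pca.fit(X)
pca_point = pca.transform(X)
# Append the scores of the first two principal components to the input table
dataset['PC1'] = pca_point[:, 0]
dataset['PC2'] = pca_point[:, 1]
# Explained variance ratio (contribution rate) of each component, numbered from 1
# (columns must be a list, not a set)
evr = pd.DataFrame(data=pca.explained_variance_ratio_, columns=['explained_variance_ratio'], dtype='float')
evr['PC No.'] = evr.index + 1
# Eigenvectors: one row per component, one column per original data column
components = pd.DataFrame(data=pca.components_, columns=dataset2.columns, dtype='float')
components['PC No.'] = components.index + 1
# Power BI offers every pandas DataFrame left in scope as an output table,
# so delete the intermediate one
del dataset2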
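PC1 and PC2 in the script are the outputs of `pca.transform`. The sketch below runs outside Power BI on random hypothetical data, not the Pokemon table. It checks that, with scikit-learn's default `whiten=False`, `transform` only centers each column on its mean and projects the result onto the rows of `pca.components_`.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stat matrix: 50 rows (Pokemon) x 4 numeric columns (stats).
rng = np.random.default_rng(0)
X = rng.normal(loc=70, scale=20, size=(50, 4))

pca = PCA()
pca.fit(X)
scores = pca.transform(X)

# Center each column, then project onto the principal axes (rows of components_).
manual_scores = (X - pca.mean_) @ pca.components_.T

print(np.allclose(scores, manual_scores))  # True
```

So PC1 for a given Pokemon is a weighted sum of its centered stats. The weights are the first row of `components_`.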
evr holds the explained variance ratio (contribution rate) of each principal component; check its values. The ratio shows how much of the total variance each component explains: 0.46 for the first principal component and 0.19 for the second.
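The contribution rate of a component is its eigenvalue of the covariance matrix divided by the sum of all eigenvalues. With 0.46 and 0.19, the first two components together explain about 0.65 of the variance. The sketch below uses random hypothetical data and checks this definition against scikit-learn's `explained_variance_ratio_`. It also prints the cumulative contribution rate.

```python
import numpy as np
from sklearn.decomposition import PCA

# Correlated hypothetical data: 100 rows x 5 columns.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))

# Eigenvalues of the sample covariance matrix, in descending order.
cov = np.cov(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]
ratio = eigvals / eigvals.sum()

pca = PCA().fit(X)
print(np.allclose(ratio, pca.explained_variance_ratio_))  # True

# Cumulative contribution rate; the last value is 1 up to rounding.
print(np.cumsum(ratio))
```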
components holds the eigenvectors. Each row shows how heavily each data column is weighted in that principal component, and it serves as a reference for interpreting the principal components.
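To make "eigenvectors" concrete, the sketch below uses random hypothetical data. It checks that each row `v` of `pca.components_` is a unit-length vector satisfying `cov @ v = λ v`, where `λ` is the matching entry of `pca.explained_variance_`. Both scikit-learn and `np.cov` here use the n - 1 denominator, so the eigenvalues line up.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical correlated data: 80 rows x 4 columns.
rng = np.random.default_rng(2)
X = rng.normal(size=(80, 4)) @ rng.normal(size=(4, 4))

pca = PCA().fit(X)
cov = np.cov(X, rowvar=False)

for eigval, v in zip(pca.explained_variance_, pca.components_):
    # Each row of components_ has unit length...
    assert np.isclose(np.linalg.norm(v), 1.0)
    # ...and is an eigenvector of the covariance matrix with eigenvalue eigval.
    assert np.allclose(cov @ v, eigval * v)

print("components_ rows are eigenvectors of the covariance matrix")
```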
The scores for principal component 1 (PC1) and principal component 2 (PC2) are added as columns to dataset. Load this table into Power BI and plot PC1 against PC2 on a scatter plot.
Create a tooltip page so that the value of each field appears when you hover over a data point. Here, place a multi-row card on the page. (In the page's Format settings, turn on Tooltip and set Page size to Tooltip.)
Enable tooltips on the scatter plot. In the visual's Format settings, turn on Tooltip, set Type to Report page, and set Page to the tooltip page you created. Hovering the mouse cursor over a data point then shows that Pokemon's characteristic data.
Let's compare the result with Pokemon Data Analysis.py. In the eigenvector of the first principal component (PC1), Sp. Atk (special attack) has the largest weight, followed by Attack (attack). If you map Sp. Atk to the bubble size, the bubbles do indeed tend to grow toward the right side of the first principal component (PC1).
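The bubble-size observation follows from a simple identity. The covariance between a data column and the PC1 score equals the PC1 eigenvalue times that column's PC1 weight. A large positive weight on Sp. Atk therefore means Sp. Atk rises along the PC1 axis. The sketch below assumes random hypothetical stats that use the column names from the text, not the author's data. It ranks columns by absolute PC1 weight, since an eigenvector's sign is arbitrary and the axis may come out flipped in another tool. It then checks that the Sp. Atk/PC1 correlation has the same sign as the Sp. Atk weight.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical stats; only the column names come from the text.
rng = np.random.default_rng(3)
cols = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]
stats = pd.DataFrame(rng.integers(20, 150, size=(60, len(cols))), columns=cols)

pca = PCA().fit(stats.values)
pc1_scores = pca.transform(stats.values)[:, 0]
pc1_weights = pd.Series(pca.components_[0], index=cols)

# Eigenvector signs are arbitrary, so rank columns by absolute PC1 weight.
ranking = pc1_weights.abs().sort_values(ascending=False)
print(ranking)

# cov(column, PC1 score) = PC1 eigenvalue * column weight, so the correlation
# has the same sign as the column's PC1 weight.
corr = np.corrcoef(stats["Sp. Atk"], pc1_scores)[0, 1]
print(f"corr(Sp. Atk, PC1) = {corr:.2f}")
```

On the real data, the top of `ranking` would be expected to match the Sp. Atk, then Attack, ordering reported above. This sketch does not reproduce that result.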