--I wrote a script that plots the results of principal component analysis (PCA) from the genetic statistical analysis software PLINK on a two-dimensional plane.
--This post introduces the script's input/output files and how to run it.
--The script is here (link to GitHub)
Prepare a file containing the family ID in the first column, the individual ID in the second column, and each individual's values on the principal components (PC1, PC2, ...) in the third and subsequent columns. A file in this format is produced when you run principal component analysis with PLINK.
#1 FamID
#2 Individual ID
#3 PC1
#4 PC2
...
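As a rough illustration of the column layout above, such a file could be read with pandas as follows. This is a minimal sketch, not the script's actual code: the function name `load_eigenvec` and the column names `FID`, `IID`, `PC1`, ... are chosen here for illustration, and the file is assumed to be whitespace-delimited with no header line. Some PLINK versions write a header line, in which case `header=None` would need to change.

```python
import io

import pandas as pd


def load_eigenvec(source):
    """Read a headerless, whitespace-delimited .eigenvec-style file."""
    # Keep both ID columns as strings so numeric-looking IDs are not converted.
    df = pd.read_csv(source, sep=r"\s+", header=None, dtype={0: str, 1: str})
    # Column names are illustrative: two ID columns followed by PC1, PC2, ...
    n_pcs = df.shape[1] - 2
    df.columns = ["FID", "IID"] + [f"PC{i}" for i in range(1, n_pcs + 1)]
    return df


# Tiny made-up example standing in for a real file on disk.
example = io.StringIO(
    "fam1 ind1 0.012 -0.034 0.005\n"
    "fam2 ind2 -0.021 0.017 0.002\n"
)
eigenvec = load_eigenvec(example)
print(eigenvec)
```

For a real run, a file path such as the `.eigenvec` output can be passed in place of the `StringIO` object.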
Principal component analysis can be performed with the genetic statistical analysis software PLINK. Principal component analysis is a dimensionality reduction method based on the eigendecomposition of the variance-covariance matrix or correlation matrix. In genetic association analysis, it is used to adjust for confounding such as population stratification.
$ plink --bfile ${bfile_name} --out ${outfile_name} --pca
PLINK writes the PCA results to ${outfile_name}.eigenvec and ${outfile_name}.eigenval.
To plot the results, use ${outfile_name}.eigenvec (each individual's value on each principal component).
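To make the idea of PCA as an eigendecomposition of the variance-covariance matrix concrete, here is a small NumPy sketch on made-up random data. It is a generic textbook illustration, not PLINK's procedure: PLINK derives its components from a relationship matrix between individuals, and the variable names below are chosen only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))  # made-up data: 50 samples, 4 variables

# Center each variable, then form the variance-covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

# eigh is intended for symmetric matrices; it returns eigenvalues in
# ascending order, so reorder them from largest to smallest.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Project the centered data onto the eigenvectors to get each sample's
# coordinates on the principal components.
scores = Xc @ eigenvectors

print(eigenvalues)
print(scores[:3, :2])  # first three samples on PC1 and PC2
```

The variance of the samples along each component equals the corresponding eigenvalue, which is why the leading components are the ones usually plotted.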
Prepare a file with the family ID in the first column, the individual ID in the second column, and the group label (race, etc.) in the third column. (Here it is called populations.txt.)
#1 FamID
#2 Individual ID
#3 Group
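The group labels have to be matched to the PCA results by family ID and individual ID. One way to do that with pandas is sketched below on made-up data; the column names `FID`, `IID`, and `Group` are assumptions for this example, and the script itself may join the two files differently.

```python
import pandas as pd

# Made-up PCA values for three individuals.
eigenvec = pd.DataFrame({
    "FID": ["fam1", "fam2", "fam3"],
    "IID": ["ind1", "ind2", "ind3"],
    "PC1": [0.012, -0.021, 0.030],
    "PC2": [-0.034, 0.017, 0.008],
})

# Made-up group labels in the three-column layout described above.
populations = pd.DataFrame({
    "FID": ["fam1", "fam2", "fam3"],
    "IID": ["ind1", "ind2", "ind3"],
    "Group": ["GROUP1", "GROUP2", "GROUP1"],
})

# Join on both ID columns; validate raises an error if an ID pair is duplicated.
merged = eigenvec.merge(
    populations, on=["FID", "IID"], how="inner", validate="one_to_one"
)
print(merged)
```

With `how="inner"`, individuals missing from either file are dropped, so comparing `len(merged)` with `len(eigenvec)` is a quick check that every individual received a label.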
The script runs on Python 3 and requires pandas and matplotlib to be installed.
Run the script with the following options.
--Specify the ${outfile_name}.eigenvec file with the -e option
--Specify the populations.txt file with the -p option
--Specify the output directory with the -o option
$ python plot_pca_gwas.py -e ${outfile_name}.eigenvec -p populations.txt -o ${output_directory}/
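For readers curious how options like these are typically handled, the following is a hypothetical sketch using `argparse`. Only the `-e`, `-p`, and `-o` flags come from the command above; the function name, destination names, and help strings are assumptions, and the actual script may parse its arguments differently.

```python
import argparse


def build_parser():
    """Build a parser for the three options shown in the command above."""
    parser = argparse.ArgumentParser(description="Plot PLINK PCA results")
    # Destination names (eigenvec, populations, outdir) are illustrative.
    parser.add_argument("-e", dest="eigenvec", required=True,
                        help="path to the .eigenvec file")
    parser.add_argument("-p", dest="populations", required=True,
                        help="path to the group label file")
    parser.add_argument("-o", dest="outdir", required=True,
                        help="output directory for the images")
    return parser


# Parse an explicit argument list here instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["-e", "result.eigenvec", "-p", "populations.txt", "-o", "out/"]
)
print(args.eigenvec, args.populations, args.outdir)
```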
The script outputs the following images.
--pca.png: plot of the entire population
--pca_{group}.png: plot for each group
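The sketch below shows one way such images could be produced with matplotlib, using made-up data. It is not the script's actual implementation: the function name `plot_pca`, the column names, and the choice to draw each group highlighted against the other individuals in gray are all assumptions made for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # write image files without needing a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up PCA values with group labels.
merged = pd.DataFrame({
    "PC1": [0.012, -0.021, 0.030, -0.015],
    "PC2": [-0.034, 0.017, 0.008, 0.022],
    "Group": ["GROUP1", "GROUP2", "GROUP1", "GROUP2"],
})


def plot_pca(df, outdir):
    """Write pca.png and one pca_{group}.png per group; return the paths."""
    os.makedirs(outdir, exist_ok=True)
    written = []

    # Whole population: one scatter series per group so the legend shows labels.
    fig, ax = plt.subplots()
    for group, sub in df.groupby("Group"):
        ax.scatter(sub["PC1"], sub["PC2"], label=group)
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    ax.legend()
    path = os.path.join(outdir, "pca.png")
    fig.savefig(path)
    plt.close(fig)
    written.append(path)

    # Per group: everyone in gray, then the selected group drawn on top.
    for group, sub in df.groupby("Group"):
        fig, ax = plt.subplots()
        ax.scatter(df["PC1"], df["PC2"], color="lightgray")
        ax.scatter(sub["PC1"], sub["PC2"], label=group)
        ax.set_xlabel("PC1")
        ax.set_ylabel("PC2")
        ax.legend()
        path = os.path.join(outdir, f"pca_{group}.png")
        fig.savefig(path)
        plt.close(fig)
        written.append(path)

    return written


outdir = tempfile.mkdtemp()
written = plot_pca(merged, outdir)
print(written)
```

Closing each figure after saving keeps memory use flat when there are many groups.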
If you run the script using example.eigenvec and [example_population.txt](https://github.com/t-yui/bioinformatics_scripts/blob/master/gwas_tools/plinkPCA/plot_examples/example_data/example_population.txt) as input files, you will get the following images.
2-1) pca_GROUP1.png
2-2) pca_GROUP2.png
2-3) pca_GROUP3.png