I want to use PLINK binary data files (.bed, .bim, .fam) in Python. There is a module for this called pyplink.
It can be installed with pip:
shell
pip install pyplink
Suppose your current directory contains a set of files such as foo.bed, foo.bim, and foo.fam. Pass the common prefix (without an extension) to PyPlink:
python3
from pyplink import PyPlink
pyp = PyPlink("foo")
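This creates an object called pyp that combines the .bed, .fam, and .bim files. Each piece of information is available through its methods: get_fam() and get_bim() return the sample and marker tables as pandas DataFrames, and get_nb_samples() and get_nb_markers() return the corresponding counts.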
python3
pyp.get_fam()
pyp.get_nb_samples()
pyp.get_bim()
pyp.get_nb_markers()
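For orientation, the two tables mirror the plain-text .fam and .bim files. The sketch below parses one line of each using only the standard library. The sample lines are made up, and the column names are the ones I believe pyplink uses for its DataFrames, so treat them as an assumption rather than a reference.

```python
# Column order of the whitespace-delimited PLINK text files.
# The names are assumed to match pyplink's DataFrame columns.
FAM_COLUMNS = ["fid", "iid", "father", "mother", "gender", "status"]
BIM_COLUMNS = ["chrom", "snp", "cm", "pos", "a1", "a2"]


def parse_line(line, columns):
    """Split one whitespace-delimited line into a {column: value} dict."""
    fields = line.split()
    if len(fields) != len(columns):
        raise ValueError(
            "expected {} fields, got {}".format(len(columns), len(fields))
        )
    return dict(zip(columns, fields))


# Made-up example lines, not taken from a real data set.
fam_row = parse_line("FAM1 IND1 0 0 1 -9", FAM_COLUMNS)
bim_row = parse_line("10 rs0000001 0 123456 A G", BIM_COLUMNS)

print(fam_row["iid"], fam_row["gender"])
print(bim_row["snp"], bim_row["chrom"], bim_row["pos"])
```

With pyplink itself, the marker name (the snp column here) should end up as the index of the marker table rather than as a regular column.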
The marker names are the index of the table returned by get_bim(), so they can be collected like this:
python3
markerNames = pyp.get_bim().index
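Get the genotypes by specifying the marker name. get_geno_marker returns them in additive coding (the number of a1 alleles, with -1 for missing), while get_acgt_geno_marker returns them as nucleotide letters.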
python3
pyp.get_geno_marker(markerNames[0])
pyp.get_acgt_geno_marker(markerNames[0])
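To make the relationship between the two return values concrete, here is a minimal pure-Python sketch of the conversion. The function name additive_to_acgt and the toy data are hypothetical, and the mapping (0 for a2/a2, 1 for a1/a2, 2 for a1/a1, "00" for missing) reflects my understanding of pyplink's encoding rather than its actual implementation.

```python
def additive_to_acgt(genotypes, a1, a2):
    """Turn additive codes (count of a1 alleles, -1 = missing) into letter pairs."""
    # Assumed mapping: 0 -> a2a2, 1 -> a1a2, 2 -> a1a1, -1 -> "00" (missing).
    lookup = {0: a2 + a2, 1: a1 + a2, 2: a1 + a1, -1: "00"}
    return [lookup[g] for g in genotypes]


# Toy genotypes for one marker whose alleles are a1="A" and a2="G".
toy_genotypes = [0, 1, 2, -1]
decoded = additive_to_acgt(toy_genotypes, a1="A", a2="G")
print(decoded)  # ['GG', 'AG', 'AA', '00']
```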
You can also get the marker ID and genotype as an iterator.
python3
markers = ["rs7092431", "rs9943770", "rs1578483"]
for marker_id, genotypes in pyp.iter_geno_marker(markers):
    print(marker_id)
    print(genotypes, end="\n\n")
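Get the genotypes of all male samples for markers on the Y chromosome (coded as chromosome 24 in PLINK; 23 is the X chromosome)
python3
all_samples = pyp.get_fam()
all_markers = pyp.get_bim()
# Markers on the Y chromosome (PLINK code 24)
y_markers = all_markers[all_markers.chrom == 24].index.values
# Boolean mask of male samples (gender == 1)
males = all_samples.gender == 1
for marker_ID, genotypes in pyp.iter_geno_marker(y_markers):
    male_genotypes = genotypes[males.values]
    print("{:d} total genotypes".format(len(genotypes)))
    print("{:d} genotypes for {:,d} males ({} on chr{} and position {:,d})".format(
        len(male_genotypes),
        males.sum(),
        marker_ID,
        all_markers.loc[marker_ID, "chrom"],
        all_markers.loc[marker_ID, "pos"],
    ))
    break  # only show the first marker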
Get the minor allele frequency (computed on founders only) and the genotypes of the specified markers
python3
all_samples = pyp.get_fam()
# Founders have no father and no mother recorded
founders = (all_samples.father == "0") & (all_samples.mother == "0")
markers = ["rs7092431", "rs9943770", "rs1578483"]
for marker_ID, genotypes in pyp.iter_geno_marker(markers):
    # Keep founders only and drop missing genotypes (-1)
    valid_genotypes = genotypes[founders.values & (genotypes != -1)]
    maf = valid_genotypes.sum() / (len(valid_genotypes) * 2)
    print(marker_ID, round(maf, 6), sep="\t")
    print(genotypes)
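The frequency calculation above can be checked by hand on a toy example. The sketch below uses plain Python lists instead of NumPy arrays; the function name allele_frequency and the data are hypothetical. Strictly speaking, the value is the frequency of the a1 allele, which is usually, but not necessarily, the minor allele in a PLINK file.

```python
def allele_frequency(genotypes, founders):
    """Frequency of the a1 allele among founders with a non-missing genotype."""
    valid = [
        g for g, is_founder in zip(genotypes, founders)
        if is_founder and g != -1
    ]
    if not valid:
        return None  # no usable genotype, frequency undefined
    # Each sample carries two alleles, hence the factor of 2.
    return sum(valid) / (2 * len(valid))


# Toy data: six samples, one missing genotype, one non-founder.
toy_genotypes = [0, 1, 2, -1, 1, 0]
toy_founders = [True, True, True, True, False, True]

freq = allele_frequency(toy_genotypes, toy_founders)
print(freq)  # 0.375
```

Four genotypes remain after filtering (0, 1, 2, 0), which gives 3 copies of a1 out of 8 alleles.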