High-Performance Computing at the NIH
Plink on Biowulf

PLINK is a whole-genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

The focus of PLINK is purely on analysis of genotype/phenotype data, so there is no support for steps prior to this (e.g. study design and planning, generating genotype calls from raw data). Through integration with gPLINK and Haploview, there is some support for the subsequent visualization, annotation and storage of results.

PLINK (one syllable) is being developed by Shaun Purcell at the Center for Human Genetic Research (CHGR), Massachusetts General Hospital (MGH), and the Broad Institute of Harvard & MIT, with the support of others.

The utility FCgene, a format-conversion tool for genotype data (e.g. PLINK-to-MACH, MACH-to-PLINK), is also available. Type 'module load fcgene' to add the binary to your path, then 'fcgene' to run it.


Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load plink

[user@cn3144 ~]$ cp /usr/local/apps/plink/TEST_DATA/* .

[user@cn3144 ~]$ plink --file toy
PLINK v1.90b4.4 64-bit (21 May 2017)           www.cog-genomics.org/plink/1.9/
(C) 2005-2017 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink.log.
Options in effect:
  --file toy

128733 MB RAM detected; reserving 64366 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (2 variants, 2 people).
--file: plink.bed + plink.bim + plink.fam written.

[user@cn3144 ~]$ plink --file toy --freq
[...]
2 variants loaded from .bim file.
2 people (2 males, 0 females) loaded from .fam.
2 phenotype values loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 2 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.75.
--freq: Allele frequencies (founders only) written to plink.frq .

[user@cn3144 ~]$ plink --file toy --assoc

[user@cn3144 ~]$ plink --file toy --make-bed --out /home/$USER/plink/t1

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
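The toy files read by '--file' above are in PLINK's text PED/MAP format. As a rough sketch of that format (the file names and genotype values here are invented for illustration, not the actual contents of the TEST_DATA files):

```shell
# Hypothetical PED file: one row per person.
# Columns: family ID, individual ID, father, mother, sex (1=male), phenotype,
# then two alleles per variant; 0 denotes a missing allele.
cat > example.ped <<'EOF'
1 1 0 0 1 1 A A G T
2 1 0 0 1 1 A C 0 0
EOF

# Hypothetical MAP file: one row per variant.
# Columns: chromosome, variant ID, genetic distance (cM), bp position.
cat > example.map <<'EOF'
1 snp1 0 1000
1 snp2 0 2000
EOF

wc -l example.ped example.map
```

With two people and two variants there are four genotypes; the single missing genotype ('0 0') would give the 0.75 genotyping rate seen in the session above.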

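The plink.frq file written by '--freq' has one row per variant, with columns CHR, SNP, A1 (minor allele), A2, MAF and NCHROBS. A common next step is filtering on minor allele frequency; a minimal sketch using awk on a fabricated plink.frq (the values below are made up, not the toy-data output):

```shell
# Fabricated .frq contents for illustration only; real output comes from plink --freq.
cat > plink.frq <<'EOF'
 CHR          SNP   A1   A2          MAF  NCHROBS
   1         snp1    C    A       0.2500        4
   1         snp2    T    G       0.0100        4
EOF

# Keep variant IDs with MAF >= 5%, skipping the header row.
awk 'NR > 1 && $5 >= 0.05 {print $2}' plink.frq > common_snps.txt
cat common_snps.txt
```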
Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. plink.sh). For example:

#!/bin/bash
set -e
module load plink
cd /data/$USER/plink/t1
plink --noweb --file test1
plink --noweb --file test1 --freq
plink --noweb --file test1 --assoc
plink --noweb --file test1 --make-bed

Submit this job using the Slurm sbatch command.

sbatch --mem=6g plink.sh
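The memory request can also live inside the script as an #SBATCH directive instead of on the sbatch command line. A sketch (the 6g memory and 4-hour walltime are placeholders to adjust for your data; the script is written with a here-document only so the example is self-contained, normally you would use an editor):

```shell
# Write a batch script with embedded Slurm directives (placeholder values).
cat > plink.sh <<'EOF'
#!/bin/bash
#SBATCH --mem=6g
#SBATCH --time=4:00:00
set -e
module load plink
cd /data/$USER/plink/t1
plink --noweb --file test1 --assoc
EOF

# Then submit without repeating the resource flags:
# sbatch plink.sh
```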
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. plink.swarm). For example:

cd /data/$USER/myseqs; plink --noweb --ped file1.ped --map file1.map --assoc
cd /data/$USER/myseqs; plink --noweb --ped file2.ped --map file2.map --assoc
cd /data/$USER/myseqs; plink --noweb --ped file3.ped --map file3.map --assoc
[...etc...]
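When the input files follow a simple naming pattern, the swarmfile can be generated with a shell loop instead of written by hand. A sketch assuming inputs named file1.ped/file1.map, file2.ped/file2.map, and so on (the 'touch' lines only create empty placeholders so the loop has something to match):

```shell
# Placeholder inputs; replace with your real PED/MAP pairs.
touch file1.ped file1.map file2.ped file2.map

# One swarm command line per PED/MAP pair; each line becomes one subjob.
for ped in file*.ped; do
    prefix=${ped%.ped}
    echo "cd /data/\$USER/myseqs; plink --noweb --ped ${prefix}.ped --map ${prefix}.map --assoc"
done > plink.swarm

cat plink.swarm
```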

Submit this job using the swarm command.

swarm -f plink.swarm [-g #] --module plink
where
-g # Number of Gigabytes of memory required for each process (1 line in the swarm command file)
--module plink Loads the plink module for each subjob in the swarm