PLINK is a whole-genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
The focus of PLINK is purely on analysis of genotype/phenotype data, so there is no support for steps prior to this (e.g. study design and planning, generating genotype calls from raw data). Through integration with gPLINK and Haploview, there is some support for the subsequent visualization, annotation and storage of results.
PLINK (one syllable) is being developed by Shaun Purcell at the Center for Human Genetic Research (CHGR), Massachusetts General Hospital (MGH), and the Broad Institute of Harvard & MIT, with the support of others.
The utility FCgene, a format-conversion tool for genotype data (e.g. PLINK-MACH, MACH-PLINK), is also available. Type 'module load fcgene' to add the binary to your path, and then 'fcgene' to run it.
References:
- Purcell et al., PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. 2007.
- Module Name: plink (see the modules page for more information)
- Single-threaded app
- Example files in /usr/local/apps/plink/TEST_DATA
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load plink
[user@cn3144 ~]$ cp /usr/local/apps/plink/TEST_DATA/* .
[user@cn3144 ~]$ plink --file toy
PLINK v1.90b4.4 64-bit (21 May 2017)    www.cog-genomics.org/plink/1.9/
(C) 2005-2017 Shaun Purcell, Christopher Chang    GNU General Public License v3
Logging to plink.log.
Options in effect:
  --file toy

128733 MB RAM detected; reserving 64366 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (2 variants, 2 people).
--file: plink.bed + plink.bim + plink.fam written.

[user@cn3144 ~]$ plink --file toy --freq
[...]
2 variants loaded from .bim file.
2 people (2 males, 0 females) loaded from .fam.
2 phenotype values loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 2 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.75.
--freq: Allele frequencies (founders only) written to plink.frq .

[user@cn3144 ~]$ plink --file toy --assoc
[user@cn3144 ~]$ plink --file toy --make-bed --out /home/$USER/plink/t1
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Create a batch input file (e.g. plink.sh). For example:
#!/bin/bash
cd /data/$USER/plink/t1
plink --noweb --file test1
plink --noweb --file test1 --freq
plink --noweb --file test1 --assoc
plink --noweb --file test1 --make-bed
Submit this job using the Slurm sbatch command.
sbatch --mem=6g plink.sh
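Every step in the batch script above writes to the default `plink.*` output prefix, so each run's `plink.log` replaces the previous one, and every step re-reads the text `.ped`/`.map` files. The following is a minimal sketch of one way around both, assuming PLINK 1.9's `--bfile` and `--out` options. The function name `run_plink_steps`, the `PLINK` override variable, and the `_freq`/`_assoc` output names are illustrative choices, not part of the original script.

```shell
#!/bin/bash
# Sketch: convert the text fileset once, then run each analysis on the
# binary fileset with its own output prefix.
# Assumptions (not from the original script): the function name, the PLINK
# override variable, and the "<prefix>_freq" / "<prefix>_assoc" naming.

run_plink_steps() {
    local prefix="$1"                  # e.g. test1 for test1.ped + test1.map
    local plink_bin="${PLINK:-plink}"  # allows substituting another binary

    # Step 1: .ped/.map -> <prefix>.bed/.bim/.fam
    "$plink_bin" --file "$prefix" --make-bed --out "$prefix" || return 1
    # Step 2: allele frequencies, written under <prefix>_freq
    "$plink_bin" --bfile "$prefix" --freq --out "${prefix}_freq" || return 1
    # Step 3: basic association test, written under <prefix>_assoc
    "$plink_bin" --bfile "$prefix" --assoc --out "${prefix}_assoc" || return 1
}

# In a batch script this could be followed by, for example:
#   cd /data/$USER/plink/t1 && run_plink_steps test1
```

Because each step returns on the first failure, a failed conversion does not go on to run the analyses against a missing binary fileset.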
Create a swarmfile (e.g. plink.swarm). For example:
cd /data/$USER/myseqs; plink --noweb --ped file1.ped --map file1.map --assoc
cd /data/$USER/myseqs; plink --noweb --ped file2.ped --map file2.map --assoc
cd /data/$USER/myseqs; plink --noweb --ped file3.ped --map file3.map --assoc
[...etc...]
Submit this job using the swarm command.
swarm -f plink.swarm [-g #] --module plink
where
-g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
--module plink | Loads the plink module for each subjob in the swarm |
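For more than a handful of filesets, the swarmfile can be generated instead of typed by hand. The sketch below prints one swarm line per `.ped`/`.map` pair found in a directory. The function name `make_plink_swarm` is hypothetical, and it assumes each `NAME.ped` has a matching `NAME.map` in the same directory. It also adds `--out NAME` to each line, because subjobs running in the same directory would otherwise all write to the default `plink.*` prefix.

```shell
#!/bin/bash
# Sketch: print one swarm command line per .ped/.map pair in a directory.
# The function name and the per-fileset --out prefix are illustrative.

make_plink_swarm() {
    local dir="$1"
    local ped base
    for ped in "$dir"/*.ped; do
        [ -e "$ped" ] || continue      # no .ped files: the glob stays literal
        base="${ped%.ped}"             # strip the extension
        base="${base##*/}"             # strip the directory part
        if [ ! -e "$dir/$base.map" ]; then
            echo "skipping $base: no matching .map file" >&2
            continue
        fi
        echo "cd $dir; plink --noweb --ped $base.ped --map $base.map --assoc --out $base"
    done
}

# Example use:
#   make_plink_swarm /data/$USER/myseqs > plink.swarm
```

The resulting file can then be submitted with the same `swarm -f plink.swarm` command shown above.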