Plink is a whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
The focus of PLINK is purely on analysis of genotype/phenotype data, so there is no support for steps prior to this (e.g. study design and planning, generating genotype calls from raw data). Through integration with gPLINK and Haploview, there is some support for the subsequent visualization, annotation and storage of results.
PLINK (one syllable) is being developed by Shaun Purcell at the Center for Human Genetic Research (CHGR), Massachusetts General Hospital (MGH), and the Broad Institute of Harvard & MIT, with the support of others.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive salloc.exe: Pending job allocation 46116226 salloc.exe: job 46116226 queued and waiting for resources salloc.exe: job 46116226 has been allocated resources salloc.exe: Granted job allocation 46116226 salloc.exe: Waiting for resource configuration salloc.exe: Nodes cn3144 are ready for job [user@cn3144 ~]$ module load plink [+] Loading plink 3.6-alpha [user@cn3144 ~]$ cp /usr/local/apps/plink/TEST_DATA/* . [user@cn3144 ~]$ plink2 --dummy 2 2 --freq --make-bed --out toy_data PLINK v2.00a3.6LM 64-bit Intel (14 Aug 2022) www.cog-genomics.org/plink/2.0/ (C) 2005-2022 Shaun Purcell, Christopher Chang GNU General Public License v3 Logging to toy_data.log. Options in effect: --dummy 2 2 --freq --make-bed --out toy_data Start time: Wed Sep 21 09:35:39 2022 233597 MiB RAM detected; reserving 116798 MiB for main workspace. Allocated 65698 MiB successfully, after larger attempt(s) failed. Using up to 14 threads (change this with --threads). Dummy data (2 samples, 2 SNPs) written to toy_data-temporary.pgen + toy_data-temporary.pvar + toy_data-temporary.psam . 2 samples (2 females, 0 males; 2 founders) loaded from toy_data-temporary.psam. 2 variants loaded from toy_data-temporary.pvar. 1 binary phenotype loaded (2 cases, 0 controls). Calculating allele frequencies... done. --freq: Allele frequencies (founders only) written to toy_data.afreq . Writing toy_data.fam ... done. Writing toy_data.bim ... done. Writing toy_data.bed ... done. End time: Wed Sep 21 09:35:39 2022 [user@cn3144 ~]$ exit salloc.exe: Relinquishing job allocation 46116226 [user@biowulf ~]$
Create a batch input file (e.g. plink.sh). For example:
#!/bin/bash cd /data/$USER/plink/t1 plink2 --dummy 2 2 --freq --make-bed --out toy_data
Submit this job using the Slurm sbatch command.
sbatch --mem=6g plink.sh
Create a swarmfile (e.g. plink.swarm). For example:
cd /data/$USER/myseqs; plink2 --noweb --ped file1.ped --map file2.map --assoc cd /data/$USER/myseqs; plink2 --noweb --ped file2.ped --map file2.map --assoc cd /data/$USER/myseqs; plink2 --noweb --ped file3.ped --map file3.map --assoc [...etc...]
Submit this job using the swarm command.
swarm -f plink.swarm [-g #] --module plinkwhere
-g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
--module plink | Loads the plink module for each subjob in the swarm |