PRIMUS on Biowulf
PRIMUS stands for Pedigree Reconstruction (PR) and Identification of a Maximum Unrelated Set (IMUS).
The IMUS method is an algorithm adapted from graph theory that always identifies the maximum set of unrelated individuals in any dataset and allows weighting parameters to be used when selecting unrelated samples. PRIMUS reads user-generated IBD estimates and outputs the largest possible set of unrelated individuals for a specified relatedness threshold. Additional information can also be supplied to prefer certain individuals during selection.
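To make the relatedness threshold concrete, the sketch below lists the pairs in a PLINK-style .genome file whose PI_HAT exceeds a cutoff. It is only an illustration of thresholding and is not the IMUS algorithm itself. The toy file and its values are invented, the column positions (IID1=2, IID2=4, PI_HAT=10) and the 0.09375 cutoff are taken from the sample session log on this page, and treating the boundary as strictly greater than the cutoff is an assumption.

```shell
# Hypothetical toy input in PLINK .genome layout (values invented for illustration).
workdir=$(mktemp -d)
cat > "$workdir/toy.genome" <<'EOF'
FID1 IID1 FID2 IID2 RT EZ Z0 Z1 Z2 PI_HAT
F1 A F1 B UN NA 0.00 1.00 0.00 0.5000
F1 A F2 C UN NA 1.00 0.00 0.00 0.0000
F1 B F2 C UN NA 0.80 0.20 0.00 0.1000
EOF

# Cutoff as reported in the sample session log; "strictly above" is an assumption.
threshold=0.09375

# Skip the header line, then print IID1, IID2 and PI_HAT for each pair above the cutoff.
related_pairs=$(awk -v t="$threshold" 'NR > 1 && $10 + 0 > t + 0 { print $2, $4, $10 }' "$workdir/toy.genome")
echo "$related_pairs"
```

Pairs that survive this filter are the ones a maximum unrelated set cannot contain together; choosing which individual of each pair to keep is the part PRIMUS handles.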
The PR algorithm is a method to reconstruct pedigrees within a genetic dataset. PRIMUS can verify expected pedigree structures from genetic data, and it can identify and incorporate novel, cryptic relationships into pedigrees.
References:
- Jeffrey Staples, Dandi Qiao, Michael H. Cho, Edwin K. Silverman, University of Washington Center for Mendelian Genomics, Deborah A. Nickerson, and Jennifer E. Below. PRIMUS: Rapid reconstruction of pedigrees from genome-wide estimates of identity by descent. The American Journal of Human Genetics, Volume 95, Issue 5, 2014, Pages 553-564
Documentation
Important Notes
- Module Name: PRIMUS (see the modules page for more information)
- Single-threaded
- Example data are in the directory given by the environment variable PRIMUS_TEST_DATA
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program.
Sample session (user input follows the $ prompt):
[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144 ~]$ module load PRIMUS
[+] Loading plink 2.3-alpha
[+] Loading PRIMUS 1.9.0
[user@cn3144 ~]$ cp $PRIMUS_TEST_DATA/complete.genome .
[user@cn3144 ~]$ run_PRIMUS.pl --plink complete.genome
FILES AND COLUMNS
LOG FILE: complete.genome_PRIMUS/PRIMUS_output.log
IBD file: complete.genome (FID1=1; IID1=2; FID2=3; IID2=4; IBD0=7; IBD1=8; IBD2=9; PI_HAT/RELATEDNESS=10)
Dataset results dir: complete.genome_PRIMUS
Age file: none
Sex file: none
Affection file: none
Trait weighting: size (size)
SETTINGS
Get PLINK IBD ESTIMATES with prePRIMUS: 0
Automatic reference population selection: 1
Verbose: 1
Relatedness threshold: 0.09375
Initial likelihood cutoff: 0.3
Max generations: none
Max generational mating gap: 0
Get max unrelated set: 1
Reconstruct pedigrees: 1
Relatedness_file: complete.genome
Threshold: 0.09375
Selection criteria are based on the following: size (size)
IDENTFYING FAMILY NETWORKS IN DATA
Writing network files to complete.genome_PRIMUS/
Loading data... done.
done.
IDENTIFYING A MAXIMUM UNRELATED SET
Checking for large networks... done.
# of family networks: 1
Writing out unrelated set done.
Testing alternative methods... done.
unrelated_file: complete.genome_maximum_independent_set
unrelated_set size: 6
RECONSTRUCTING complete.genome_network1
Output directory: complete.genome_PRIMUS/complete.genome_network1
Use mito non-match: 0
Use mito match: 0
Use Y non-match: 0
Use Y match: 0
Entering Resolve PC trios. # of possible pedigrees: 1
Entering Phase 1. # of possible pedigrees: 1
Entering Phase 2. # of possible pedigrees: 1
Entering Phase 3. # of possible pedigrees: 1
networks pre-prune: 1
networks post-prune: 1
Writing summary file
Writing .fam file for complete.genome_network1_1
Writing dataset Summary file complete.genome_PRIMUS/Summary_complete.genome.txt
done.
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
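The log above names its output locations relative to the input file name. As a rough sketch, the hypothetical helper below (primus_outputs is not part of PRIMUS) reproduces that naming pattern so a script can look for the results afterwards. It assumes the input file sits in the current directory and that other inputs follow the same pattern seen for complete.genome.

```shell
# primus_outputs is a hypothetical helper, not part of PRIMUS.
# It mirrors the paths printed in the sample log for an input in the current directory:
#   <input>_PRIMUS/                      dataset results directory
#   <input>_PRIMUS/PRIMUS_output.log     log file
#   <input>_PRIMUS/Summary_<input>.txt   dataset summary file
primus_outputs() {
    printf '%s\n' \
        "${1}_PRIMUS" \
        "${1}_PRIMUS/PRIMUS_output.log" \
        "${1}_PRIMUS/Summary_${1}.txt"
}

primus_outputs complete.genome
```

A wrapper script could test for the summary file with `[ -f ... ]` after the run to decide whether the job finished.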
Batch job
Most jobs should be run as batch jobs.
Create a batch input file (e.g. PRIMUS.sh). For example:
#!/bin/bash
set -e
module load PRIMUS
run_PRIMUS.pl --plink input.genome
Submit this job using the Slurm sbatch command.
sbatch [--mem=#] PRIMUS.sh
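Because a batch job may wait in the queue before it fails, it can help to check the input before submitting. The sketch below verifies that a file's header matches the standard PLINK --genome layout, which agrees with the column positions reported in the sample session log (FID1=1, IID1=2, FID2=3, IID2=4, IBD0=7, IBD1=8, IBD2=9, PI_HAT=10). check_genome_header is a hypothetical helper, not part of PRIMUS, and whether PRIMUS itself inspects header names is not stated here, so treat this as a sanity check only.

```shell
# check_genome_header is a hypothetical helper, not part of PRIMUS.
# It succeeds only if the first line carries the PLINK --genome column names
# at the positions listed in the sample session log.
check_genome_header() {
    awk 'NR == 1 {
             ok = ($1 == "FID1" && $2 == "IID1" && $3 == "FID2" && $4 == "IID2" &&
                   $7 == "Z0" && $8 == "Z1" && $9 == "Z2" && $10 == "PI_HAT")
             exit
         }
         END { exit !ok }' "$1"
}

# Toy headers for demonstration only.
workdir=$(mktemp -d)
printf '%s\n' ' FID1 IID1 FID2 IID2 RT EZ Z0 Z1 Z2 PI_HAT PHE DST PPC RATIO' > "$workdir/good.genome"
printf '%s\n' 'IID1 IID2 PI_HAT' > "$workdir/bad.genome"

for f in "$workdir/good.genome" "$workdir/bad.genome"; do
    if check_genome_header "$f"; then
        echo "$(basename "$f"): header OK"
    else
        echo "$(basename "$f"): unexpected header"
    fi
done
```

An empty file also fails the check, since no header line is ever read.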
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.
Create a swarmfile (e.g. PRIMUS.swarm). For example:
run_PRIMUS.pl --plink input1.genome
run_PRIMUS.pl --plink input2.genome
run_PRIMUS.pl --plink input3.genome
run_PRIMUS.pl --plink input4.genome
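With many inputs, a swarmfile like the one above can be generated rather than typed. This is a minimal sketch that writes one run_PRIMUS.pl line per .genome file in the current directory. The temporary directory and the empty placeholder files exist only to make the example self-contained.

```shell
# Work in a scratch directory with empty placeholder inputs (illustration only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch input1.genome input2.genome input3.genome input4.genome

# Start with an empty swarmfile, then append one command line per input file.
: > PRIMUS.swarm
for f in *.genome; do
    printf 'run_PRIMUS.pl --plink %s\n' "$f" >> PRIMUS.swarm
done

cat PRIMUS.swarm
```

Each line of the resulting file is an independent command, which is the form swarm expects.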
Submit this job using the swarm command.
swarm -f PRIMUS.swarm [-g #] --module PRIMUS
where
-g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
--module PRIMUS | Loads the PRIMUS module for each subjob in the swarm |