PRIMUS stands for Pedigree Reconstruction (PR) and Identification of a Maximum Unrelated Set (IMUS).
The IMUS method is an algorithm adapted from graph theory that always identifies the maximum set of unrelated individuals in any dataset, and allows weighting parameters to be utilized in unrelated sample selection. PRIMUS reads in user-generated IBD estimates and outputs the maximum possible set of unrelated individuals, given a specified threshold of relatedness. Additional information for preferential selection of individuals may also be utilized.
The PR algorithm is a method to reconstruct pedigrees within a genetic dataset. PRIMUS can verify expected pedigree structures from genetic data, and it can identify and incorporate novel, cryptic relationships into pedigrees.
Allocate an interactive session and run the program.
Sample session (user input in bold):
[user@biowulf]$ sinteractive salloc.exe: Pending job allocation 46116226 salloc.exe: job 46116226 queued and waiting for resources salloc.exe: job 46116226 has been allocated resources salloc.exe: Granted job allocation 46116226 salloc.exe: Waiting for resource configuration salloc.exe: Nodes cn3144 are ready for job [user@cn3144 ~]$ module load PRIMUS [+] Loading plink 2.3-alpha [+] Loading PRIMUS 1.9.0 [user@cn3144 ~]$ cp $PRIMUS_TEST_DATA/complete.genome . [user@cn3144 ~]$ run_PRIMUS.pl --plink complete.genome FILES AND COLUMNS LOG FILE: complete.genome_PRIMUS/PRIMUS_output.log IBD file: complete.genome (FID1=1; IID1=2; FID2=3; IID2=4; IBD0=7; IBD1=8; IBD2=9; PI_HAT/RELATEDNESS=10) Dataset results dir: complete.genome_PRIMUS Age file: none Sex file: none Affection file: none Trait weighting: size (size) SETTINGS Get PLINK IBD ESTIMATES with prePRIMUS: 0 Automatic reference population selection: 1 Verbose: 1 Relatedness threshold: 0.09375 Initial likelihood cutoff: 0.3 Max generations: none Max generational mating gap: 0 Get max unrelated set: 1 Reconstruct pedigrees: 1 Relatedness_file: complete.genome Threshold: 0.09375 Selection criteria are based on the following: size (size) IDENTFYING FAMILY NETWORKS IN DATA Writing network files to complete.genome_PRIMUS/ Loading data... done. done. IDENTIFYING A MAXIMUM UNRELATED SET Checking for large networks... done. # of family networks: 1 Writing out unrelated set done. Testing alternative methods... done. unrelated_file: complete.genome_maximum_independent_set unrelated_set size: 6 RECONSTRUCTING complete.genome_network1 Output directory: complete.genome_PRIMUS/complete.genome_network1 Use mito non-match: 0 Use mito match: 0 Use Y non-match: 0 Use Y match: 0 Entering Resolve PC trios. # of possible pedigrees: 1 Entering Phase 1. # of possible pedigrees: 1 Entering Phase 2. # of possible pedigrees: 1 Entering Phase 3. # of possible pedigrees: 1 networks pre-prune: 1 networks post-prune: 1 Writing summary file Writing .fam file for complete.genome_network1_1 Writing dataset Summary file complete.genome_PRIMUS/Summary_complete.genome.txt done. [user@cn3144 ~]$ exit salloc.exe: Relinquishing job allocation 46116226 [user@biowulf ~]$
Create a batch input file (e.g. PRIMUS.sh). For example:
#!/bin/bash set -e module load PRIMUS run_PRIMUS.pl --plink input.genome
Submit this job using the Slurm sbatch command.
sbatch [--mem=#] PRIMUS.sh
Create a swarmfile (e.g. PRIMUS.swarm). For example:
run_PRIMUS.pl --plink input1.genome run_PRIMUS.pl --plink input2.genome run_PRIMUS.pl --plink input3.genome run_PRIMUS.pl --plink input4.genome
Submit this job using the swarm command.
swarm -f PRIMUS.swarm [-g #] --module PRIMUSwhere
-g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
--module PRIMUS | Loads the PRIMUS module for each subjob in the swarm |