Biowulf High Performance Computing at the NIH
Admixture on Biowulf

ADMIXTURE is a software tool for maximum likelihood estimation of individual ancestries from multilocus SNP genotype datasets. It uses the same statistical model as STRUCTURE but calculates estimates much more rapidly using a fast numerical optimization algorithm.


Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf ~]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load admixture

[user@cn3144 ~]$ cp /usr/local/apps/admixture/example/* .

[user@cn3144 ~]$ admixture -help

****                   ADMIXTURE Version 1.3.0                  ****
****                    Copyright 2008-2015                     ****
****           David Alexander, Suyash Shringarpure,            ****
****                John  Novembre, Ken Lange                   ****
****                                                            ****
****                 Please cite our paper!                     ****
****   Information at www.genetics.ucla.edu/software/admixture  ****

                                                                              
  ADMIXTURE basic usage:  (see manual for complete reference)                 
    % admixture [options] inputFile K                                         
                                                                              
  where:                                                                      
    K is the number of populations; and                                       
    inputFile may be:                                                         
      - a PLINK .bed file                                                     
      - a PLINK "12" coded .ped file                                        
                                                                              
  Output will be in files inputBasename.K.Q, inputBasename.K.P                
                                                                              
  General options:                                                            
    -jX          : do computation on X threads                                
    --seed=X     : use random seed X for initialization                       
                                                                              
  Algorithm options:                                                          
     -m=                                                                      
    --method=[em|block]     : set method.  block is default                   
                                                                              
     -a=                                                                      
    --acceleration=none   |                                                   
                   sqs |                                                   
                   qn      : set acceleration                              
                                                                              
  Convergence criteria:                                                       
    -C=X : set major convergence criterion (for point estimation)             
    -c=x : set minor convergence criterion (for bootstrap and CV reestimates) 
                                                                              
  Bootstrap standard errors:                                                  
    -B[X]      : do bootstrapping [with X replicates]                         

[user@cn3144 ~]$ admixture hapmap3.bed 3

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
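
The help text in the session above says that output goes to inputBasename.K.Q and inputBasename.K.P. As a minimal sketch of that naming rule, the helper below prints the two file names to expect for a given input file and K. The function name admixture_outputs is hypothetical and not part of ADMIXTURE, and the sketch assumes the files are written to the current working directory.

```shell
# Sketch: derive the output file names that the ADMIXTURE help text describes
# ("inputBasename.K.Q, inputBasename.K.P") for a given input file and K.
# admixture_outputs is a hypothetical helper, not part of ADMIXTURE.
admixture_outputs() {
    local input="$1" k="$2" base
    base="$(basename "$input")"   # drop any leading directory
    base="${base%.*}"             # drop the .bed / .ped extension
    printf '%s.%s.Q\n%s.%s.P\n' "$base" "$k" "$base" "$k"
}

# Example, after `admixture hapmap3.bed 3` has finished:
#   for f in $(admixture_outputs hapmap3.bed 3); do ls -l "$f"; done
```

Listing the expected files after a run is a quick way to confirm that the job finished and wrote both outputs.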

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. admixture.sh). For example:

#!/bin/bash
set -e
module load admixture
admixture hapmap3.bed 3 -j${SLURM_CPUS_PER_TASK} > admixture.out

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=16 --mem=20g admixture.sh
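
The batch script passes ${SLURM_CPUS_PER_TASK} to the -j option, and Slurm sets that variable only when --cpus-per-task is given at submission. If the script is submitted without it, or run outside Slurm, the flag would expand to a bare -j. A hedged sketch of one way to guard against that is shown below; the helper name admixture_thread_flag is hypothetical, and the fallback of a single thread is an assumption, not a Biowulf recommendation.

```shell
# Sketch: build the -jX thread flag from Slurm's environment, defaulting to 1.
# admixture_thread_flag is a hypothetical helper, not part of ADMIXTURE or Slurm.
admixture_thread_flag() {
    # ${VAR:-1} substitutes 1 when SLURM_CPUS_PER_TASK is unset or empty.
    printf '%s\n' "-j${SLURM_CPUS_PER_TASK:-1}"
}

# In admixture.sh this could replace the bare -j${SLURM_CPUS_PER_TASK}:
#   admixture hapmap3.bed 3 "$(admixture_thread_flag)" > admixture.out
```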

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. admixture.swarm). For example:

admixture hapmap3.bed 3 -j${SLURM_CPUS_PER_TASK} > admixture3.out
admixture hapmap20.bed 20 -j${SLURM_CPUS_PER_TASK} > admixture20.out
admixture hapmap1000.bed 1000 -j${SLURM_CPUS_PER_TASK} > admixture1000.out

Submit this job using the swarm command.

swarm -f admixture.swarm -g 20 -t 16 --module admixture

where

-g #                Number of gigabytes of memory required for each process (1 line in the swarm command file)
-t #                Number of threads/CPUs required for each process (1 line in the swarm command file)
--module admixture  Loads the admixture module for each subjob in the swarm
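
When the same input is to be run at several values of K, the swarmfile lines can be generated rather than typed. The sketch below prints one command per K in the same form as the swarmfile example; make_swarm_lines is a hypothetical helper (not part of swarm or ADMIXTURE), and using a single input file for every K is an assumption made for illustration.

```shell
# Sketch: print one swarm command line per K for a single input file.
# make_swarm_lines is a hypothetical helper, not part of swarm or ADMIXTURE.
make_swarm_lines() {
    local input="$1" k="$2" kmax="$3"
    while [ "$k" -le "$kmax" ]; do
        # The single-quoted format keeps ${SLURM_CPUS_PER_TASK} literal, so it
        # is expanded inside each subjob, not when the swarmfile is generated.
        printf 'admixture %s %d -j${SLURM_CPUS_PER_TASK} > admixture%d.out\n' \
            "$input" "$k" "$k"
        k=$((k + 1))
    done
}

# Example: make_swarm_lines hapmap3.bed 2 5 > admixture.swarm
```

The generated file can then be submitted with the swarm command in the same way as a hand-written swarmfile.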