truffle on Biowulf

Fast and accurate shared segment detection and relatedness estimation in unphased genetic data using TRUFFLE

Documentation

truffle Main Site

Important Notes

Module Name: truffle (see the modules page for more information)
Multithreaded app
Example files in /usr/local/apps/truffle/TEST_DATA

Interactive job

Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load truffle

[user@cn3144 ~]$ truffle --vcf /usr/local/apps/truffle/TEST_DATA/fs-and-po-pairs-from-1000genomes.vcf.gz --segments --cpu $SLURM_CPUS_ON_NODE
     _               __  __ _       
    | |             / _|/ _| |      
    | |_ _ __ _   _| |_| |_| | ___  
    | __| '__| | | |  _|  _| |/ _ \ 
    | |_| |  | |_| | | | | | |  __/ 
     \__|_|   \__,_|_| |_| |_|\___| 
        
         - TRUFFLE v1.38 -    



***
*** Non-commerical and educational use license.
***


[*]  Options in effect: 
      -   Input file: /usr/local/apps/truffle/TEST_DATA/fs-and-po-pairs-from-1000genomes.vcf.gz
      -   Number of CPUs: 2

      -   Reporting threshold: all pairs

      -   Segment reporting: YES

      -   Input file name: /usr/local/apps/truffle/TEST_DATA/fs-and-po-pairs-from-1000genomes.vcf.gz
      -   Opening output file truffle.ibd  
      -   Opening output file truffle.segments  
      -   Number of samples: 47

      -   Allocation genotype vector  npeople=63 nvars=200000
      -   GenotypeMatrix: allocating 12 MB of memory
      -   Excluding variants with missing rate > 0.020 (1 samples)

      -   Excluding variants with allele frequency  < 0.060

      -   Reading chromosome 1 (pos=0)
      -   Reading chromosome 2 (pos=12291)
      -   Reading chromosome 3 (pos=24479)
      -   Reading chromosome 4 (pos=34767)
      -   Reading chromosome 5 (pos=44872)
      -   Reading chromosome 6 (pos=53948)
[...]
[*]  Genotype pre-processing duration:  506.32 ms 

 -  Compute IBD by IBS: (cpu=1/2) Nind = 47 Nvar = 155313
 -  Compute IBD by IBS: (cpu=2/2) Nind = 47 Nvar = 155313

[*] Finished processing 

      -   Total time for analysis was 0.02 minutes (1.1 seconds)

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Batch job

Most jobs should be run as batch jobs.

Create a batch input file (e.g. truffle.sh). For example:

#!/bin/bash
set -e
module load truffle
truffle --vcf /usr/local/apps/truffle/TEST_DATA/fs-and-po-pairs-from-1000genomes.vcf.gz --segments --cpu $SLURM_CPUS_ON_NODE

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=4 --mem=4g truffle.sh
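
The batch script above reads its thread count from $SLURM_CPUS_ON_NODE, which is only set inside a Slurm allocation. A minimal sketch of one way to choose the value with fallbacks, so the same script can also be tried outside a job. The helper name pick_cpus is hypothetical and not part of truffle or Slurm; the fallback order (per-task count, then per-node count, then 1) is an assumption about what is sensible here.

```shell
#!/bin/bash
# Sketch: choose a thread count for truffle's --cpu option.
# pick_cpus is a hypothetical helper, not part of truffle or Slurm.
pick_cpus() {
    # Prefer the per-task allocation (set when --cpus-per-task is given),
    # then the per-node CPU count, then fall back to a single thread.
    echo "${SLURM_CPUS_PER_TASK:-${SLURM_CPUS_ON_NODE:-1}}"
}

# Possible use inside truffle.sh (input path as in the example above):
#   truffle --vcf /usr/local/apps/truffle/TEST_DATA/fs-and-po-pairs-from-1000genomes.vcf.gz \
#           --segments --cpu "$(pick_cpus)"
```

With the sbatch command shown above, the helper would print 4; outside an allocation it prints 1.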

Swarm of jobs

A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. truffle.swarm). For example:

truffle --vcf s1.vcf.gz --segments --cpu $SLURM_CPUS_ON_NODE --out s1
truffle --vcf s2.vcf.gz --segments --cpu $SLURM_CPUS_ON_NODE --out s2
truffle --vcf s3.vcf.gz --segments --cpu $SLURM_CPUS_ON_NODE --out s3

Submit this job using the swarm command.

swarm -f truffle.swarm -g 4 -t 4 --module truffle

where

`-g #`	Number of gigabytes of memory required for each process (1 line in the swarm command file).
`-t #`	Number of threads/CPUs required for each process (1 line in the swarm command file).
`--module truffle`	Loads the truffle module for each subjob in the swarm.
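
The swarmfile above lists its three inputs by hand. For a larger set of files it can be generated instead. The following is a minimal sketch, assuming the inputs are named <sample>.vcf.gz, sit in a single directory, and have no whitespace in their paths. The function name make_swarmfile and its arguments are hypothetical; only the truffle options already shown above are used.

```shell
#!/bin/bash
# Sketch: write one truffle command per VCF file into a swarmfile.
# make_swarmfile is a hypothetical helper; the directory layout is an assumption.
make_swarmfile() {
    local vcf_dir="${1:-.}"                # directory holding the *.vcf.gz inputs
    local swarmfile="${2:-truffle.swarm}"  # output swarmfile
    local vcf sample

    : > "$swarmfile"                       # start from an empty swarmfile
    for vcf in "$vcf_dir"/*.vcf.gz; do
        [ -e "$vcf" ] || continue          # no matches: skip the unexpanded glob
        sample="$(basename "$vcf" .vcf.gz)"
        # The single-quoted format keeps $SLURM_CPUS_ON_NODE literal,
        # so it is expanded later, inside each subjob.
        printf 'truffle --vcf %s --segments --cpu $SLURM_CPUS_ON_NODE --out %s\n' \
            "$vcf" "$sample" >> "$swarmfile"
    done
}
```

Calling make_swarmfile on a directory writes truffle.swarm in the current directory, which can then be submitted with the swarm command shown above.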