Fast and accurate shared segment detection and relatedness estimation in unphased genetic data using TRUFFLE
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load truffle
[user@cn3144 ~]$ truffle --vcf /usr/local/apps/truffle/TEST_DATA/fs-and-po-pairs-from-1000genomes.vcf.gz --segments --cpu $SLURM_CPUS_ON_NODE
 _              __  __ _
| |            / _|/ _| |
| |_ _ __ _   _| |_| |_| | ___
| __| '__| | | |  _|  _| |/ _ \
| |_| |  | |_| | | | | | |  __/
 \__|_|   \__,_|_| |_| |_|\___|

- TRUFFLE v1.38 -

***
*** Non-commerical and educational use license.
***

[*] Options in effect:
 - Input file: /usr/local/apps/truffle/TEST_DATA/fs-and-po-pairs-from-1000genomes.vcf.gz
 - Number of CPUs: 2
 - Reporting threshold: all pairs
 - Segment reporting: YES
 - Input file name: /usr/local/apps/truffle/TEST_DATA/fs-and-po-pairs-from-1000genomes.vcf.gz
 - Opening output file truffle.ibd
 - Opening output file truffle.segments
 - Number of samples: 47
 - Allocation genotype vector npeople=63 nvars=200000
 - GenotypeMatrix: allocating 12 MB of memory
 - Excluding variants with missing rate > 0.020 (1 samples)
 - Excluding variants with allele frequency < 0.060
 - Reading chromosome 1 (pos=0)
 - Reading chromosome 2 (pos=12291)
 - Reading chromosome 3 (pos=24479)
 - Reading chromosome 4 (pos=34767)
 - Reading chromosome 5 (pos=44872)
 - Reading chromosome 6 (pos=53948)
[...]
[*] Genotype pre-processing duration: 506.32 ms
 - Compute IBD by IBS: (cpu=1/2) Nind = 47 Nvar = 155313
 - Compute IBD by IBS: (cpu=2/2) Nind = 47 Nvar = 155313
[*] Finished processing
 - Total time for analysis was 0.02 minutes (1.1 seconds)

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Create a batch input file (e.g. truffle.sh). For example:
#!/bin/bash
set -e
module load truffle
truffle --vcf /usr/local/apps/truffle/TEST_DATA/fs-and-po-pairs-from-1000genomes.vcf.gz --segments --cpu $SLURM_CPUS_ON_NODE
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=4 --mem=4g truffle.sh
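A batch job runs unattended, so it can help to confirm that the expected result files were written before the script exits. The sample session log reports opening truffle.ibd and truffle.segments. The sketch below checks that both exist and are non-empty. The function name check_truffle_outputs is hypothetical, and the assumption that output files are always named PREFIX.ibd and PREFIX.segments is inferred from that log rather than taken from the TRUFFLE documentation.

```shell
#!/bin/bash
# check_truffle_outputs [PREFIX]
# Hypothetical helper: returns 0 if PREFIX.ibd and PREFIX.segments both
# exist and are non-empty, 1 otherwise.
# Assumption: output files are named after the prefix (default "truffle"),
# as the sample session log suggests.
check_truffle_outputs() {
    local prefix="${1:-truffle}"
    local status=0
    local f
    for f in "${prefix}.ibd" "${prefix}.segments"; do
        if [ ! -s "$f" ]; then
            # -s is true only for files that exist and have size > 0
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

Called as the last line of truffle.sh, after the truffle command, a failed check returns a non-zero status, so with set -e in effect the job is recorded as failed instead of completing silently.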
Create a swarmfile (e.g. truffle.swarm). For example:
truffle --vcf s1.vcf.gz --segments --cpu $SLURM_CPUS_ON_NODE --out s1
truffle --vcf s2.vcf.gz --segments --cpu $SLURM_CPUS_ON_NODE --out s2
truffle --vcf s3.vcf.gz --segments --cpu $SLURM_CPUS_ON_NODE --out s3
Submit this job using the swarm command.
swarm -f truffle.swarm -g 4 -t 4 --module truffle
where
-g # | Number of gigabytes of memory required for each process (1 line in the swarm command file) |
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file) |
--module truffle | Loads the truffle module for each subjob in the swarm |
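With many input files, the swarmfile can be generated instead of written by hand. The following is a minimal sketch, assuming each input is named SAMPLE.vcf.gz and that SAMPLE is an acceptable value for --out. The function name make_truffle_swarm is hypothetical. File names containing spaces are not handled.

```shell
#!/bin/bash
# make_truffle_swarm [DIR] [OUTFILE]
# Hypothetical helper: writes one truffle command per *.vcf.gz file found
# in DIR (default ".") to OUTFILE (default "truffle.swarm").
make_truffle_swarm() {
    local dir="${1:-.}"
    local out="${2:-truffle.swarm}"
    local vcf sample
    : > "$out"                                # start with an empty swarmfile
    for vcf in "$dir"/*.vcf.gz; do
        [ -e "$vcf" ] || continue             # skip the unexpanded glob if no files match
        sample="$(basename "$vcf" .vcf.gz)"   # s1.vcf.gz -> s1
        # Single quotes keep $SLURM_CPUS_ON_NODE literal so that it is
        # expanded when the subjob runs, not when the swarmfile is written.
        printf 'truffle --vcf %s --segments --cpu $SLURM_CPUS_ON_NODE --out %s\n' \
            "$vcf" "$sample" >> "$out"
    done
}
```

After sourcing the function, running make_truffle_swarm . truffle.swarm in a directory containing s1.vcf.gz, s2.vcf.gz and s3.vcf.gz writes three lines equivalent to the example swarmfile, with ./ prepended to each input path.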