adVNTR on Biowulf

adVNTR is a tool for genotyping Variable Number Tandem Repeats (VNTRs) from sequence data. It works with both NGS short reads (Illumina HiSeq) and SMRT reads (PacBio). It finds diploid repeat counts for VNTRs and identifies possible mutations in the VNTR sequences.

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --mem=5g --gres=lscratch:20
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144]$ module load advntr
[user@cn3144]$ cp -r ${ADVNTR_TEST_DATA} .
[user@cn3144]$ cd TEST_DATA
[user@cn3144]$ mkdir log_dir
[user@cn3144]$ advntr genotype --vntr_id 301645 --alignment_file CSTB_2_5_testdata.bam --working_directory log_dir/
    2021-04-30 18:46:02.676445: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
    301645
    2/2
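
The copy step in the session above depends on the ADVNTR_TEST_DATA variable, which the session suggests is set by "module load advntr". The following is a minimal sketch of a guard around that step, assuming the variable points at a directory (the session changes into TEST_DATA right after the copy). The function name stage_test_data is hypothetical and not part of adVNTR or Biowulf.

```shell
# Sketch: copy the adVNTR test data into a destination directory,
# failing early if ADVNTR_TEST_DATA is unset or does not exist.
# stage_test_data is an illustrative helper, not an adVNTR command.
stage_test_data() {
    dest="${1:-.}"
    src="${ADVNTR_TEST_DATA:-}"
    if [ -z "$src" ] || [ ! -e "$src" ]; then
        echo "ADVNTR_TEST_DATA is unset or missing; was 'module load advntr' run?" >&2
        return 1
    fi
    # -r so that a directory is copied together with its contents
    cp -r "$src" "$dest"
}
```

Inside an interactive session this could be called as "stage_test_data ." after loading the module, in place of the plain cp command.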

Batch job
Most jobs should be run as batch jobs.

Create a batch script file (e.g. advntr.sh). For example:

#!/bin/bash
cd /lscratch/$SLURM_JOB_ID
module load advntr
cp -r "$ADVNTR_TEST_DATA" .
cd TEST_DATA
mkdir -p log_dir
advntr genotype --vntr_id 301645 --alignment_file CSTB_2_5_testdata.bam --working_directory log_dir/
....
....

Submit this job using the Slurm sbatch command.

sbatch --mem=10g --gres=lscratch:20 advntr.sh
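
To genotype the same VNTR in several BAM files within one batch job, the script body could loop over the files. The following is a minimal sketch that reuses only the advntr options shown above. The variables ADVNTR_BIN and VNTR_ID, the function genotype_bam, and the per-sample working directory naming are illustrative assumptions, not adVNTR conventions.

```shell
#!/bin/bash
# Sketch: run "advntr genotype" on every BAM file in the current directory.
# ADVNTR_BIN and VNTR_ID are illustrative variables, not part of adVNTR.
ADVNTR_BIN="${ADVNTR_BIN:-advntr}"
VNTR_ID="${VNTR_ID:-301645}"

genotype_bam() {
    bam="$1"
    # One working directory per sample, so logs do not overwrite each other
    workdir="log_dir_$(basename "$bam" .bam)"
    mkdir -p "$workdir"
    "$ADVNTR_BIN" genotype --vntr_id "$VNTR_ID" \
        --alignment_file "$bam" --working_directory "$workdir/"
}

for bam in *.bam; do
    # Skip the literal pattern when no BAM files are present
    [ -e "$bam" ] || continue
    genotype_bam "$bam"
done
```

Setting ADVNTR_BIN=echo prints the commands instead of running them, which is a quick way to check the loop before submitting the job.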