HTGTS on NIH HPC Systems

HTGTS stands for High-Throughput Genome-Wide Translocation Sequencing. The pipeline is provided by the Alt Lab.

$ module show htgts
----------------------------------------------------------------------------------------------------------------------------------------------
/usr/local/lmod/modulefiles/htgts/v2:
----------------------------------------------------------------------------------------------------------------------------------------------
whatis("Sets up HTGTS v2 ")
prepend_path("PATH","/usr/local/apps/htgts/v2/bin")
prepend_path("PATH","/usr/local/apps/htgts/v2/R")

load("bowtie")
load("samtools/0.1.17-fpic")
load("ea-utils")
load("seqprep/0.2.8.1")
load("R")

The pipeline requires both a fasta file and a bowtie2 index of your reference genome. It looks for these in the locations given by two environment variables, $GENOME_DB and $BOWTIE2_INDEXES. In the examples below, the two variables are set as:

$ export GENOME_DB=/data/$USER/htgts/genomes/
$ export BOWTIE2_INDEXES=/data/$USER/htgts/genomes/hg19/bowtie2_indexes/
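
The directory layout is up to you, as long as it matches the two variables. Below is a hypothetical sketch for an hg19 reference (bowtie2-build and the .bt2 file names are standard bowtie2; the hg19 paths simply mirror the exports above):

$ mkdir -p /data/$USER/htgts/genomes/hg19/bowtie2_indexes
$ cd /data/$USER/htgts/genomes/hg19
$ bowtie2-build hg19.fa bowtie2_indexes/hg19     # writes bowtie2_indexes/hg19.*.bt2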

To run the tutorial successfully, first copy the tutorial data from /usr/local/apps/htgts/tutorial_data to your own space, such as /data/$USER, and run the pipeline from there.
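
For example:

$ cp -r /usr/local/apps/htgts/tutorial_data /data/$USER/
$ cd /data/$USER/tutorial_data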

On Helix

Sample session:


[helix ~]$ module load htgts
[helix ~]$ cd /data/$USER/tutorial_data
[helix ~]$ TranslocPreprocess.pl tutorial_metadata.txt preprocess/ --read1 pooled_R1.fastq.gz --read2 pooled_R2.fastq.gz
[helix ~]$ TranslocWrapper.pl tutorial_metadata.txt preprocess/ results/ 
.......
.......

The TranslocWrapper.pl step will generate the hg19.fa.fai and hg19.fa.index files.

Batch job on Biowulf

Create a batch input file (e.g. myjob.sh). For example:

#!/bin/bash
set -e                            # stop if a step fails
module load htgts
cd /data/$USER/tutorial_data      # run from the directory that holds the tutorial data
TranslocPreprocess.pl tutorial_metadata.txt preprocess/ --read1 pooled_R1.fastq.gz --read2 pooled_R2.fastq.gz
TranslocWrapper.pl tutorial_metadata.txt preprocess/ results/

Then submit the file on Biowulf:

sbatch --mem=20g myjob.sh
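
Additional resources can be requested with the usual sbatch flags; for example (the walltime and CPU values here are illustrative, not pipeline requirements):

sbatch --cpus-per-task=2 --mem=20g --time=8:00:00 myjob.sh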

See https://hpc.nih.gov/docs/userguide.html for more options when submitting jobs.

Swarm of Jobs on Biowulf

Create a swarmfile (e.g. myjob.swarm). For example:

# this file is called myjob.swarm
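# each line runs as an independent subjob; dir1, dir2, ... are assumed to each
# contain their own tutorial_metadata.txt and fastq files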
cd dir1;TranslocPreprocess.pl tutorial_metadata.txt preprocess/ --read1 pooled_R1.fastq.gz --read2 pooled_R2.fastq.gz;TranslocWrapper.pl tutorial_metadata.txt preprocess/ results/
cd dir2;TranslocPreprocess.pl tutorial_metadata.txt preprocess/ --read1 pooled_R1.fastq.gz --read2 pooled_R2.fastq.gz;TranslocWrapper.pl tutorial_metadata.txt preprocess/ results/
cd dir3;TranslocPreprocess.pl tutorial_metadata.txt preprocess/ --read1 pooled_R1.fastq.gz --read2 pooled_R2.fastq.gz;TranslocWrapper.pl tutorial_metadata.txt preprocess/ results/
[...]

Submit this job using the swarm command.

swarm -f myjob.swarm -g 20 --module htgts

where
-g # Number of Gigabytes of memory required for each process (1 line in the swarm command file)
--module htgts Loads the htgts module for each subjob

See https://hpc.nih.gov/apps/swarm.html for more options when submitting swarm jobs.

Interactive job on Biowulf

Allocate an interactive session and run the pipeline there. Sample session:
[susanc@biowulf ~]$ sinteractive --mem=20g
salloc.exe: Pending job allocation 15194042
salloc.exe: job 15194042 queued and waiting for resources
salloc.exe: job 15194042 has been allocated resources
salloc.exe: Granted job allocation 15194042
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn1719 are ready for job

[cn1719]$ module load htgts

[cn1719]$ cd /data/$USER/tutorial_data
[cn1719]$ TranslocPreprocess.pl tutorial_metadata.txt preprocess/ --read1 pooled_R1.fastq.gz --read2 pooled_R2.fastq.gz 
[cn1719]$ TranslocWrapper.pl tutorial_metadata.txt preprocess/ results/
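
When the run is finished, exit the session to release the allocated node:

[cn1719]$ exit
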
Documentation