High-Performance Computing at the NIH
ngsutils on Biowulf & Helix

NGSUtils is a suite of software tools for working with next-generation sequencing datasets. It comprises more than 50 programs, mainly written in Python, grouped into modules according to the type of file they operate on. There are four modules: fastqutils (FASTQ), bamutils (BAM), bedutils (BED), and gtfutils (GTF). Each module contains many commands for manipulating, filtering, converting, or analyzing that type of file.

NGSutils was developed at Indiana University and the Stanford University School of Medicine.

Running NGSutils on Helix

The following sample commands load the module and then split a FASTQ file into 20 chunks:

helix% module load ngsutils

helix% fastqutils split myfile.fastq.gz out_template 20
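One way to sanity-check a split is to compare read counts before and after. The function below is a minimal sketch, not part of NGSutils: the name count_fastq_reads is hypothetical, and it assumes standard four-line FASTQ records (no wrapped sequence lines).

```shell
# count_fastq_reads: hypothetical helper, NOT an NGSutils command.
# Prints the total number of reads in the given FASTQ files, gzipped or plain,
# assuming every record occupies exactly 4 lines.
count_fastq_reads() {
  local f lines total=0
  for f in "$@"; do
    # gzip -dcf decompresses .gz input and passes plain text through unchanged
    lines=$(gzip -dcf "$f" | wc -l)
    total=$(( total + lines / 4 ))
  done
  echo "$total"
}
```

Running count_fastq_reads myfile.fastq.gz, and then count_fastq_reads on all of the chunk files written by fastqutils, should report the same total. The exact chunk file names depend on the template argument, so list them with ls before passing them in.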

Running an NGSutils job on Biowulf

Set up a batch script along the following lines:

#!/bin/bash
# this file is called myjob.bat
module load ngsutils
cd /data/$USER/mydir
fastqutils split myfile.fastq.gz out_template 20

Submit this job with:

qsub -l nodes=1 myjob.bat

Submitting a swarm of NGSutils jobs on Biowulf

Set up a swarm command file (named swarmfile in this example) along the following lines:

fastqutils split file1.fastq.gz out1 20
fastqutils split file2.fastq.gz out2 20
fastqutils split file3.fastq.gz out3 20

Submit with:

swarm -f swarmfile
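For more than a handful of input files, the swarm command file can be generated rather than typed by hand. A minimal sketch, assuming the inputs are the *.fastq.gz files in the current directory, that their names contain no spaces, and that the output templates should follow the out1, out2, ... pattern used above; make_swarmfile is a hypothetical name.

```shell
# make_swarmfile: hypothetical helper that writes one "fastqutils split"
# command per *.fastq.gz file in the current directory.
# Arguments (both optional): output file name, number of chunks.
make_swarmfile() {
  local swarmfile=${1:-swarmfile} nchunks=${2:-20} i=0 f
  : > "$swarmfile"            # start with an empty command file
  for f in *.fastq.gz; do
    [ -e "$f" ] || continue   # skip the literal pattern if nothing matched
    i=$(( i + 1 ))
    printf 'fastqutils split %s out%d %d\n' "$f" "$i" "$nchunks" >> "$swarmfile"
  done
}
```

Running make_swarmfile in /data/$USER/mydir and then swarm -f swarmfile submits one split job per input file. Files are numbered in the shell's glob order, so file1, file2 and file3 receive the templates out1, out2 and out3, as in the example above.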


Documentation is available at the NGSutils website.