High-Performance Computing at the NIH
Bam2Fastq on Helix & Biowulf

BAM files store alignment information and unaligned reads from next-generation sequencing machines. bam2fastq extracts the raw sequences (with their quality scores) from a BAM file.

 

Running on Helix

module load bam2fastq
cd /data/$USER/somewhereWithInputFile
bam2fastq --aligned -o myfile.fastq  myfile.bam
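
When the same command is repeated for several files, it can help to check that the input exists and to derive the output name from the input name. The following is a minimal sketch only: the helper name bam2fastq_cmd is hypothetical and not part of bam2fastq, and it prints the command line (with the same --aligned and -o options used above) rather than running it.

```shell
# Hypothetical helper (not part of bam2fastq): print the bam2fastq command
# for one BAM file, naming the FASTQ output after the input file.
bam2fastq_cmd() {
    bam="$1"
    # Refuse to build a command for a missing or empty input file.
    if [ ! -s "$bam" ]; then
        echo "error: '$bam' is missing or empty" >&2
        return 1
    fi
    # Strip a trailing .bam and append .fastq (myfile.bam -> myfile.fastq).
    out="${bam%.bam}.fastq"
    echo "bam2fastq --aligned -o $out $bam"
}

# Example: bam2fastq_cmd myfile.bam
```

Once the printed command looks right, it can be pasted at the prompt or into a batch script.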

Submitting a single batch job

1. Create a batch script along the following lines.

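#!/bin/bash
# Report the number of cores allocated to this job
echo "Running on $SLURM_CPUS_PER_TASK cores"

# Print the job environment to the log (useful for debugging)
env

# Load bam2fastq, move to the input directory, and convert the BAM file
module load bam2fastq
cd /data/$USER/somewhereWithInputFile
bam2fastq --aligned -o myfile.fastq myfile.bam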

2. Submit the script on Biowulf

$ sbatch myscript
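
The echo line in the batch script prints an empty value if SLURM_CPUS_PER_TASK is not set. In standard Slurm that happens when no --cpus-per-task value was requested; whether Biowulf always sets the variable is not stated here, so treat that as an assumption. A small sketch of a more defensive version, using the hypothetical helper name report_cores:

```shell
# Hypothetical helper: report the core count, falling back to 1 when
# SLURM_CPUS_PER_TASK is unset or empty (for example, outside Slurm).
report_cores() {
    echo "Running on ${SLURM_CPUS_PER_TASK:-1} cores"
}

# Example: call report_cores near the top of the batch script.
```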

Submitting a swarm of jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /data/username/cmdfile). Here is a sample file:

bam2fastq /data/$USER/run1/myfile.bam
bam2fastq /data/$USER/run2/myfile.bam
bam2fastq /data/$USER/run3/myfile.bam
[.....]
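
Writing the command file by hand becomes tedious with many inputs. As a sketch, the loop below writes one bam2fastq line per BAM file found in the immediate subdirectories of a top-level directory, matching the run1, run2, run3 layout of the sample. The function name make_swarm_cmdfile and its two arguments are assumptions made for illustration.

```shell
# Hypothetical helper: write one "bam2fastq <file>" line per BAM file
# found in the immediate subdirectories of TOPDIR (run1/, run2/, ...).
make_swarm_cmdfile() {
    topdir="$1"
    cmdfile="$2"
    : > "$cmdfile"                   # create or truncate the command file
    for bam in "$topdir"/*/*.bam; do
        [ -e "$bam" ] || continue    # skip the unexpanded pattern if nothing matched
        echo "bam2fastq $bam" >> "$cmdfile"
    done
}

# Example: make_swarm_cmdfile /data/$USER /data/$USER/cmdfile
```

Paths containing spaces would need quoting in the generated lines; the sketch assumes there are none.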

Submit on Biowulf:

$ swarm -f cmdfile --module bam2fastq

-f: specify the swarm command file.
--module: set up the bam2fastq environment variables for each swarm command.

Sometimes a bam2fastq command line must contain the hash (pound) character '#', for example in a file name. In that case, also pass the swarm option --no-comment. Otherwise, swarm treats everything after the '#' character as a comment and removes it.
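
Whether --no-comment is needed can be checked mechanically before submitting. A minimal sketch, with the hypothetical helper name swarm_comment_flag, that prints --no-comment only when the command file contains a '#' character:

```shell
# Hypothetical helper: print "--no-comment" if the swarm command file
# contains a '#' anywhere, and print nothing otherwise.
swarm_comment_flag() {
    if grep -q '#' "$1"; then
        echo "--no-comment"
    fi
}

# Example: swarm -f cmdfile $(swarm_comment_flag cmdfile) --module bam2fastq
```

The check cannot tell a '#' in a file name from a genuine comment, so a command file that relies on real comments should not use it.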

If more memory is required, use the -g flag to request memory (in GB) per process:
$ swarm -g 10 --module bam2fastq -f cmdfile

For more information about running swarm, see the swarm documentation (swarm.html).

 

Running an interactive job

Some jobs need to be run interactively. Do not run them on the Biowulf login node. Instead, allocate an interactive node as shown below and run the job there.

biowulf $ sinteractive
[user@pXXXX]$ cd /data/$USER/myruns
[user@pXXXX]$ module load bam2fastq
[user@pXXXX]$ bam2fastq --aligned -o myfile.fastq myfile.bam
[user@pXXXX]$ exit
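
Before leaving the interactive session, a quick sanity check is to count the reads in the output file. Assuming the common four-lines-per-record FASTQ layout (an assumption; records wrapped over more lines would break this), the read count is the line count divided by four. The helper name count_fastq_reads is hypothetical.

```shell
# Hypothetical helper: count reads in a FASTQ file, assuming exactly
# four lines per record (header, sequence, '+', qualities).
count_fastq_reads() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

# Example: count_fastq_reads myfile.fastq
```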

The 'sinteractive' command has several options. To list them, run:

 $ sinteractive -h

 

Documentation

http://gsl.hudsonalpha.org/information/software/bam2fastq