High-Performance Computing at the NIH
HiSeq on Biowulf & Helix

HiSeq Analysis Software provides rapid alignment and variant calling for whole human genomes and for libraries prepared with the Nextera Rapid Capture (NRC) exome enrichment kit. For whole human genome sequencing, it features the Isaac analysis workflow, which is described as providing a 4-6 times speed increase over existing methods while remaining accurate. For NRC analysis, BWA alignment and GATK variant calling are used. The software can be run from the command line or through a graphical user interface called Analysis Visual Controller Software (AVC). More details on the supported workflows:

Feedback from one of our users:

A job was submitted to a 24 GB node and finished successfully.

In this job:
- The run included 48 samples from one HiSeq1000 run.
- The primary input files were ~150–200 GB.
- After the fastq files have been generated, the Intensities/L00X and BaseCalls/L00X files can be deleted to save disk space.
- If left unattended, the run was estimated to need ~1 TB of disk space. The fastq files amount to ~170 GB and the output in the "Alignment" directory (the .bam and .vcf files being the most relevant for downstream analyses) amounts to another ~170 GB, for a combined total of ~350 GB.
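
Given the disk space figures reported above, it may be worth checking the free space in your data area before starting a run. The following is a minimal sketch, not part of the HiSeq software: the function name have_free_gb is hypothetical, and it relies on the standard df command, which may not reflect per-user quotas on shared file systems.

```shell
# Hypothetical helper (not part of the hiseq module):
# succeed if the file system holding directory $1 has at least $2 GB free.
have_free_gb() {
    hfg_dir=$1
    hfg_need_gb=$2
    # df -P prints one POSIX-format line per file system; column 4 is the
    # available space, reported in 1024-byte blocks because of -k.
    hfg_avail_kb=$(df -Pk "$hfg_dir" | awk 'NR==2 {print $4}')
    [ "$hfg_avail_kb" -ge $((hfg_need_gb * 1024 * 1024)) ]
}
```

For example, `have_free_gb /data/$USER 1000 || echo "less than 1 TB free"` would warn before a run of the size described above if the full ~1 TB is not available.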

Running a single batch job on Biowulf

NOTE: Example files can be copied into your personal data area and used to run a test job.

$ mkdir /data/$USER/hiseq
$ cd /data/$USER/hiseq 
$ cp /usr/local/apps/hiseq/NexteraRapidCapture_DemoData.tar.gz . 
$ tar xvfz NexteraRapidCapture_DemoData.tar.gz 
$ cd NexteraRapidCapture_DemoData/130123_SN7001282_0165_Bd1ua6acxx/ 

# this file was downloaded originally from ftp://webdata:webdata@ussd-ftp.illumina.com/Data/SequencingRuns/NexteraRapidCapture/NexteraRapidCapture_DemoData.tar.gz

Then modify the last line of the SampleSheet.csv file to use the pre-downloaded genome on our system.
Or, more simply, copy the modified version of SampleSheet.csv from the shared area to replace the one in your directory:

$ cp -f /usr/local/apps/hiseq/SampleSheet.csv .
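
If you would rather keep the demo's original SampleSheet.csv for comparison, the overwrite could be wrapped in a small helper that saves a backup first. This is only a sketch: the function name replace_with_backup and the .orig suffix are assumptions made for illustration.

```shell
# Hypothetical helper: copy $1 over $2, keeping the first original of $2
# as "$2.orig" so the two versions can be compared later.
replace_with_backup() {
    rwb_src=$1
    rwb_dest=$2
    # Back up only once, so repeated calls never overwrite the true original.
    if [ -f "$rwb_dest" ] && [ ! -e "$rwb_dest.orig" ]; then
        cp -p "$rwb_dest" "$rwb_dest.orig" || return 1
    fi
    cp -f "$rwb_src" "$rwb_dest"
}
```

It would be called as `replace_with_backup /usr/local/apps/hiseq/SampleSheet.csv SampleSheet.csv`, after which `diff SampleSheet.csv.orig SampleSheet.csv` shows which lines were changed.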

1. Create a script file along the lines below. Note that the following script assumes that all the input files are already in place from the steps above.

#!/bin/bash

module load hiseq

# Run directory of the demo data unpacked above
RUN_DIR=/data/$USER/hiseq/NexteraRapidCapture_DemoData/130123_SN7001282_0165_Bd1ua6acxx
cd "$RUN_DIR" || exit 1
RunLatest -r "$RUN_DIR"/

# Delete the per-lane intensity and base call files to save disk space
rm -rf "$RUN_DIR"/Data/Intensities/L00*
rm -rf "$RUN_DIR"/Data/Intensities/BaseCalls/L00*

2. Submit the script on Biowulf:

$ sbatch --mem=30g jobscript
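
The job script in step 1 deletes the Intensities/L00* and BaseCalls/L00* files whether or not the analysis succeeded. A more cautious variant could delete them only after a successful run. The sketch below assumes that RunLatest returns a non-zero exit status on failure, which is not confirmed here; the function name cleanup_if_ok is hypothetical.

```shell
# Hypothetical helper: cleanup_if_ok RUN_DIR COMMAND [ARGS...]
# Runs COMMAND and removes the per-lane intensity and base call files under
# RUN_DIR only if COMMAND exits with status 0.
cleanup_if_ok() {
    cio_run_dir=$1
    shift
    if "$@"; then
        rm -rf "$cio_run_dir"/Data/Intensities/L00*
        rm -rf "$cio_run_dir"/Data/Intensities/BaseCalls/L00*
    else
        echo "analysis failed; keeping intensity and base call files" >&2
        return 1
    fi
}
```

In the job script, the RunLatest and rm lines would then be replaced by a single call such as `cleanup_if_ok "$RUN_DIR" RunLatest -r "$RUN_DIR"`, where RUN_DIR holds the run directory path.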

 

Running an interactive job on Biowulf

It may be useful for debugging purposes to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as shown below and run the interactive job there.

biowulf$ sinteractive --mem=30g 
salloc.exe: Granted job allocation 16535

cn999$ module load hiseq
cn999$ cd /data/$USER/dir
cn999$ RunLatest -r
[...etc...]

cn999$ exit
exit

biowulf$

Make sure to exit the job once finished.

If more memory is needed, request it with a larger --mem value. For example:

biowulf$ sinteractive --mem=50g

Documentation

hiseq.pdf