High-Performance Computing at the NIH
ngCGH on Biowulf & Helix

Description

This package contains a tool (ngCGH) that computes pseudo-CGH (comparative genomic hybridization) profiles from the relative coverage of tumor and normal samples by NGS reads from whole-genome or exome sequencing. In addition, helper scripts allow conversion to nexus format for further analysis.

There may be multiple versions of ngCGH available. An easy way of selecting the version is to use modules. To see the modules available, type

module avail ngCGH 

To select a module use

module load ngCGH/[version]

where [version] is the version of choice.

ngCGH can parallelize over regions. Make sure the number of CPUs requested for the job matches the number of threads passed to ngCGH (--threads/-t).
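
One way to keep the two numbers in sync inside a batch script is to take the thread count from the Slurm allocation rather than hard-coding it. The following is a minimal sketch, not a site-provided recipe: the helper name ngcgh_threads is hypothetical, and it assumes the job was submitted with --cpus-per-task so that Slurm sets SLURM_CPUS_PER_TASK. The file names are the ones used in the examples on this page.

```shell
#!/bin/bash
# Sketch only: ngcgh_threads is a hypothetical helper, not part of ngCGH.
# Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is given;
# fall back to 1 (the ngCGH default for -t) when it is unset.
ngcgh_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Print the command that would run; drop the leading "echo" to execute it.
echo ngCGH -t "$(ngcgh_threads)" -o s1.cgh s1_normal.bam s1_tumor.bam
```

Submitted with, for example, sbatch --cpus-per-task=4, such a script would pass -t 4 to ngCGH; without an allocation it falls back to a single thread.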

Environment variables set

Documentation

On Helix
helix$ module load ngCGH
helix$ ngCGH -h
usage: ngCGH [-h] [-w WINDOWSIZE] [-o OUTFILE] [-l LOGLEVEL] [-f FILTER]
             [-F REQUIRED] [-r REGIONS] [-t PROCESSES]
             normalbam tumorbam

positional arguments:
  normalbam             The name of the bamfile for the normal comparison
  tumorbam              The name of the tumor sample bamfile

optional arguments:
  -h, --help            show this help message and exit
  -w WINDOWSIZE, --windowsize WINDOWSIZE
                        The number of reads captured from the normal sample
                        for calculation of copy number (default: 1000)
  -o OUTFILE, --outfile OUTFILE
                        Output filename, default  (default: None)
  -l LOGLEVEL, --loglevel LOGLEVEL
                        Logging Level, 1-15 with 1 being minimal logging and
                        15 being everything [10] (default: 10)
  -f FILTER, --filter FILTER
                        Like samtools, filter out all reads that are included
                        by this flag value, 0 for unset [0]; hex (0x...),
                        decimal, and octal (e.g., 0777) are accepted (default:
                        0)
  -F REQUIRED, --required REQUIRED
                        Like samtools, include only reads that are included by
                        this flag value, 0 for unset [0]; hex (0x...),
                        decimal, and octal (e.g., 0777) are accepted (default:
                        0)
  -r REGIONS, --regions REGIONS
                        regions to which analysis should be restricted, either
                        a bed file name or a single region in format chrN:XXX-
                        YYY (default: None)
  -t PROCESSES, --threads PROCESSES
                        parallelize over regions (or chromosomes) (default: 1)
helix$ ngCGH -w 2000 -o s1.cgh s1_normal.bam s1_tumor.bam
helix$ head s1.cgh
chr1    4851    52735   1000    854     -0.025120
chr1    52736   59251   1000    812     -0.097876
chr1    59251   119119  1000    876     0.011575
chr1    119120  707038  1000    1087    0.322924
chr1    707040  711128  1000    1016    0.225472
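
The output is a plain whitespace-separated table, so it can be filtered with standard tools. The sketch below keeps windows whose last column is large in absolute value. It rests on two assumptions that are not documented on this page: that the sixth column is a normalized log2 ratio of tumor to normal coverage, and that 0.2 is a useful cutoff (it is an arbitrary value chosen for illustration). The input rows are copied from the head output above.

```shell
#!/bin/bash
# Sketch only: assumes column 6 is the normalized log2 ratio and uses a
# hypothetical cutoff of 0.2; neither is documented on this page.
flagged=$(awk -v cutoff=0.2 '{
    v = $6
    if (v < 0) v = -v          # absolute value of the ratio
    if (v > cutoff) print      # keep windows beyond the cutoff
}' <<'EOF'
chr1    4851    52735   1000    854     -0.025120
chr1    52736   59251   1000    812     -0.097876
chr1    59251   119119  1000    876     0.011575
chr1    119120  707038  1000    1087    0.322924
chr1    707040  711128  1000    1016    0.225472
EOF
)
echo "$flagged"
```

To apply the same filter to a real result, pass the file name (for example s1.cgh) to awk instead of the here-document.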
Batch job on Biowulf

Create a batch script similar to the following example:

#! /bin/bash
# this file is ngcgh.sh
module load ngCGH || exit 1
ngCGH -o s1.cgh s1_normal.bam s1_tumor.bam
convert2nexus s1.cgh > s1.nexus

Submit to the queue with sbatch:

biowulf$ sbatch ngcgh.sh
Swarm of jobs on Biowulf

Create a swarm command file similar to the following example:

# this file is ngcgh.swarm
ngCGH -o s1.cgh s1_normal.bam s1_tumor.bam
ngCGH -o s2.cgh s2_normal.bam s2_tumor.bam
ngCGH -o s3.cgh s3_normal.bam s3_tumor.bam

Submit to the queue with swarm:

biowulf$ swarm -f ngcgh.swarm --module ngCGH
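
For more than a handful of samples, the swarm command file can be generated with a loop instead of being written by hand. This is a minimal sketch that assumes every sample follows the <sample>_normal.bam / <sample>_tumor.bam naming used in the example; the sample list itself is hypothetical. Each line writes to its own output file so that jobs do not overwrite each other's results.

```shell
#!/bin/bash
# Sketch only: the sample names and the <sample>_normal.bam /
# <sample>_tumor.bam naming scheme are assumptions based on the example.
swarmfile=ngcgh.swarm
: > "$swarmfile"    # start with an empty file
for s in s1 s2 s3; do
    # one ngCGH command per line, each writing its own output file
    echo "ngCGH -o ${s}.cgh ${s}_normal.bam ${s}_tumor.bam" >> "$swarmfile"
done
cat "$swarmfile"
```

The resulting ngcgh.swarm can then be submitted with the swarm command shown above.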
Interactive job on Biowulf

Allocate an interactive session with sinteractive and run ngCGH as described above:

biowulf$ sinteractive 
node$ module load ngCGH
node$ ngCGH -w 10000 -o cgh normal.bam tumor.bam
node$ exit
biowulf$