High-Performance Computing at the NIH
CNVkit on NIH HPC Systems

CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from targeted DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.

Example files are in the /usr/local/apps/cnvkit/cnvkit-example directory.
To test CNVkit with the example files, copy them to your data directory and run make in an interactive session:

  $ cp -r /usr/local/apps/cnvkit/cnvkit-example /data/$USER
  $ cd /data/$USER/cnvkit-example
  $ sinteractive --mem=5g
  $ module load cnvkit
  $ make
  

Reference genomes are located at:
/fdb/igenomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa
/fdb/igenomes/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/genome.fa
.....

Sequencing-accessible region files are in:
/usr/local/apps/cnvkit/data
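
As a rough sketch of how these paths fit together, the snippet below assembles a cnvkit.py batch command that passes the hg19 genome listed above to --fasta and a file from the data directory to --access. The BAM names, the bait BED file, and the access file name are hypothetical placeholders, not files known to exist on the system; list the data directory to find the real file names. The run helper is also an assumption added for illustration: it only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/bash
# Sketch only: BAM names, bait file, and access file name are hypothetical.
genome=/fdb/igenomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa
access_dir=/usr/local/apps/cnvkit/data
access_bed="$access_dir/access.hg19.bed"   # placeholder name; run: ls "$access_dir"

# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$@"; else "$@"; fi
}

run cnvkit.py batch tumor.bam \
    --normal normal.bam \
    --targets baits.bed \
    --fasta "$genome" \
    --access "$access_bed" \
    --output-reference reference.cnn \
    --output-dir results
```

Reviewing the printed command before running it with DRY_RUN=0 makes it easier to catch a wrong genome build or a mistyped path.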

On Helix

Sample session:


[susanc@helix ~]$ module load cnvkit
[susanc@helix ~]$ cnvkit.py -h
usage: cnvkit.py [-h]
                 {batch,target,access,antitarget,autobin,coverage,reference,fix,segment,call,diagram,scatter,heatmap,breaks,gainloss,sex,gender,metrics,segmetrics,import-picard,import-seg,import-theta,export,version}
                 ...

CNVkit, a command-line toolkit for copy number analysis.

positional arguments:
  {batch,target,access,antitarget,autobin,coverage,reference,fix,segment,call,diagram,scatter,heatmap,breaks,gainloss,sex,gender,metrics,segmetrics,import-picard,import-seg,import-theta,export,version}
                        Sub-commands (use with -h for more info)
    batch               Run the complete CNVkit pipeline on one or more BAM
                        files.
    target              Transform bait intervals into targets more suitable
                        for CNVkit.
    access              List the locations of accessible sequence regions in a
                        FASTA file.
    antitarget          Derive a background/antitarget BED file from a target
                        BED file.
    autobin             Quickly calculate reasonable bin sizes from BAM read
                        counts.
    coverage            Calculate coverage in the given regions from BAM read
                        depths.
    reference           Compile a coverage reference from the given files
                        (normal samples).
    fix                 Combine target and antitarget coverages and correct
                        for biases. Adjust raw coverage data according to the
                        given reference, correct potential biases and re-
                        center.
    segment             Infer copy number segments from the given coverage
                        table.
    call                Call copy number variants from segmented log2 ratios.
    diagram             Draw copy number (log2 coverages, CBS calls) on
                        chromosomes as a diagram. If both the raw probes and
                        segments are given, show them side-by-side on each
                        chromosome (segments on the left side, probes on the
                        right side).
    scatter             Plot probe log2 coverages and segmentation calls
                        together.
    heatmap             Plot copy number for multiple samples as a heatmap.
    breaks              List the targeted genes in which a copy number
                        breakpoint occurs.
    gainloss            Identify targeted genes with copy number gain or loss.
    sex                 Guess samples' sex from the relative coverage of
                        chromosomes X and Y.
    metrics             Compute coverage deviations and other metrics for
                        self-evaluation.
    segmetrics          Compute segment-level metrics from bin-level log2
                        ratios.
    import-picard       Convert Picard CalculateHsMetrics tabular output to
                        CNVkit .cnn files. The input file is generated by the
                        PER_TARGET_COVERAGE option in the CalculateHsMetrics
                        script in Picard tools.
    import-seg          Convert a SEG file to CNVkit .cns files.
    import-theta        Convert THetA output to a BED-like, CNVkit-like
                        tabular format. Equivalently, use the THetA results
                        file to convert CNVkit .cns segments to integer copy
                        number calls.
    export              Convert CNVkit output files to another format.
    version             Display this program's version.

optional arguments:
  -h, --help            show this help message and exit

Batch job on Biowulf

Create a batch input file (e.g. script.sh). For example:

#!/bin/bash
module load cnvkit

cd /data/$USER/dir
cnvkit.py command 1
cnvkit.py command 2
......

Then submit the file on Biowulf:

$ sbatch script.sh

For more information about the sbatch command, see https://hpc.nih.gov/docs/userguide.html#submit
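
To make the batch template above more concrete, here is a minimal sketch that writes a filled-in script.sh and checks it for shell syntax errors. The memory and CPU requests, the working directory, and all BAM and BED file names are assumptions chosen for illustration; adjust them to your data. The -p option of cnvkit.py batch sets the number of worker processes, and the sketch ties it to the Slurm CPU allocation.

```shell
# Write a hypothetical batch script; resource values and file names are placeholders.
cat > script.sh <<'EOF'
#!/bin/bash
#SBATCH --mem=5g
#SBATCH --cpus-per-task=4
module load cnvkit

cd /data/$USER/dir || exit 1
# Use as many worker processes as CPUs allocated by Slurm.
cnvkit.py batch tumor.bam --normal normal.bam \
    --targets baits.bed \
    --fasta /fdb/igenomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa \
    --output-dir results \
    -p "$SLURM_CPUS_PER_TASK"
EOF

# Check the script for syntax errors without running it.
bash -n script.sh && echo "script.sh: syntax OK"
```

The quoted 'EOF' delimiter keeps $USER and $SLURM_CPUS_PER_TASK unexpanded in the file, so they are resolved when the job runs rather than when the script is written.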

Swarm of Jobs on Biowulf

Create a swarmfile (e.g. script.swarm). For example:

# this file is called script.swarm
cd dir1; cnvkit.py command 1; cnvkit.py command 2
cd dir2; cnvkit.py command 1; cnvkit.py command 2
cd dir3; cnvkit.py command 1; cnvkit.py command 2
[...]

Submit the swarmfile with the swarm command:

swarm -f script.swarm --module cnvkit

For more information about swarm, see https://hpc.nih.gov/apps/swarm.html#usage
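
When there are many sample directories, the swarmfile can be generated rather than typed by hand. The loop below is a small sketch of that idea; the directory names dir1 to dir3 and the tumor.bam and reference.cnn file names are placeholders, and it assumes a pooled reference was already built so that each sample can be processed with -r.

```shell
# Sketch: build script.swarm with one line per sample directory.
# Directory names and input file names are hypothetical placeholders.
: > script.swarm    # start with an empty swarmfile
for d in dir1 dir2 dir3; do
    # '&&' skips the cnvkit.py call if the cd fails.
    echo "cd $d && cnvkit.py batch tumor.bam -r reference.cnn -d results" >> script.swarm
done

# Show the generated swarmfile for review before submitting it.
cat script.swarm
```

The resulting file can then be submitted with the swarm command shown above.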

Interactive job on Biowulf

Allocate an interactive session. Sample session:

[susanc@biowulf ~]$ sinteractive --mem=5g
salloc.exe: Pending job allocation 15194042
salloc.exe: job 15194042 queued and waiting for resources
salloc.exe: job 15194042 has been allocated resources
salloc.exe: Granted job allocation 15194042
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn1719 are ready for job

[susanc@cn1719 ~]$ module load cnvkit

[susanc@cn1719 ~]$ cnvkit.py command
Documentation