High-Performance Computing at the NIH
PennCNV on Biowulf & Helix

PennCNV is a software tool for Copy Number Variation (CNV) detection from SNP genotyping arrays. Currently it can handle signal intensity data from Illumina and Affymetrix arrays. With appropriately formatted input files, it can also handle other types of SNP arrays and oligonucleotide arrays.

PennCNV implements a hidden Markov model (HMM) that integrates multiple sources of information to infer CNV calls for individual genotyped samples. It differs from segmentation-based algorithms in that it considers the SNP allelic ratio distribution and other factors in addition to signal intensity. PennCNV can also optionally use family information to generate family-based CNV calls by several different algorithms. Furthermore, PennCNV can generate CNV calls for a specific set of candidate CNV regions through a validation-calling algorithm.

The PennCNV example files can be copied to your data directory:

$ cd /data/$USER/
$ cp -r /usr/local/apps/penncnv/example .

Running on Helix

$ module load penncnv
$ cd /data/$USER/example
$ ./runex.pl --path_detect_cnv detect_cnv.pl 1
$ ./runex.pl --path_detect_cnv detect_cnv.pl 2
$ ./runex.pl --path_detect_cnv detect_cnv.pl 3
$ ./runex.pl --path_detect_cnv detect_cnv.pl 4
$ ./runex.pl --path_detect_cnv detect_cnv.pl 5
$ ./runex.pl --path_detect_cnv detect_cnv.pl 6
$ ./runex.pl --path_visualize_cnv visualize_cnv.pl 7
$ ./runex.pl --path_convert_cnv convert_cnv.pl 8
$ ./runex.pl --path_convert_cnv convert_cnv.pl 9
$ ./runex.pl --path_filter_cnv filter_cnv.pl 10
$ ./runex.pl --path_compare_cnv compare_cnv.pl 11
$ ./runex.pl --path_compare_cnv compare_cnv.pl 12
$ ./runex.pl --path_infer_allele infer_snp_allele.pl 13
$ ./runex.pl --path_infer_allele infer_snp_allele.pl 14
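
Exercises 1 through 6 above all use the same option and script, so they can be driven from a short loop. The following is a minimal sketch, not part of the PennCNV distribution: the function name run_detect_exercises and the RUN switch are hypothetical names introduced here. By default it only prints the commands; with RUN=1 it is assumed to be run from /data/$USER/example after "module load penncnv".

```shell
# Print (or, with RUN=1, execute) the six detect_cnv.pl exercises listed above.
# run_detect_exercises and RUN are hypothetical names used only in this sketch.
run_detect_exercises() {
    for i in 1 2 3 4 5 6; do
        if [ "${RUN:-0}" = "1" ]; then
            # Stop at the first exercise that fails.
            ./runex.pl --path_detect_cnv detect_cnv.pl "$i" || return 1
        else
            echo "./runex.pl --path_detect_cnv detect_cnv.pl $i"
        fi
    done
}

run_detect_exercises
```

Printing first makes it easy to check the command lines before running them.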

Running a single batch job on Biowulf

1. Create a script file. The file will contain lines similar to those below.

#!/bin/bash


module load penncnv
cd /data/$USER/example
penncnv commands

2. Submit the script on Biowulf:

$ sbatch jobscript

If more memory is required (the default is 4 GB), specify --mem=Mg, where M is the number of gigabytes. For example, --mem=10g:

$ sbatch --mem=10g jobscript
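
As an alternative to passing --mem on the command line, the memory request can be recorded in the job script itself with an #SBATCH directive, which sbatch reads from the comment lines at the top of the script. The sketch below writes such a script; the 10g value simply mirrors the example above, and the comment line stands in for the actual PennCNV commands.

```shell
# Write a job script that carries its own memory request.
# The quoted 'EOF' keeps $USER unexpanded until the job actually runs.
cat > jobscript <<'EOF'
#!/bin/bash
#SBATCH --mem=10g

module load penncnv
cd /data/$USER/example
# penncnv commands go here
EOF

# Submit on Biowulf with: sbatch jobscript
```

An option given on the sbatch command line takes precedence over the same option set in an #SBATCH line, so the script value acts as a default.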

Running a swarm of jobs on Biowulf

Set up a swarm command file, with one command line per job:

  cd /data/$USER/dir1; penncnv commands
  cd /data/$USER/dir2; penncnv commands
  cd /data/$USER/dir3; penncnv commands
	[......]
  

Submit the swarm file. The -f option specifies the swarm file name, and --module loads the required module for each command line in the file:

  $ swarm -f swarmfile --module penncnv

If each command line needs more memory, use -g. The example below allocates 10 GB for each command:

  $ swarm -f swarmfile -g 10 --module penncnv

For more information regarding running swarm, see swarm.html
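
When there are many directories, the swarm command file can be generated rather than typed by hand. This is a minimal sketch: dir1, dir2 and dir3 are the placeholder directory names from the example above, and "penncnv commands" is still a placeholder for the real command line.

```shell
# Placeholder sample directories under /data/$USER; replace with your own.
dirs="dir1 dir2 dir3"

# Start with an empty swarm file.
: > swarmfile

for d in $dirs; do
    # Single quotes keep $USER literal so it is expanded when the job runs.
    printf 'cd /data/$USER/%s; penncnv commands\n' "$d" >> swarmfile
done

# Review the generated file before submitting it with swarm.
cat swarmfile
```

Each generated line becomes one independent job when the file is passed to swarm -f.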

Running an interactive job on Biowulf

It may be useful for debugging purposes to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead allocate an interactive node as described below, and run the interactive job there.

biowulf$ sinteractive 
salloc.exe: Granted job allocation 16535

cn999$ module load penncnv
cn999$ cd /data/$USER/example
cn999$ penncnv commands
[...etc...]

cn999$ exit
exit

biowulf$

Make sure to exit the job once finished.

If more memory is needed, use --mem. For example:

biowulf$ sinteractive --mem=8g

Documentation

http://www.openbioinformatics.org/penncnv/penncnv_examples.html