High-Performance Computing at the NIH
Conifer on Biowulf & Helix

CoNIFER uses exome sequencing data to find copy number variants (CNVs) and genotype the copy number of duplicated genes. Because exome capture reactions are subject to strong and systematic capture biases between sample batches, the authors implemented singular value decomposition (SVD) to eliminate these biases in exome data. By eliminating batch biases, CoNIFER makes it possible to mix exome sequence from multiple experimental runs. Together with a short read aligner such as mrsFAST, which can align reads to multiple locations, CoNIFER can robustly detect rare CNVs and estimate the copy number of duplicated genes up to ~8 copies with current exome capture kits.

According to author Niklas Krumm, the current version of CoNIFER has trouble with chrY, which has to be trimmed. Please remove the chrY probes from the probes file before running the analysis.
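One way to drop the chrY probes is a small awk filter. This is only a sketch: it assumes the probes file is tab-delimited with the chromosome in the first column, written as either "Y" or "chrY". The function name strip_chrY and the demo rows are made up for illustration, so check the layout of your own probes file first.

```shell
# Assumption: tab-delimited probes file, chromosome name in column 1,
# written as "Y" or "chrY" (any letter case). A header line is kept as-is.
strip_chrY() {
    # $1 = input probes file, $2 = output probes file
    awk -F '\t' 'toupper($1) != "Y" && toupper($1) != "CHRY"' "$1" > "$2"
}

# Demo on a tiny made-up probes file (not real capture data).
demo_dir=$(mktemp -d)
printf 'chr\tstart\tstop\tname\n'       >  "$demo_dir/probes.txt"
printf 'chr1\t100\t200\tGENE_A\n'       >> "$demo_dir/probes.txt"
printf 'chrX\t300\t400\tGENE_B\n'       >> "$demo_dir/probes.txt"
printf 'chrY\t500\t600\tGENE_C\n'       >> "$demo_dir/probes.txt"

strip_chrY "$demo_dir/probes.txt" "$demo_dir/probes.noY.txt"
cat "$demo_dir/probes.noY.txt"
```

The filtered file would then be passed to the analyze step with --probes in place of the original.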

Example files can be copied from /usr/local/apps/conifer/sampledata.tar.gz
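Copying and unpacking the example archive might look like the sketch below. Only the archive path comes from this page; the DEST variable is a hypothetical destination, and the guard simply reports when the archive is not readable (for example, when not on Helix or Biowulf).

```shell
# Archive path taken from the text above; DEST is a hypothetical destination
# directory that defaults to the current directory.
SAMPLE=/usr/local/apps/conifer/sampledata.tar.gz
DEST=${DEST:-.}

if [ -r "$SAMPLE" ]; then
    cp "$SAMPLE" "$DEST"/
    # List the archive contents, then unpack into DEST.
    tar -tzf "$DEST/sampledata.tar.gz"
    tar -xzf "$DEST/sampledata.tar.gz" -C "$DEST"
else
    echo "sample data not found at $SAMPLE" >&2
fi
```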

The following example is based on http://conifer.sourceforge.net/quickstart.html

Running on Helix

$ module load conifer
$ cd /data/$USER/dir

# Run CoNIFER analysis step
$ python $CONIFERHOME/conifer.py analyze \
	--probes probes.txt --rpkm_dir RPKM_data/ \
	--output analysis.hdf5 --svd 6 \
	--write_svals singular_values.txt

# Make calls
$ python $CONIFERHOME/conifer.py call \
	--input analysis.hdf5 --output calls.txt

# Visualize the calls
$ mkdir call_images
$ python $CONIFERHOME/conifer.py plotcalls --input analysis.hdf5 \
	--calls calls.txt --outputdir ./call_images/

Running a single batch job on Biowulf

Create a script file similar to the lines below.

#!/bin/bash

module load conifer
cd /data/$USER/
python $CONIFERHOME/conifer.py .....

Submit the script on Biowulf:

$ sbatch jobscript   

If the job needs more memory than the default (4 GB), use the --mem flag:

$ sbatch --mem=10g jobscript
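As an illustration, the batch script could chain the three steps from the Helix example. The sketch below writes such a script to a file; the file name conifer_job.sh, the set -e line, and mkdir -p are choices made here rather than requirements, and the working directory and file names are the placeholders used earlier on this page.

```shell
# Write a batch script (hypothetical name: conifer_job.sh) that runs the
# analyze, call and plotcalls steps in order. The quoted 'EOF' keeps $USER
# and $CONIFERHOME unexpanded until the job itself runs.
cat > conifer_job.sh <<'EOF'
#!/bin/bash
# Stop at the first failing step so later steps do not run on missing input.
set -e

module load conifer
cd /data/$USER/dir

python $CONIFERHOME/conifer.py analyze \
    --probes probes.txt --rpkm_dir RPKM_data/ \
    --output analysis.hdf5 --svd 6 \
    --write_svals singular_values.txt

python $CONIFERHOME/conifer.py call \
    --input analysis.hdf5 --output calls.txt

mkdir -p call_images
python $CONIFERHOME/conifer.py plotcalls --input analysis.hdf5 \
    --calls calls.txt --outputdir ./call_images/
EOF

# Submit on Biowulf, for example with the memory request shown above:
# sbatch --mem=10g conifer_job.sh
```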

Running a swarm of jobs on Biowulf

Set up a swarm command file:

  cd /data/$USER/dir1; python $CONIFERHOME/conifer.py.....
  cd /data/$USER/dir2; python $CONIFERHOME/conifer.py.....
  cd /data/$USER/dir3; python $CONIFERHOME/conifer.py.....
	[......]

Submit the swarm file:

  $ swarm -f swarmfile --module conifer

-f: specify the swarmfile name
--module: load the named module (and the environment variables it sets) for each command line in the file

To allocate more memory, use the -g flag:

  $ swarm -f swarmfile -g 10 --module conifer

-g: memory (in GB) to allocate for each command in the swarm file

For more information regarding running swarm, see swarm.html
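When there are many directories, the swarm command file can be generated with a short loop instead of being typed by hand. The sketch below assumes the directories dir1 to dir3 from the example above and uses the call step as a stand-in for whichever CoNIFER command each directory needs; both are placeholders.

```shell
# Hypothetical directory names, matching dir1..dir3 in the example above.
dirs="dir1 dir2 dir3"

# Start with an empty swarm file, then append one command line per directory.
: > swarmfile
for d in $dirs; do
    # Single quotes keep $USER and $CONIFERHOME literal in the file, so they
    # are expanded when each swarm command runs rather than now.
    printf 'cd /data/$USER/%s; python $CONIFERHOME/conifer.py call --input analysis.hdf5 --output calls.txt\n' "$d" >> swarmfile
done

cat swarmfile
```

The resulting file is submitted in the same way, with swarm -f swarmfile --module conifer.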

Running an interactive job on Biowulf

It may be useful for debugging purposes to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as shown below and run the job there.

biowulf$ sinteractive 
salloc.exe: Granted job allocation 16535

cn999$ module load conifer
cn999$ cd /data/$USER/dir
cn999$ python $CONIFERHOME/conifer.py.....
cn999$ exit

biowulf$

Make sure to exit the job once finished.

If more memory is needed, use the --mem flag. For example:

biowulf$ sinteractive --mem=10g

Documentation

http://conifer.sourceforge.net/quickstart.html