High-Performance Computing at the NIH
Cufflinks on Biowulf & Helix

Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one.

Cufflinks is a collaborative effort between the Laboratory for Mathematical and Computational Biology, led by Lior Pachter at UC Berkeley, Steven Salzberg's group at the University of Maryland Center for Bioinformatics and Computational Biology, and Barbara Wold's lab at Caltech.

Cufflinks is provided under the OSI-approved Boost License.

 

The iGenomes collection is available on Helix/Biowulf in /fdb/igenomes.

Illumina has provided the RNA-Seq user community with iGenomes, a set of genome sequence indexes (including Bowtie, Bowtie2, and BWA indexes) and GTF transcript annotation files. These files can be used with TopHat and Cufflinks to quickly perform expression analysis and gene discovery. The annotation files are augmented with the tss_id and p_id GTF attributes that Cufflinks needs to perform differential splicing, CDS output, and promoter use analysis.

 

Running Cufflinks on Helix

Sample session:

helix$ module load cufflinks

helix$ module list

Currently Loaded Modules:
1) cufflinks/2.2.1

helix$ cufflinks command

Patched version of cufflinks

There is a patched version of cufflinks available:

module load cufflinks/2.2.1_patched

The patch significantly accelerates progress at positions where thousands of mate pairs have the same location. The patched version seems to help when working with the Ensembl human annotation.

Running a single batch job on Biowulf

Set up a batch script along the following lines.

#!/bin/bash 

cd /data/$USER/mydir
module load cufflinks

cufflinks -p $SLURM_CPUS_PER_TASK inputFile

Submit to the batch system with:

$ sbatch --cpus-per-task=4 --mem=10g  myscript

You would, of course, modify these values to the needs of your job.
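As a hedged variation on the batch script above, the sketch below falls back to a single thread when $SLURM_CPUS_PER_TASK is not set (for example, when testing the script outside a Slurm job) and prints the command instead of running it. The output directory name cufflinks_out and the run variable are illustrative assumptions, not part of the Biowulf setup; -o is the Cufflinks option for the output directory.

```shell
#!/bin/bash
# Sketch only: outdir and run are hypothetical names chosen for illustration.

# Use the CPU count from the Slurm allocation; fall back to 1 thread otherwise.
threads=${SLURM_CPUS_PER_TASK:-1}
outdir=cufflinks_out

# Dry run: with run="echo" the command is printed rather than executed.
# Set run="" to execute it for real (requires "module load cufflinks").
run="echo"
$run cufflinks -p "$threads" -o "$outdir" inputFile
```

Keeping the thread count tied to $SLURM_CPUS_PER_TASK means the value given to sbatch with --cpus-per-task is the only place it needs to change.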

 

Running a swarm of jobs on Biowulf

Set up a swarm command file (e.g., /data/$USER/cmdfile). Here is a sample file:

cd /data/$USER/mydir1; cufflinks -p $SLURM_CPUS_PER_TASK inputFile
cd /data/$USER/mydir2; cufflinks -p $SLURM_CPUS_PER_TASK inputFile
cd /data/$USER/mydir3; cufflinks -p $SLURM_CPUS_PER_TASK inputFile
[...]   

Submit this job with:

$ swarm -f cmdfile -t 4 -g 10 --module cufflinks

-f : swarm command file name
-t : number of threads per command. This number is assigned to $SLURM_CPUS_PER_TASK automatically.
-g : memory in GB required per command line in the swarm file
--module : module loaded to set up the environment variables for the job
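When there are many sample directories, the swarm command file can be generated rather than typed by hand. The following is a minimal sketch, assuming one directory per sample and an input file named inputFile in each, as in the sample file above. make_swarm_file is a hypothetical helper, not part of swarm or Cufflinks, and it assumes directory names without spaces.

```shell
#!/bin/bash
# make_swarm_file is a hypothetical helper (not part of swarm or Cufflinks).
# Usage: make_swarm_file <command file> <dir> [<dir> ...]
make_swarm_file() {
    local cmdfile=$1
    shift
    : > "$cmdfile"    # create or truncate the command file
    local dir
    for dir in "$@"; do
        # The single-quoted format keeps $SLURM_CPUS_PER_TASK literal,
        # so it is expanded when the job runs, not when the file is written.
        printf 'cd %s; cufflinks -p $SLURM_CPUS_PER_TASK inputFile\n' "$dir" >> "$cmdfile"
    done
}

# Reproduce the three-line sample file; the single quotes keep $USER literal,
# matching the sample, so it is expanded by the job at run time.
make_swarm_file cmdfile '/data/$USER/mydir1' '/data/$USER/mydir2' '/data/$USER/mydir3'
cat cmdfile
```

The resulting cmdfile can then be submitted with the swarm command shown above.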

Running an interactive job on Biowulf

Users may sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as shown below and run the interactive job there.

[user@biowulf]$ sinteractive --cpus-per-task=4 
      salloc.exe: Granted job allocation 1528

[user@pXXXX]$ cd /data/$USER/myruns

[user@pXXXX]$ module load cufflinks

[user@pXXXX]$ cufflinks command

[user@pXXXX]$ exit

[user@biowulf]$ 

Documentation

http://cole-trapnell-lab.github.io/cufflinks/tools/