Biowulf High Performance Computing at the NIH
cufflinks on Biowulf

Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one.

Cufflinks is a collaborative effort between the Laboratory for Mathematical and Computational Biology, led by Lior Pachter at UC Berkeley, Steven Salzberg's group at the University of Maryland Center for Bioinformatics and Computational Biology, and Barbara Wold's lab at Caltech.

Cufflinks is provided under the OSI-approved Boost License

Illumina has provided the RNA-Seq user community with a set of genome sequence indexes (including Bowtie, Bowtie2, and BWA indexes) as well as GTF transcript annotation files called iGenomes. These files can be used with TopHat and Cufflinks to quickly perform expression analysis and gene discovery. The annotation files are augmented with the tss_id and p_id GTF attributes that Cufflinks needs to perform differential splicing, CDS output, and promoter user analysis.

Please note that Cufflinks has entered a low maintenance, low support stage as it is now largely superseded by StringTie which provides the same core functionality (i.e. transcript assembly and quantification), in a much more efficient way.


Important Notes

There is a patched version of cufflinks available:

module load cufflinks/2.2.1_patched

The patch significantly accelerates progress at positions where thousands of mate pairs have the same location . The patched version seems to help when working with the Ensembl human annotation.

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load cufflinks
[user@cn3144 ~]$ cufflinks file.sam

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. For example:

cd /data/$USER/mydir
module load cufflinks
cufflinks -p $SLURM_CPUS_PER_TASK inputFile

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#]
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. cufflinks.swarm). For example:

cd /data/$USER/mydir1; cufflinks -p $SLURM_CPUS_PER_TASK inputFile
cd /data/$USER/mydir2; cufflinks -p $SLURM_CPUS_PER_TASK inputFile
cd /data/$USER/mydir3; cufflinks -p $SLURM_CPUS_PER_TASK inputFile

Submit this job using the swarm command.

swarm -f cufflinks.swarm [-g #] [-t #] --module cufflinks
-g # Number of Gigabytes of memory required for each process (1 line in the swarm command file)
-t # Number of threads/CPUs required for each process (1 line in the swarm command file).
--module cufflinks Loads the cufflinks module for each subjob in the swarm