High-Performance Computing at the NIH
gffcompare

Multiple versions of gffcompare may be available. The easiest way to select a version is with environment modules. To see the available modules, type

module avail gffcompare

To select a module, type

module load gffcompare/[ver]

where [ver] is the version of choice.

Environment variables set:

On Helix

Sample session:

$ module load gffcompare
[+] Loading gffcompare, version 0.9.8...
$ gffcompare
gffcompare v0.9.8
-----------------------------
Usage:
gffcompare [-r <reference_mrna.gtf> [-R]] [-T] [-V] [-s <seq_path>]
    [-o <outprefix>] [-p <cprefix>] 
    {-i <input_gtf_list> | <input1.gtf> [<input2.gtf> .. <inputN.gtf>]}

 GffCompare provides classification and reference annotation mapping and
 matching statistics for RNA-Seq assemblies (transfrags) or other generic
 GFF/GTF files.
 GffCompare also clusters and tracks transcripts across multiple GFF/GTF
 files (samples), writing matching transcripts (identical intron chains) into
 <outprefix>.tracking, and a GTF file <outprefix>.combined.gtf which 
 contains a nonredundant set of transcripts across all input files (with
 a single representative transfrag chosen for each clique of matching transfrags
 across samples).

 Options:
 -v display gffcompare version (also --version)
 -i provide a text file with a list of (query) GTF files to process instead
    of expecting them as command line arguments (useful when a large number
    of GTF files should be processed)

 -r reference annotation file (GTF/GFF)

 -R for -r option, consider only the reference transcripts that
    overlap any of the input transfrags (Sn correction)
 -Q for -r option, consider only the input transcripts that
    overlap any of the reference transcripts (Precision correction);
    (Warning: this will discard all "novel" loci!)
 -M discard (ignore) single-exon transfrags and reference transcripts
 -N discard (ignore) single-exon reference transcripts

 -s path to genome sequences (optional); this can be either a multi-FASTA
    file or a directory containing single-fasta files (one for each contig);
    repeats must be soft-masked (lower case) in order to be able to classify
    transfrags as repeats

 -e max. distance (range) allowed from free ends of terminal exons of
    reference transcripts when assessing exon accuracy (100)
 -d max. distance (range) for grouping transcript start sites (100)
 -p the name prefix to use for consensus transcripts in the 
    <outprefix>.combined.gtf file (default: 'TCONS')
 -C discard the "contained" transcripts in the .combined.gtf
    (i.e. collapse intron-redundant transcripts across all query files)
 -E discard "contained" transfrags which are intron compatible with larger
    transfrags (discard intron-redundant transfrags within a query file)
 -F discard intron-redundant transfrags unless they only differ at the 3' end
    and share the 5' end (within the same query file)
 -T do not generate .tmap and .refmap files for each input file
 -V verbose processing mode (also shows GFF parser warnings)
 -D (debug mode) enables -V and generates additional files: 
    <outprefix>.Qdiscarded.lst and <outprefix>.missed_introns.gtf
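
The options listed above can be combined into a typical comparison against a reference annotation. The following is only a minimal sketch, not an official recipe: the scratch directory, the file names (reference.gtf, sample1.gtf and so on) and the output prefix cmp are placeholders invented for illustration, and the list file is assumed to hold one path per line. Only the -r, -R, -o and -i flags are taken from the usage text. The gffcompare call is skipped unless the program is loaded and the reference file exists.

```shell
#!/bin/bash
# Sketch: compare several assemblies against a reference annotation.
# All file names below are hypothetical placeholders.

workdir=$(mktemp -d)                  # scratch directory for this demo
ref="$workdir/reference.gtf"          # assumed reference annotation (not created here)

# Empty stand-ins for real assembled-transcript GTF files.
touch "$workdir/sample1.gtf" "$workdir/sample2.gtf" "$workdir/sample3.gtf"

# -i takes a text file listing the query GTF files
# (assumed format: one path per line).
list="$workdir/gtf_list.txt"
find "$workdir" -maxdepth 1 -name 'sample*.gtf' | sort > "$list"
echo "Query files listed in $list:"
cat "$list"

# Run only when gffcompare is on the PATH and the reference is non-empty.
if command -v gffcompare >/dev/null 2>&1 && [ -s "$ref" ]; then
    # -r: reference annotation
    # -R: consider only reference transcripts overlapping an input transfrag
    # -o: prefix for the output files
    gffcompare -r "$ref" -R -o "$workdir/cmp" -i "$list"
else
    echo "Skipping gffcompare (not loaded, or $ref is missing)."
fi
```

Going by the usage text, a run like this with the prefix cmp would be expected to write cmp.tracking and cmp.combined.gtf in the scratch directory.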

Interactive job on Biowulf

See the Biowulf user guide for interactive jobs.

Batch job on Biowulf

Create a batch input file following the job submission guide using the example commands on this page.
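
As an illustration, a batch input file might look like the one generated below. This is a hedged sketch: the script name gffcompare_job.sh, the input files reference.gtf and sample1.gtf, and the output prefix cmp are hypothetical, and resource requests are omitted because they depend on the data. It assumes jobs are submitted with Slurm's sbatch, and it only submits when sbatch and both input files are present.

```shell
#!/bin/bash
# Sketch: write a batch script, check its syntax, and submit it if possible.
# reference.gtf, sample1.gtf and the prefix "cmp" are hypothetical names.

script="gffcompare_job.sh"

# Quoted 'EOF' keeps the script text literal (no expansion while writing).
cat > "$script" <<'EOF'
#!/bin/bash
set -e
module load gffcompare
gffcompare -r reference.gtf -o cmp sample1.gtf
EOF

# Parse the generated script without executing it.
bash -n "$script" && echo "Wrote $script"

# Submit only where sbatch exists and the input files are in place.
if command -v sbatch >/dev/null 2>&1 && [ -f reference.gtf ] && [ -f sample1.gtf ]; then
    sbatch "$script"
else
    echo "Not submitting; when the inputs are ready, run: sbatch $script"
fi
```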

Swarm of Jobs on Biowulf

Create a swarmfile following the swarm guide using the example commands on this page.
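
A swarmfile holds one independent command per line, so it is easy to generate for many samples. The sketch below assumes three hypothetical samples (sample1 to sample3), a shared reference.gtf, and per-sample output prefixes. The --module option in the printed swarm command is an assumption about the local swarm tool and should be checked against the swarm guide.

```shell
#!/bin/bash
# Sketch: generate a swarmfile with one gffcompare command per sample.
# Sample names, reference.gtf and the output prefixes are hypothetical.

swarmfile="gffcompare.swarm"
: > "$swarmfile"                      # create or truncate the swarmfile

for sample in sample1 sample2 sample3; do
    # Each line is a separate command that swarm runs as its own subjob.
    echo "gffcompare -r reference.gtf -o ${sample}_cmp ${sample}.gtf" >> "$swarmfile"
done

cat "$swarmfile"

# Print, rather than run, the submission command (assumed swarm options).
echo "Submit on Biowulf with: swarm -f $swarmfile --module gffcompare"
```

Giving each command its own output prefix keeps the result files of different samples from overwriting one another.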

Documentation