Biowulf High Performance Computing at the NIH
SOAPdenovo-Trans: a de novo transcriptome assembler for RNA-Seq

SOAPdenovo-Trans is a de novo transcriptome assembler designed specifically for RNA-Seq. Its performance was evaluated on transcriptome datasets from rice and mouse, where it provided higher contiguity, lower redundancy and faster execution than other popular transcriptome assemblers.


Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --mem=4g
[user@cn3316 ~]$ module load soapdenovo-trans
[+] Loading samtools 0.1.20  ...
[+] Loading SOAPdenovo-Trans 1.04  ...
[user@cn3316 ~]$ SOAPdenovo-Trans  -h

Version 1.04

Usage: SOAPdenovo-Trans  [option]
    pregraph     construction kmer-graph
    contig       eliminate errors and output contigs
    map          map reads to contigs
    scaff        scaffolding
    all          doing all the above in turn
[user@cn3316 ~]$ SOAPdenovo-Trans all

Version 1.04

SOAPdenovo-Trans all -s configFile -o outputGraph [-R -f -S -F] [-K kmer -p n_cpu -d kmerFreqCutoff -e EdgeCovCutoff -M mergeLevel -L minContigLen -t locusMaxOutput -G gapLenDiff]
  -s    <string>     configFile: the config file of reads
  -o    <string>     outputGraph: prefix of output graph file name
  -R    (optional)      output assembly RPKM statistics
  -f    (optional)      output gap related reads for SRkgf to fill gap, [NO]
  -S    (optional)      scaffold structure exists, [NO]
  -F    (optional)      fill gaps in scaffolds, [NO]
  -K    <int>        kmer(min 13, max 31): kmer size, [23]
  -p    <int>        n_cpu: number of cpu for use, [8]
  -d    <int>        kmerFreqCutoff: kmers with frequency no larger than KmerFreqCutoff will be deleted, [0]
  -e    <int>        EdgeCovCutoff: edges with coverage no larger than EdgeCovCutoff will be deleted, [2]
  -M    <int>        mergeLevel(min 0, max 3): the strength of merging similar sequences during contiging, [1]
  -L    <int>        minContigLen: shortest contig for scaffolding, [100]
  -t    <int>        locusMaxOutput: output the number of transcripts no more than locusMaxOutput in one locus, [5]
  -G    <int>        gapLenDiff: allowed length difference between estimated and filled gap, [50]

End the interactive session:

[user@cn3316 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
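
The help output above shows that "SOAPdenovo-Trans all" needs a read config file (-s) and an output prefix (-o). The following is a minimal sketch of a paired-end run inside an interactive session, not a tested recipe. The config keys (max_rd_len, [LIB], avg_ins, reverse_seq, asm_flags, q1, q2) follow the SOAPdenovo-Trans manual rather than this page, and the file names, read paths, insert size, and output prefix are hypothetical placeholders. Because -p defaults to 8, the sketch sets it from the CPUs Slurm actually allocated, assuming SLURM_CPUS_PER_TASK is set in the session, and falls back to 2 otherwise.

```shell
#!/bin/bash
# Sketch: write a minimal paired-end config, then run all assembly steps.
# All file names and values below are illustrative assumptions.
set -euo pipefail

config=soap_trans.config            # hypothetical config file name
prefix=asm_out                      # hypothetical output prefix for -o
threads=${SLURM_CPUS_PER_TASK:-2}   # keep -p in line with the allocation

# Config keys as described in the SOAPdenovo-Trans manual (assumed here).
cat > "$config" <<'EOF'
max_rd_len=100
[LIB]
avg_ins=200
reverse_seq=0
asm_flags=3
q1=reads_1.fq
q2=reads_2.fq
EOF

# -K 23 is the default kmer size listed in the help text above.
cmd=(SOAPdenovo-Trans all -s "$config" -o "$prefix" -K 23 -p "$threads")

if command -v SOAPdenovo-Trans >/dev/null 2>&1; then
    "${cmd[@]}"
else
    # Module not loaded: only show the command that would be run.
    echo "SOAPdenovo-Trans not on PATH; would run: ${cmd[*]}"
fi
```

Run it after "module load soapdenovo-trans", with reads_1.fq and reads_2.fq replaced by real FASTQ paths. To use more threads, request them when starting the session, for example with sinteractive --cpus-per-task=8 --mem=4g.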