High-Performance Computing at the NIH
gossamer

a memory efficient tool for de novo assembly of high throughput sequencing data

Gossamer is an application for doing de novo assembly of high throughput sequencing data. It is a memory efficient assembler based on the de Bruijn graph.

The main advantage of Gossamer is that large data sets can be assembled on computers with small amounts of memory. The fundamental parameter of de Bruijn graph based methods is k, the length of the substrings used to construct the graph. These substrings are called k-mers and correspond to nodes in the graph. Edges in the graph correspond to (k+1)-mers, which are called rho-mers. Larger values of k require more memory because the space of possible rho-mers is larger. Run time can also be reduced substantially by building subsections of the graph in parallel on a small cluster of small-memory computers.
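To make the k-mer and rho-mer definitions concrete, the following minimal sketch lists the nodes and edges that a single read contributes for a given k. The read ACGTAC, the value k=3, and the function names are made up for illustration; this only shows the definitions and is not how Gossamer represents its graph.

```shell
#!/bin/bash
# Illustration of the definitions only, not Gossamer's internal method.

# Print every substring of length $2 found in sequence $1, one per line.
substrings() {
    local seq="$1" len="$2" start=1 end
    while [ $((start + len - 1)) -le "${#seq}" ]; do
        end=$((start + len - 1))
        printf '%s\n' "$seq" | cut -c "${start}-${end}"
        start=$((start + 1))
    done
}

kmers()   { substrings "$1" "$2"; }          # nodes: substrings of length k
rhomers() { substrings "$1" "$(($2 + 1))"; } # edges: substrings of length k+1

read_seq=ACGTAC   # hypothetical read
k=3
echo "k-mers:";   kmers   "$read_seq" "$k"
echo "rho-mers:"; rhomers "$read_seq" "$k"
```

For this read the sketch prints the four 3-mers ACG, CGT, GTA and TAC, followed by the three 4-mers ACGT, CGTA and GTAC. Each 4-mer joins two consecutive 3-mers, which is why edges are one base longer than nodes.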

Input files are base-space reads in FASTA or FASTQ format, or in a format with one read per line. Files may be plain text or gzip-compressed.
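When a directory holds a mix of these formats, it can help to check what a file contains before passing it to Gossamer. The sketch below guesses the format from the first line of a file. The function name guess_read_format is hypothetical, and the sketch assumes GNU gzip, where -cdf decompresses gzip data and passes plain text through unchanged.

```shell
#!/bin/bash
# Guess the format of a read file from its first line.
# Prints one of: fasta, fastq, line.
guess_read_format() {
    local file="$1" first
    # Assumes GNU gzip: with -f, uncompressed input is copied as-is.
    first=$(gzip -cdf -- "$file" | head -n 1)
    case "$first" in
        '>'*) echo fasta ;;   # FASTA headers start with '>'
        '@'*) echo fastq ;;   # FASTQ headers start with '@'
        *)    echo line  ;;   # otherwise assume one read per line
    esac
}

# Small self-contained demonstration with a temporary FASTQ file.
demo=$(mktemp)
printf '@read1\nACGT\n+\nIIII\n' > "$demo"
guess_read_format "$demo"    # prints: fastq
rm -f "$demo"
```

The check is only a heuristic: it trusts the first line and does not validate the rest of the file.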

Web site
Reference

Gossamer interactive session

Gossamer runs in a Singularity container and is not appropriate for use on Helix. To use it interactively, start a session on a compute node with the sinteractive command, then run Gossamer as shown below.

[user@biowulf ~]$ sinteractive -c 8 --mem=10g
salloc.exe: Pending job allocation 30034501
salloc.exe: job 30034501 queued and waiting for resources
salloc.exe: job 30034501 has been allocated resources
salloc.exe: Granted job allocation 30034501
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn2686 are ready for job

[user@cn2686 ~]$ module load gossamer
[+] Loading gossamer ac492a8 on cn2686
[+] Loading singularity 2.2 on cn2686

[user@cn2686 ~]$ goss -h
goss commands:
	annotate-kmers              	Decorate a graph with an assignment of kmers to graphs.
	build-db                    	produce a database of contig, and optionally link, information
	build-edge-index            	build an index for aligning pairs to the graph
	build-entry-edge-set        	build an entry edge set for a graph
	build-graph                 	create a new graph
	build-kmer-set              	create a new graph
	build-scaffold              	build a scaffold graph from a pair library
	build-subgraph              	generate a subgraph of an existing graph
	build-supergraph            	generate a de Bruijn graph's supergraph
	clip-links                  	create a new graph by removing spurious links
	compute-near-kmers          	Decorate a graph with an assignment of kmers to graphs.
	count-components            	count connected components in the graph represented by the given reads
	detect-variants             	Decorate a graph with an assignment of kmers to graphs.
	dot-graph                   	write out the graph in dot format.
	dot-supergraph              	write out the supergraph in dot format.
	dump-graph                  	write out the graph in a robust text representation.
	dump-kmer-set               	write out the graph in a robust text representation.
	estimate-errors             	Decorate a graph with an assignment of kmers to graphs.
	extract-core-genome         	Decorate a graph with an assignment of kmers to graphs.
	extract-reads               	extract reads which map on to a graph
	filter-reads                	filter reads keeping/discarding those that coincide with a graph.
	fix-reads                   	read error correction
	graph-to-kmer-set           	generate a graph's k-mer set
	help                        	print a summary of all the goss commands
	intersect-kmer-sets         	generate the intersection of the given k-mer sets
	lint-graph                  	verify that a graph structure is internally consistent
	merge-and-annotate-kmer-sets	Decorate a graph with an assignment of kmers to graphs.
	merge-graphs                	create a new graph by merging zero or more existing graphs
	merge-kmer-sets             	create a new graph by merging zero or more existing graphs
	pool-samples                	pool all the samples
	pop-bubbles                 	perform a bubble-popping pass over the graph
	print-contigs               	print all the non-branching paths in the given assembly graph
	prune-tips                  	create a new graph by removing low frequency tips
	restore-graph               	read in a graph from a robust text representation.
	scaffold                    	apply a scaffold to a supergraph
	subtract-kmer-set           	subtract the second k-mer set from the first
	thread-pairs                	generate a de Bruijn graph's supergraph.
	thread-reads                	thread reads through the supergraph.
	trim-graph                  	create a new graph by trimming low frequency edges
	trim-paths                  	create a new graph by removing low frequency paths

[user@cn2686 ~]$ electus -h
electus commands:
	classify	classify reads according to reference
	help    	print a summary of all the electus commands
	index   	build an index for classifying reads

[user@cn2686 ~]$ gossple -h
usage: $0 [options and files]....
  -B <buffer-size>        amount of buffer space to use in GB (default 2).
  -c <coverage>           coverage estimate to use during pair/read threading.
                          (defaults to using automatic coverage estimation)
[...snip...]

[user@cn2686 ~]$ xenome -h
xenome commands:
	classify	classify reads according to index
	help    	print a summary of all the xenome commands
	index   	build an index for classifying reads

Running a single Gossamer job on Biowulf

Set up a batch script along the following lines:

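#!/bin/bash
# file called myjob.bat

module load gossamer

# Create the output directory (-p: no error if it already exists) and work there.
mkdir -p "/data/$USER/goss_output"
cd "/data/$USER/goss_output" || exit 1

# Build a de Bruijn graph with k=27 from the FASTQ reads; the output is named "graph".
goss build-graph -k 27 -i /path/to/some.fastq -O graph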

Submit this job with:

[user@biowulf ~]$ sbatch myjob.bat

For more information on submitting jobs to Slurm, see Job Submission in the Biowulf User Guide.

Running a swarm of Gossamer jobs on Biowulf

Sample swarm command file

# --------file myjobs.swarm----------
goss build-graph -k 27 -i /path/to/1.fastq -O graph_1
goss build-graph -k 27 -i /path/to/2.fastq -O graph_2
goss build-graph -k 27 -i /path/to/3.fastq -O graph_3
....
goss build-graph -k 27 -i /path/to/N.fastq -O graph_N
# -----------------------------------

Submit this set of runs to the batch system by typing:

[user@biowulf ~]$ swarm --module gossamer -f myjobs.swarm

For details on using swarm see Swarm on Biowulf.
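For many input files, the swarm command file can be generated instead of written by hand. The following is a minimal sketch, assuming the inputs are FASTQ files with a .fastq extension in a single directory and that file names contain no whitespace. The function name make_swarm and the graph_NAME output naming are choices made for illustration; each job gets a distinct output name so that jobs running in the same directory do not overwrite each other.

```shell
#!/bin/bash
# Print one "goss build-graph" line per FASTQ file in directory $1.
# Optional $2 sets k (default 27, the value used in the examples on this page).
# Assumes file names without whitespace, since swarm lines are not quoted here.
make_swarm() {
    local dir="$1" k="${2:-27}" fq name
    for fq in "$dir"/*.fastq; do
        [ -e "$fq" ] || continue              # skip the literal glob if nothing matched
        name=$(basename "$fq" .fastq)         # e.g. /path/to/1.fastq -> 1
        printf 'goss build-graph -k %s -i %s -O %s\n' "$k" "$fq" "graph_$name"
    done
}

# Hypothetical usage:
#   make_swarm /path/to/fastq_dir > myjobs.swarm
#   swarm --module gossamer -f myjobs.swarm
```

Inspect the generated file before submitting it, since every line becomes a separate job.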

Documentation