High-Performance Computing at the NIH
Fastxtoolkit on Biowulf & Helix

FASTX-Toolkit is a collection of command-line tools for preprocessing short-read FASTA/FASTQ files.

The following commands are currently available:

fasta_clipping_histogram.pl     fastq_quality_converter  fastx_barcode_splitter.pl                    fastx_quality_stats
fasta_formatter                 fastq_quality_filter     fastx_clipper                                fastx_renamer
fasta_nucleotide_changer        fastq_quality_trimmer    fastx_collapser                              fastx_reverse_complement
fastq_masker                    fastq_to_fasta           fastx_nucleotide_distribution_graph.sh       fastx_trimmer
fastq_quality_boxplot_graph.sh  fastx_artifacts_filter   fastx_nucleotide_distribution_line_graph.sh  fastx_uncollapser
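
Most of these tools read standard input and write standard output when -i or -o is omitted, so they can be chained with pipes. The function below is a minimal sketch of one common trim, filter, and convert sequence. The function name fastx_preprocess and the thresholds (quality 20, minimum length 30 nt, 80% of bases) are illustrative assumptions, and -Q33 assumes Phred+33 (Sanger / Illumina 1.8+) quality encoding; adjust these to your data.

```shell
#!/bin/bash
# Hypothetical helper: trim, filter, and convert one FASTQ file in a single pipeline.
# Assumptions: Phred+33 qualities (-Q33); thresholds are examples only.
fastx_preprocess() {
  local in_fastq=$1 out_fasta=$2
  # 1. Trim bases with quality < 20 from the 3' end; discard reads shorter than 30 nt
  fastq_quality_trimmer -Q33 -t 20 -l 30 -i "$in_fastq" |
    # 2. Keep reads in which at least 80% of bases have quality >= 20
    fastq_quality_filter -Q33 -q 20 -p 80 |
    # 3. Convert the surviving reads to FASTA
    #    (by default fastq_to_fasta also discards reads containing N)
    fastq_to_fasta -Q33 -o "$out_fasta"
}
```

Usage, after module load fastxtoolkit: fastx_preprocess reads.fastq reads.fasta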

Running on Helix

$ module load fastxtoolkit
$ cd /data/$USER/dir
$ fasta_formatter -h

Running a single batch job on Biowulf

1. Create a script file containing lines similar to the following:

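#!/bin/bash

module load fastxtoolkit
cd /data/$USER/dir
# Replace with the FASTX-Toolkit command(s) to run, for example
# (input.fastq is a placeholder; -Q33 assumes Phred+33 qualities):
fastq_quality_filter -Q33 -q 20 -p 80 -i input.fastq -o filtered.fastq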

2. Submit the script on Biowulf:

$ sbatch jobscript

If more memory is required (the default is 4 GB), specify it with --mem=Mg, where M is the number of gigabytes. For example:

$ sbatch --mem=10g jobscript
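
The memory request can also be placed inside the job script as an #SBATCH directive, so that a plain sbatch jobscript is enough. The sketch below writes such a script with a here-document; the 10g value, the directory, the file names, and the fastq_quality_filter thresholds are illustrative assumptions.

```shell
#!/bin/bash
# Write a job script named "jobscript" that requests 10 GB of memory through an
# #SBATCH directive. The quoted 'EOF' keeps $USER unexpanded until the job runs.
cat > jobscript <<'EOF'
#!/bin/bash
#SBATCH --mem=10g
# Stop at the first failing command
set -e

module load fastxtoolkit
cd /data/$USER/dir
# Example command; input.fastq and filtered.fastq are placeholder file names
fastq_quality_filter -Q33 -q 20 -p 80 -i input.fastq -o filtered.fastq
EOF
```

A --mem option given on the sbatch command line takes precedence over the #SBATCH directive in the script.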

Running a swarm of jobs on Biowulf

Set up a swarm command file, with one command line per job:

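  cd /data/$USER/dir1; fastq_quality_filter -Q33 -q 20 -p 80 -i input.fastq -o filtered.fastq
  cd /data/$USER/dir2; fastq_quality_filter -Q33 -q 20 -p 80 -i input.fastq -o filtered.fastq
  cd /data/$USER/dir3; fastq_quality_filter -Q33 -q 20 -p 80 -i input.fastq -o filtered.fastq
  [...]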
  

Submit the swarm file. The -f option specifies the swarm file name, and --module fastxtoolkit loads the fastxtoolkit module for each command line in the file:

  $ swarm -f swarmfile --module fastxtoolkit

If each command needs more memory, use -g to set the number of gigabytes per command. The example below allocates 10 GB to each command:

  $ swarm -f swarmfile -g 10 --module fastxtoolkit
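
When there are many sample directories, the swarm command file can be generated with a short loop instead of being written by hand. This is a sketch under an assumed layout: each subdirectory of a hypothetical project directory holds one input.fastq, and make_swarmfile is a hypothetical helper name. The fastq_quality_filter options are illustrative values.

```shell
#!/bin/bash
# Hypothetical helper: write one swarm command line per sample directory.
# Assumed layout: every subdirectory of $1 that should be processed contains input.fastq.
make_swarmfile() {
  local base=$1 out=$2 dir
  : > "$out"                                # start with an empty swarm file
  for dir in "$base"/*/; do
    dir=${dir%/}                            # strip the trailing slash
    [ -f "$dir/input.fastq" ] || continue   # skip directories without input
    printf 'cd "%s"; fastq_quality_filter -Q33 -q 20 -p 80 -i input.fastq -o filtered.fastq\n' \
      "$dir" >> "$out"
  done
}
```

For example, run make_swarmfile /data/$USER/project swarmfile, then submit it with swarm -f swarmfile --module fastxtoolkit.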

For more information about running swarm, see swarm.html.

Running an interactive job on Biowulf

It may be useful for debugging purposes to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below, and run the interactive job there.

biowulf$ sinteractive 
salloc.exe: Granted job allocation 16535

cn999$ module load fastxtoolkit
cn999$ cd /data/$USER/dir
cn999$ fastq_quality_filter -Q33 -q 20 -p 80 -i input.fastq -o filtered.fastq
[...etc...]

cn999$ exit
exit

biowulf$

Make sure to exit the job once finished.

If more memory is needed, use --mem. For example:

biowulf$ sinteractive --mem=8g
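
For debugging, it is often faster to try a command on a small subset of reads before running it on a full file. Each FASTQ record occupies four lines, so the first N reads are the first 4N lines. The helper fastq_head below is a hypothetical sketch that assumes uncompressed FASTQ with no wrapped sequence or quality lines; the file names in the example are placeholders.

```shell
#!/bin/bash
# Hypothetical helper: copy the first N reads of a FASTQ file into a new file.
# Assumes uncompressed FASTQ with exactly 4 lines per record.
fastq_head() {
  local in=$1 out=$2 n=$3
  head -n $(( 4 * n )) "$in" > "$out"
}

# Example, inside the sinteractive session:
#   fastq_head input.fastq test.fastq 10000
#   fastq_quality_filter -Q33 -q 20 -p 80 -i test.fastq -o test_filtered.fastq
```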

Documentation

http://hannonlab.cshl.edu/fastx_toolkit/commandline.html