High-Performance Computing at the NIH

PRINSEQ is a tool that generates summary statistics of sequence and quality data and can filter, reformat, and trim next-generation sequence data. It is designed particularly for 454/Roche data, but can also be used for other types of sequence data.

On our systems, only the main program prinseq-lite.pl is set up for use. Please contact the HPC staff if you require more of PRINSEQ's functionality.


There may be multiple versions of prinseq available. An easy way of selecting the version is to use modules. To see the modules available, type

module avail prinseq

To select a module, type

module load prinseq/[ver]

where [ver] is the version of choice.

Environment variables set include $PRINSEQ_HOME, which the examples on this page use to locate the bundled example data.

On Helix

Sample session:

module load prinseq
prinseq-lite.pl -verbose -fastq $PRINSEQ_HOME/example/example1.fastq -ns_max_n 0 -out_good test_no_ns -out_bad test_with_ns

Interactive job on Biowulf

See the Biowulf user guide for interactive jobs.

Batch job on Biowulf

Create a batch script (e.g. prinseq.sh) that loads the module and runs prinseq-lite.pl on your input data. For example:

#!/bin/bash
module load prinseq
prinseq-lite.pl -verbose -fastq $PRINSEQ_HOME/example/example1.fastq -ns_max_n 0 -out_good test_no_ns -out_bad test_with_ns

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=1 prinseq.sh

Swarm of Jobs on Biowulf

Create a swarmfile following the swarm guide using the example commands on this page.
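
A swarmfile is a plain text file with one command per line, each of which runs as its own job. The following is a minimal sketch of generating one, not a definitive recipe: the sample names (sample1 to sample3) and the file name prinseq.swarm are hypothetical placeholders, and the prinseq-lite.pl options are the same ones used in the examples on this page. The submission line is left commented out and assumes swarm's -f and --module options as described in the swarm guide.

```shell
# Hypothetical sample names; replace with the base names of your own FASTQ files.
samples="sample1 sample2 sample3"

# Start with an empty swarmfile.
: > prinseq.swarm

# Write one prinseq-lite.pl command per line, one line per sample.
for s in $samples; do
    echo "prinseq-lite.pl -fastq ${s}.fastq -ns_max_n 0 -out_good ${s}_no_ns -out_bad ${s}_with_ns" >> prinseq.swarm
done

# Submit on Biowulf (commented out so the sketch can be run anywhere):
# swarm -f prinseq.swarm --module prinseq
```

Review prinseq.swarm before submitting it; each line should be a complete command that would also work on its own in an interactive session.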