PRINSEQ is a tool that generates summary statistics of sequence and quality data and can be used to filter, reformat, and trim next-generation sequence data. It is designed particularly for 454/Roche data, but can also be used for other types of sequence data.
On our systems, only the main program prinseq-lite.pl is set up for use. Please contact the HPC staff if you require more of PRINSEQ's functionality.
There may be multiple versions of prinseq available. An easy way of selecting the version is to use modules. To see the modules available, type
module avail prinseq
To select a module, type
module load prinseq/[ver]
where [ver] is the version of choice.
Sample session:
module load prinseq
prinseq-lite.pl -verbose -fastq $PRINSEQ_HOME/example/example1.fastq -ns_max_n 0 -out_good test_no_ns -out_bad test_with_ns
See the Biowulf user guide for interactive jobs.
Create a batch script (e.g. prinseq.sh). For example:
#!/bin/bash
module load prinseq
prinseq-lite.pl -verbose -fastq $PRINSEQ_HOME/example/example1.fastq -ns_max_n 0 -out_good test_no_ns -out_bad test_with_ns
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=1 prinseq.sh
Create a swarmfile following the swarm guide using the example commands on this page.
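A swarmfile is a plain text file with one command per line. As a minimal sketch, the loop below writes one prinseq-lite.pl command per input file, reusing the options from the examples on this page. The sample names (sample1, sample2, sample3) and the file name prinseq.swarm are hypothetical placeholders, not files provided on the system.

```shell
#!/bin/bash
# Hypothetical sample names; replace with the base names of your own FASTQ files.
samples="sample1 sample2 sample3"

# Start with an empty swarmfile (the name prinseq.swarm is an assumption).
: > prinseq.swarm

# Write one prinseq-lite.pl command per sample, using the same options
# as the interactive and batch examples above.
for s in $samples; do
    printf 'prinseq-lite.pl -fastq %s.fastq -ns_max_n 0 -out_good %s_no_ns -out_bad %s_with_ns\n' \
        "$s" "$s" "$s" >> prinseq.swarm
done
```

This only writes the swarmfile; it does not run PRINSEQ. The first line of the resulting file reads "prinseq-lite.pl -fastq sample1.fastq -ns_max_n 0 -out_good sample1_no_ns -out_bad sample1_with_ns". Check the swarm guide for the submission options; the prinseq module needs to be loaded for each command in the swarm.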