High-Performance Computing at the NIH
fastq_screen on Biowulf & Helix

Description

FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.

Multiple versions may be available. The easiest way to select a version is with environment modules. To list the available modules, type:

module avail fastq_screen 

To select a module use

module load fastq_screen/[version]

where [version] is the version of choice.

Environment variables set

 

Test/Configuration

According to this useful video (https://www.youtube.com/watch?v=WqiKPRxHzNU), users can change the number of threads with the --threads flag.

A test dataset can be copied from /usr/local/apps/fastq_screen/fastq_screen_test_dataset

and screened with this command:

$ fastq_screen fastq_screen_test_dataset/fqs_test_dataset.fastq.gz
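
As an illustration of the thread setting mentioned above, the following sketch builds a fastq_screen command line whose --threads value follows the CPUs allocated by Slurm. The helper names fqs_threads and fqs_command are hypothetical, and the fallback of 2 threads is an arbitrary assumption for runs outside a job. The command is only printed, so it can be inspected before it is run.

```shell
#!/bin/bash
# Sketch only: fqs_threads and fqs_command are hypothetical helper names.

# Print the thread count to use: the Slurm CPU allocation if it is set,
# otherwise an assumed fallback of 2.
fqs_threads() {
    echo "${SLURM_CPUS_PER_TASK:-2}"
}

# Print (do not run) the fastq_screen command for the FASTQ file in $1.
fqs_command() {
    echo "fastq_screen --threads $(fqs_threads) $1"
}

fqs_command fastq_screen_test_dataset/fqs_test_dataset.fastq.gz
```

SLURM_CPUS_PER_TASK is set by Slurm when CPUs are requested explicitly, for example with sbatch --cpus-per-task, so the same helper can be reused inside a batch script.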

 

Documentation

https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/

Interactive job on Biowulf

Allocate an interactive session with sinteractive and run the program as shown below:

biowulf$ sinteractive --mem=4g
salloc.exe: Pending job allocation 38978697
[...snip...]
salloc.exe: Nodes cn2273 are ready for job
node$ module load fastq_screen
[+] Loading fastq_screen
node$ # Copy the test data
node$ cp -r /usr/local/apps/fastq_screen/fastq_screen_test_dataset .
node$ 
node$ fastq_screen fastq_screen_test_dataset/fqs_test_dataset.fastq.gz
[...snip...]
node$ exit
biowulf$

 

Batch job on Biowulf

Create a batch script similar to the following example:

#! /bin/bash
# this file is fastqscreen.batch

module load fastq_screen || exit 1

cp -r /usr/local/apps/fastq_screen/fastq_screen_test_dataset .
fastq_screen fastq_screen_test_dataset/fqs_test_dataset.fastq.gz

Submit to the queue with sbatch:

biowulf$ sbatch fastqscreen.batch
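
Once the batch job has finished, it can be useful to confirm that a report was written. The sketch below assumes that fastq_screen names its text report after the input file with a _screen.txt suffix; the directory in which reports land may depend on the version and on any --outdir setting, so it is passed as a parameter. The function names screen_report_name and check_report are hypothetical.

```shell
#!/bin/bash
# Sketch only: the "_screen.txt" suffix and the report directory are
# assumptions about fastq_screen's output naming; adjust as needed.

# Print the expected text-report name for the FASTQ file in $1.
screen_report_name() {
    local base
    base=$(basename "$1")
    base=${base%.gz}      # drop the compression suffix, if any
    base=${base%.fastq}   # drop .fastq ...
    base=${base%.fq}      # ... or .fq
    echo "${base}_screen.txt"
}

# Report whether a non-empty text report exists in directory $1 for FASTQ $2.
check_report() {
    local report
    report="$1/$(screen_report_name "$2")"
    if [ -s "$report" ]; then
        echo "found $report"
    else
        echo "missing $report"
        return 1
    fi
}

# Check the current directory for the test dataset report.
check_report . fastq_screen_test_dataset/fqs_test_dataset.fastq.gz || true
```

The check returns a non-zero status when the report is missing or empty, so it can also gate later steps in a script.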