High-Performance Computing at the NIH
Qcat on NIH HPC Systems

Qcat is a Python command-line tool for demultiplexing Oxford Nanopore reads from FASTQ files. It accepts basecalled FASTQ files and splits the reads into separate FASTQ files based on their barcode. Qcat makes the demultiplexing algorithms used in albacore/guppy and EPI2ME available for local use with FASTQ files. Currently qcat implements the EPI2ME algorithm.

Batch job on Biowulf

Create a batch input file (e.g. qcat.sh). For example:

#!/bin/bash

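# Move to the working directory; stop if it does not exist
cd /data/$USER/dir || exit 1
# Load the qcat environment module
module load qcat
# Demultiplex <fastq_file> into per-barcode files under <output directory>
qcat -f <fastq_file> -b <output directory>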

Then submit the file on Biowulf:

sbatch qcat.sh

Please read the user guide for more sbatch options.

Useful utilities for job monitoring and debugging
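A batch job that fails halfway is easier to diagnose when the script checks its input first. The following is a minimal sketch, not part of the official qcat documentation: the function name run_qcat is hypothetical, and it uses only the -f and -b options shown above. It could be pasted into qcat.sh after the module load line.

```shell
#!/bin/bash
# run_qcat FASTQ OUTDIR
# Hypothetical wrapper: refuse to start if the input FASTQ is missing or empty,
# then call qcat with the same -f/-b options used in the batch script above.
run_qcat() {
    local fastq=$1 outdir=$2

    # -s is true only for an existing file with size greater than zero
    if [[ ! -s "$fastq" ]]; then
        echo "input FASTQ missing or empty: $fastq" >&2
        return 1
    fi

    # Assumption: pre-creating the output directory is harmless
    # even if qcat would create it on its own.
    mkdir -p "$outdir" || return 1

    qcat -f "$fastq" -b "$outdir"
}
```

A call such as run_qcat <fastq_file> <output directory> then returns a non-zero status before qcat is started when the input is not usable.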

Swarm of Jobs on Biowulf

Create a swarmfile (e.g. qcat.swarm). For example:

# this file is called qcat.swarm
cd dir1; qcat -f <fastq_file> -b <output directory> 
cd dir2; qcat -f <fastq_file> -b <output directory> 
cd dir3; qcat -f <fastq_file> -b <output directory>
[...]

Submit this job using the swarm command.

[USER@biowulf ~]$ swarm -f qcat.swarm --module qcat

More options for swarm jobs are described in the swarm documentation.
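With many sample directories, writing the swarmfile by hand becomes tedious. The sketch below generates one line per directory in the same layout as the example swarmfile. The function name make_swarmfile, the input name reads.fastq, and the output directory demux are assumptions chosen for illustration; adjust them to the actual file layout.

```shell
#!/bin/bash
# make_swarmfile OUTFILE DIR...
# Hypothetical helper: write one "cd DIR; qcat ..." line per directory.
# Assumption: each directory contains a file named reads.fastq and the
# demultiplexed output should go to a subdirectory named demux.
make_swarmfile() {
    local outfile=$1 d
    shift
    {
        echo "# generated swarmfile for qcat"
        for d in "$@"; do
            # %q quotes directory names that contain spaces or shell metacharacters
            printf 'cd %q; qcat -f reads.fastq -b demux\n' "$d"
        done
    } > "$outfile"
}
```

For example, make_swarmfile qcat.swarm dir1 dir2 dir3 writes a comment line followed by three command lines, and the resulting file can be submitted with the swarm command shown above.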

Interactive job on Biowulf
Allocate an interactive session. Sample session:
[USER@biowulf ~]$ sinteractive --mem=10g 
salloc.exe: Pending job allocation 15194042
salloc.exe: job 15194042 queued and waiting for resources
salloc.exe: job 15194042 has been allocated resources
salloc.exe: Granted job allocation 15194042
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn1719 are ready for job

[USER@cn1719 ~]$ module load qcat

[USER@cn1719 ~]$ qcat -f <fastq_file> -b <output directory>
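After a run finishes, a quick way to sanity-check the result is to count the reads in each per-barcode file. The sketch below assumes that the output files end in .fastq and that every record occupies exactly four lines (no wrapped sequences); the function name count_reads is hypothetical.

```shell
#!/bin/bash
# count_reads DIR
# Hypothetical helper: print "<file name><TAB><read count>" for every
# .fastq file in DIR, assuming four lines per FASTQ record.
count_reads() {
    local dir=$1 f lines
    for f in "$dir"/*.fastq; do
        # Skip the unexpanded pattern when the directory has no .fastq files
        [[ -e "$f" ]] || continue
        lines=$(wc -l < "$f")
        printf '%s\t%d\n' "$(basename "$f")" $(( lines / 4 ))
    done
}
```

Running count_reads <output directory> in the interactive session lists one line per output file, which makes empty or unexpectedly small barcode bins easy to spot.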
Documentation