IsoQuant on Biowulf

IsoQuant analyzes long-read RNA sequencing data from PacBio or Oxford Nanopore platforms.

IsoQuant reconstructs and quantifies transcript models with high precision and decent recall. If a reference annotation is given, IsoQuant also assigns reads to the annotated isoforms based on their intron and exon structure, and quantifies annotated genes, isoforms, exons and introns. If reads are grouped (e.g. by cell type), counts are reported per group.

References:

Prjibelski, A.D., Mikheenko, A., Joglekar, A. et al. Accurate isoform discovery with IsoQuant using long reads. Nat Biotechnol 41, 915–918 (2023).

Documentation

IsoQuant Main Site

Important Notes

Module Name: isoquant (see the modules page for more information)
Multithreaded. IsoQuant defaults to 16 threads, so allocate CPUs accordingly or lower the thread count with --threads.
IsoQuant jobs can be resumed with --resume if they run out of walltime.
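
The two notes above can be sketched as a small shell helper that prints (rather than runs) a resume command whose thread count follows the Slurm allocation. This is a hedged illustration, not part of the module: the function names are hypothetical, the fallback of 2 CPUs when SLURM_CPUS_PER_TASK is unset is an assumption, and the exact behavior of --resume and --threads should be confirmed with isoquant.py --help for the installed version.

```shell
#!/bin/bash
# Hypothetical helper: report the thread count IsoQuant should use.
# Falls back to 2 when SLURM_CPUS_PER_TASK is unset (assumed default).
isoquant_threads() {
    echo "${SLURM_CPUS_PER_TASK:-2}"
}

# Hypothetical helper: print a command that resumes an unfinished run.
# The argument is the output directory of the interrupted run.
isoquant_resume_cmd() {
    local outdir="$1"
    echo "isoquant.py --resume --output ${outdir} --threads $(isoquant_threads)"
}

# Print the command for inspection; paste it into a job script to run it.
isoquant_resume_cmd output_dir
```

Printing the command first makes it easy to check the thread count against the allocation before committing walltime to the resumed job.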

Interactive job

Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (user input follows the $ prompt):

[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load isoquant

[user@cn3144 ~]$ isoquant.py --test # run toy test
=== Running in test mode ===
Any other option is ignored
2024-11-19 13:06:49,350 - INFO - Running IsoQuant version 3.6.2
2024-11-19 13:06:49,355 - INFO - Novel unspliced transcripts will not be reported, set --report_novel_unspliced true to discover them
2024-11-19 13:06:49,355 - INFO -  === IsoQuant pipeline started ===
2024-11-19 13:06:49,355 - INFO - gffutils version: 0.13
2024-11-19 13:06:49,355 - INFO - pysam version: 0.22.1
2024-11-19 13:06:49,355 - INFO - pyfaidx version: 0.8.1.3
2024-11-19 13:06:49,356 - INFO - Checking input gene annotation
2024-11-19 13:06:49,359 - INFO - Gene annotation seems to be correct
2024-11-19 13:06:49,359 - INFO - Converting gene annotation file to .db format (takes a while)...
...
2024-11-19 13:06:54,286 - INFO - Read assignment statistics
2024-11-19 13:06:54,286 - INFO - ambiguous: 30
2024-11-19 13:06:54,286 - INFO - inconsistent: 93
2024-11-19 13:06:54,286 - INFO - inconsistent_ambiguous: 10
2024-11-19 13:06:54,286 - INFO - noninformative: 1
2024-11-19 13:06:54,286 - INFO - unique: 139
2024-11-19 13:06:54,286 - INFO - unique_minor_difference: 150
2024-11-19 13:06:54,294 - INFO - Processed experiment TEST_DATA
2024-11-19 13:06:54,294 - INFO - Processed 1 experiment
2024-11-19 13:06:54,294 - INFO -  === IsoQuant pipeline finished ===
2024-11-19 13:06:54,294 - INFO -  === TEST PASSED CORRECTLY ===

[user@cn3144 ~]$ cd /data/$USER/analysis

[user@cn3144 analysis]$ # real analysis example:

[user@cn3144 analysis]$ isoquant.py -d nanopore \
  --fastq ONT.cDNA.raw.fastq.gz \
  --reference reference.fasta \
  --threads ${SLURM_CPUS_PER_TASK:-2} \
  --output output_dir \
  --prefix My_ONT_cDNA

[user@cn3144 analysis]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Batch job

Most jobs should be run as batch jobs.

Create a batch input file (e.g. isoquant.sh). For example:

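#!/bin/bash
set -e
module load isoquant
cd /data/$USER/analysis
# Match IsoQuant's thread count to the CPUs allocated by Slurm
isoquant.py -d pacbio_ccs --bam mapped_reads.bam --genedb annotation.db \
  --threads ${SLURM_CPUS_PER_TASK:-2} --output output_dir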

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#] isoquant.sh
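
Because IsoQuant defaults to 16 threads, one reasonable submission requests 16 CPUs. The sketch below only prints the sbatch command so that it can be inspected before submitting. The helper name is hypothetical, and the 32g memory value is a placeholder assumption, not a measured requirement; size it to your data.

```shell
#!/bin/bash
# Hypothetical helper: print an sbatch command line for isoquant.sh.
# Defaults: 16 CPUs (IsoQuant's default thread count) and 32g of memory
# (an assumed placeholder value).
isoquant_sbatch_cmd() {
    local cpus="${1:-16}" mem="${2:-32g}"
    echo "sbatch --cpus-per-task=${cpus} --mem=${mem} isoquant.sh"
}

isoquant_sbatch_cmd          # default request
isoquant_sbatch_cmd 8 16g    # smaller request
```

If fewer than 16 CPUs are requested, pass a matching --threads value to isoquant.py in the batch script so that the job does not oversubscribe its allocation.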

Swarm of Jobs

A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. isoquant.swarm). For example:

isoquant.py -d pacbio_ccs --bam reads1.bam --genedb ann1.db --output out1
isoquant.py -d pacbio_ccs --bam reads2.bam --genedb ann2.db --output out2
isoquant.py -d pacbio_ccs --bam reads3.bam --genedb ann3.db --output out3
isoquant.py -d pacbio_ccs --bam reads4.bam --genedb ann4.db --output out4

Submit this job using the swarm command.

swarm -f isoquant.swarm [-g #] [-t #] --module isoquant

where

`-g #`	Number of gigabytes of memory required for each process (1 line in the swarm command file).
`-t #`	Number of threads/CPUs required for each process (1 line in the swarm command file).
`--module isoquant`	Loads the isoquant module for each subjob in the swarm.
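
For many samples, the swarmfile can be generated instead of written by hand. The following is a minimal sketch under stated assumptions: the function name make_swarm is hypothetical, every *.bam file in the given directory is treated as one sample, and a single gene database named annotation.db is shared by all samples (the example above uses one database per sample). The literal $SLURM_CPUS_PER_TASK written into each line is intended to be expanded when the subjob runs, so that the thread count follows the value given to swarm -t.

```shell
#!/bin/bash
# Hypothetical helper: write one IsoQuant command per BAM file to a swarmfile.
# Assumes all samples share one gene database, annotation.db.
make_swarm() {
    local bam_dir="$1" swarmfile="$2" bam sample
    : > "$swarmfile"                      # start with an empty swarmfile
    for bam in "$bam_dir"/*.bam; do
        [ -e "$bam" ] || continue         # skip if the glob matched nothing
        sample=$(basename "$bam" .bam)    # e.g. reads1.bam -> reads1
        # \$ keeps the variable unexpanded here; it is resolved in the subjob.
        echo "isoquant.py -d pacbio_ccs --bam $bam --genedb annotation.db --threads \$SLURM_CPUS_PER_TASK --output out_$sample" >> "$swarmfile"
    done
}

# Usage (not executed here): make_swarm /data/$USER/analysis isoquant.swarm
```

Inspect the generated file before submitting it with the swarm command shown above.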