IsoSeqSim on Biowulf

IsoSeqSim is an Iso-Seq read simulator for evaluating the performance of Iso-Seq bioinformatics analysis tools.

Web site

Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (user input follows the shell prompt):

[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load isoseqsim
[+] Loading isoseqsim  0.2  on cn3144 
[+] Loading singularity  4.2.2  on cn3144 

[user@cn3144 ~]$ isoseqsim -h
usage: isoseqsim [-h] [-v] [-m {normal,fusion,apa,ase,ats}] -g GENOME -a
                 ANNOTATION -o OUTPUT -t TRANSCRIPT --tempdir TEMPDIR
                 [--nbn NBN] [--nbp NBP] [--cpu CPU] --c5 C5 --c3 C3 [--es ES]
                 [--ei EI] [--ed ED] [--fc FC] [--dis_pa DIS_PA]
                 [--dis_tss DIS_TSS] [--vcf VCF] [--id ID] [--tnl TNL]
                 [--tnu TNU] [--tnm TNM] [--tns TNS]
isoseqsim: error: argument -g/--genome is required
[...]

[user@cn3144 ~]$ cd /data/$USER
[user@cn3144 ~]$ git clone https://github.com/yunhaowang/IsoSeqSim.git
[user@cn3144 ~]$ cd IsoSeqSim
[user@cn3144 ~]$ isoseqsim -g example/input/genome.fasta -a example/input/gene_annotation.gtf --c5 utilities/5_end_completeness.PacBio-P6-C4.tab --c3 utilities/3_end_completeness.PacBio-P6-C4.tab -o example/simulated_reads_normal.fa -t example/simulated_transcipt_normal.gpd --tempdir example/temp_normal
### Start analysis at Mon,05 May 2025 13:30:02
## Mode: normal
# Step1: convert gtf to gpd
Start analysis: Mon,05 May 2025 13:30:02
Finish analysis: Mon,05 May 2025 13:30:02
# Step2: generate transcriptome fasta file
Start analysis: Mon,05 May 2025 13:30:03
Finish analysis: Mon,05 May 2025 13:30:03
# Step3: generate expression matrix based on Negative Binomial distribution
Start analysis: Mon,05 May 2025 13:30:03
Finish analysis: Mon,05 May 2025 13:30:03
# Step4: simulate Iso-Seq reads
Start analysis: Mon,05 May 2025 13:30:03
Finish analysis: Mon,05 May 2025 13:30:15
### Finish analysis at Mon,05 May 2025 13:30:15

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
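
After a run like the one above finishes, a quick sanity check is to count the simulated reads. The sketch below assumes the file given to -o is standard FASTA with one ">" header line per read; the helper name count_fasta_records is hypothetical and not part of IsoSeqSim.

```shell
#!/bin/bash
# Sketch: count records in a FASTA file by counting header lines.
# Assumption: every record starts with a line beginning with ">".

count_fasta_records() {
    # grep -c prints the number of matching lines (0 if none match)
    grep -c '^>' "$1"
}

# Example call from /data/$USER/IsoSeqSim after the run (commented out
# because the file only exists once isoseqsim has completed):
# count_fasta_records example/simulated_reads_normal.fa
```

Comparing this count across runs is a cheap way to notice a truncated or empty output file before passing it to downstream tools.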

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. isoseqsim.sh). For example:

#!/bin/bash

module load isoseqsim

cd /data/$USER/IsoSeqSim

isoseqsim -g example/input/genome.fasta \
          -a example/input/gene_annotation.gtf \
          --c5 utilities/5_end_completeness.PacBio-P6-C4.tab \
          --c3 utilities/3_end_completeness.PacBio-P6-C4.tab \
          -o example/simulated_reads_normal.fa \
          -t example/simulated_transcipt_normal.gpd \
          --tempdir example/temp_normal
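
The script above uses a fixed temporary directory and does not pass --cpu. As a hedged sketch, the helpers below pick both values from the Slurm environment. It assumes that --cpu (listed in the usage message) controls how many CPUs isoseqsim uses, and that /lscratch/$SLURM_JOB_ID exists only when the job was submitted with --gres=lscratch:#. The function names and the isoseqsim_temp subdirectory are hypothetical.

```shell
#!/bin/bash
# Sketch: derive --cpu and --tempdir values from the Slurm environment.

# Print the CPU count Slurm allocated, or 1 when run outside a job.
isoseqsim_cpus() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Print a temp directory: node-local scratch if it was allocated,
# otherwise the fallback path passed as the first argument.
isoseqsim_tempdir() {
    fallback="$1"
    if [ -n "${SLURM_JOB_ID:-}" ] && [ -d "/lscratch/${SLURM_JOB_ID}" ]; then
        echo "/lscratch/${SLURM_JOB_ID}/isoseqsim_temp"
    else
        echo "$fallback"
    fi
}

# Possible use inside isoseqsim.sh (commented out so the sketch has no
# side effects; the remaining options stay as in the script above):
# isoseqsim -g example/input/genome.fasta \
#           -a example/input/gene_annotation.gtf \
#           --c5 utilities/5_end_completeness.PacBio-P6-C4.tab \
#           --c3 utilities/3_end_completeness.PacBio-P6-C4.tab \
#           -o example/simulated_reads_normal.fa \
#           -t example/simulated_transcipt_normal.gpd \
#           --cpu "$(isoseqsim_cpus)" \
#           --tempdir "$(isoseqsim_tempdir example/temp_normal)"
```

With this approach the value given to --cpus-per-task at submission time and the value passed to --cpu stay in step without editing the script.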

Submit this job using the Slurm sbatch command.

sbatch [--gres=lscratch:#] [--cpus-per-task=#] [--mem=#] isoseqsim.sh

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. isoseqsim.swarm). For example:

cd /data/$USER/IsoSeqSim1; \
isoseqsim -g input/genome.fasta \
          -a input/gene_annotation.gtf \
          --c5 utilities/5_end_completeness.PacBio-P6-C4.tab \
          --c3 utilities/3_end_completeness.PacBio-P6-C4.tab \
          -o simulated_reads_normal.fa \
          -t simulated_transcipt_normal.gpd \
          --tempdir /lscratch/$SLURM_JOB_ID/temp1
cd /data/$USER/IsoSeqSim2; \
isoseqsim -g input/genome.fasta \
          -a input/gene_annotation.gtf \
          --c5 utilities/5_end_completeness.PacBio-P6-C4.tab \
          --c3 utilities/3_end_completeness.PacBio-P6-C4.tab \
          -o simulated_reads_normal.fa \
          -t simulated_transcipt_normal.gpd \
          --tempdir /lscratch/$SLURM_JOB_ID/temp2
cd /data/$USER/IsoSeqSim3; \
isoseqsim -g input/genome.fasta \
          -a input/gene_annotation.gtf \
          --c5 utilities/5_end_completeness.PacBio-P6-C4.tab \
          --c3 utilities/3_end_completeness.PacBio-P6-C4.tab \
          -o simulated_reads_normal.fa \
          -t simulated_transcipt_normal.gpd \
          --tempdir /lscratch/$SLURM_JOB_ID/temp3

Submit this job using the swarm command.

swarm -f isoseqsim.swarm [--gres=lscratch:#] [-g #] [-t #] --module isoseqsim
where
--gres=lscratch:#    Number of gigabytes of local disk space allocated per process (1 line in the swarm command file)
-g #                 Number of gigabytes of memory required for each process (1 line in the swarm command file)
-t #                 Number of threads/CPUs required for each process (1 line in the swarm command file)
--module isoseqsim   Loads the isoseqsim module for each subjob in the swarm
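
For more than a few run directories, writing the swarmfile by hand becomes error-prone. The sketch below generates an equivalent isoseqsim.swarm with a loop, one command per line. The directory names IsoSeqSim1 to IsoSeqSim3 and the relative input paths simply mirror the example swarmfile and are assumptions about how the data are laid out.

```shell
#!/bin/bash
# Sketch: write one swarm command per run directory into isoseqsim.swarm.

for i in 1 2 3; do
    # \$ keeps $USER and $SLURM_JOB_ID literal, so they are expanded
    # when the subjob runs rather than when this script runs.
    printf '%s; isoseqsim %s %s %s %s %s %s %s\n' \
        "cd /data/\$USER/IsoSeqSim$i" \
        "-g input/genome.fasta" \
        "-a input/gene_annotation.gtf" \
        "--c5 utilities/5_end_completeness.PacBio-P6-C4.tab" \
        "--c3 utilities/3_end_completeness.PacBio-P6-C4.tab" \
        "-o simulated_reads_normal.fa" \
        "-t simulated_transcipt_normal.gpd" \
        "--tempdir /lscratch/\$SLURM_JOB_ID/temp$i"
done > isoseqsim.swarm
```

Each generated line is a complete command, so no backslash continuations are needed, and the file can be submitted with the same swarm command shown above.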