IsoSeqSim is an Iso-Seq reads simulator for evaluating the performance of Iso-Seq bioinformatics analysis tools.
Allocate an interactive session and run the program.
Sample session (user input follows the $ prompt):
[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load isoseqsim
[+] Loading isoseqsim 0.2 on cn3144
[+] Loading singularity 4.2.2 on cn3144

[user@cn3144 ~]$ isoseqsim -h
usage: isoseqsim [-h] [-v] [-m {normal,fusion,apa,ase,ats}] -g GENOME -a
                 ANNOTATION -o OUTPUT -t TRANSCRIPT --tempdir TEMPDIR
                 [--nbn NBN] [--nbp NBP] [--cpu CPU] --c5 C5 --c3 C3
                 [--es ES] [--ei EI] [--ed ED] [--fc FC] [--dis_pa DIS_PA]
                 [--dis_tss DIS_TSS] [--vcf VCF] [--id ID] [--tnl TNL]
                 [--tnu TNU] [--tnm TNM] [--tns TNS]
isoseqsim: error: argument -g/--genome is required
[...]

[user@cn3144 ~]$ cd /data/$USER
[user@cn3144 ~]$ git clone https://github.com/yunhaowang/IsoSeqSim.git
[user@cn3144 ~]$ cd IsoSeqSim
[user@cn3144 ~]$ isoseqsim -g example/input/genome.fasta -a example/input/gene_annotation.gtf --c5 utilities/5_end_completeness.PacBio-P6-C4.tab --c3 utilities/3_end_completeness.PacBio-P6-C4.tab -o example/simulated_reads_normal.fa -t example/simulated_transcipt_normal.gpd --tempdir example/temp_normal
### Start analysis at Mon,05 May 2025 13:30:02
## Mode: normal
# Step1: convert gtf to gpd
Start analysis: Mon,05 May 2025 13:30:02
Finish analysis: Mon,05 May 2025 13:30:02
# Step2: generate transcriptome fasta file
Start analysis: Mon,05 May 2025 13:30:03
Finish analysis: Mon,05 May 2025 13:30:03
# Step3: generate expression matrix based on Negative Binomial distribution
Start analysis: Mon,05 May 2025 13:30:03
Finish analysis: Mon,05 May 2025 13:30:03
# Step4: simulate Iso-Seq reads
Start analysis: Mon,05 May 2025 13:30:03
Finish analysis: Mon,05 May 2025 13:30:15
### Finish analysis at Mon,05 May 2025 13:30:15

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
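A quick sanity check after a run like the one above is to count the simulated reads. The following is a minimal sketch, assuming the file written by -o is ordinary FASTA with one header line (starting with ">") per read; count_fasta_records is a hypothetical helper name, not part of IsoSeqSim.

```shell
#!/bin/bash
# Count records in a FASTA file by counting header lines (those starting with '>').
# Assumption: one '>' header line per simulated read.
# count_fasta_records is a hypothetical helper, not an IsoSeqSim command.
count_fasta_records() {
  grep -c '^>' "$1"
}

# Example (run from the IsoSeqSim directory after the simulation finishes):
#   count_fasta_records example/simulated_reads_normal.fa
```

Counting header lines rather than total lines keeps the check valid whether or not the sequences are wrapped over multiple lines.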
Create a batch input file (e.g. isoseqsim.sh). For example:
#!/bin/bash
module load isoseqsim
cd /data/$USER/IsoSeqSim
isoseqsim -g example/input/genome.fasta \
    -a example/input/gene_annotation.gtf \
    --c5 utilities/5_end_completeness.PacBio-P6-C4.tab \
    --c3 utilities/3_end_completeness.PacBio-P6-C4.tab \
    -o example/simulated_reads_normal.fa \
    -t example/simulated_transcipt_normal.gpd \
    --tempdir example/temp_normal
Submit this job using the Slurm sbatch command.
sbatch [--gres=lscratch:#] [--cpus-per-task=#] [--mem=#] isoseqsim.sh
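When the job is submitted with --gres=lscratch:# and --cpus-per-task=#, the batch script can pick up those allocations instead of hard-coding them. The sketch below is one possible way to do that, not the documented method: it builds the argument list from the Slurm variables SLURM_JOB_ID and SLURM_CPUS_PER_TASK and passes the CPU count through the --cpu option listed in the usage message. The helper name isoseqsim_args and the fallback values used outside Slurm are assumptions made for illustration.

```shell
#!/bin/bash
# Print the isoseqsim arguments for this job, one per line.
# Assumptions: isoseqsim_args is a hypothetical helper; the fallbacks
# (1 CPU, job ID "nojob") only apply when the Slurm variables are unset.
isoseqsim_args() {
  local cpus="${SLURM_CPUS_PER_TASK:-1}"
  local tmpdir="/lscratch/${SLURM_JOB_ID:-nojob}/temp_normal"
  printf '%s\n' \
    -g example/input/genome.fasta \
    -a example/input/gene_annotation.gtf \
    --c5 utilities/5_end_completeness.PacBio-P6-C4.tab \
    --c3 utilities/3_end_completeness.PacBio-P6-C4.tab \
    -o example/simulated_reads_normal.fa \
    -t example/simulated_transcipt_normal.gpd \
    --tempdir "$tmpdir" \
    --cpu "$cpus"
}

# Intended use inside the batch script, after `module load isoseqsim`
# and `cd /data/$USER/IsoSeqSim`:
#   mapfile -t args < <(isoseqsim_args)
#   isoseqsim "${args[@]}"
```

Files under /lscratch are local to the job, so the -o and -t outputs are kept under /data in this sketch and only the temporary directory is moved.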
Create a swarmfile (e.g. isoseqsim.swarm). For example:
cd /data/$USER/IsoSeqSim1; \
isoseqsim -g input/genome.fasta \
    -a input/gene_annotation.gtf \
    --c5 utilities/5_end_completeness.PacBio-P6-C4.tab \
    --c3 utilities/3_end_completeness.PacBio-P6-C4.tab \
    -o simulated_reads_normal.fa \
    -t simulated_transcipt_normal.gpd \
    --tempdir /lscratch/$SLURM_JOB_ID/temp1
cd /data/$USER/IsoSeqSim2; \
isoseqsim -g input/genome.fasta \
    -a input/gene_annotation.gtf \
    --c5 utilities/5_end_completeness.PacBio-P6-C4.tab \
    --c3 utilities/3_end_completeness.PacBio-P6-C4.tab \
    -o simulated_reads_normal.fa \
    -t simulated_transcipt_normal.gpd \
    --tempdir /lscratch/$SLURM_JOB_ID/temp2
cd /data/$USER/IsoSeqSim3; \
isoseqsim -g input/genome.fasta \
    -a input/gene_annotation.gtf \
    --c5 utilities/5_end_completeness.PacBio-P6-C4.tab \
    --c3 utilities/3_end_completeness.PacBio-P6-C4.tab \
    -o simulated_reads_normal.fa \
    -t simulated_transcipt_normal.gpd \
    --tempdir /lscratch/$SLURM_JOB_ID/temp3
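For more than a handful of runs, the swarmfile can be generated rather than typed by hand. The following is a minimal sketch, assuming the run directories are named IsoSeqSim1 through IsoSeqSimN under /data/$USER as in the example; make_swarm is a hypothetical helper name. It writes each command on a single line and leaves $USER and $SLURM_JOB_ID unexpanded so they are resolved when the subjob runs.

```shell
#!/bin/bash
# Emit one swarm command line per run directory (IsoSeqSim1..IsoSeqSimN).
# Assumptions: directory naming follows the example swarmfile;
# make_swarm is a hypothetical helper, not part of swarm or IsoSeqSim.
make_swarm() {
  local n="$1" i
  for ((i = 1; i <= n; i++)); do
    # Single quotes keep $USER and $SLURM_JOB_ID literal in the output.
    printf 'cd /data/$USER/IsoSeqSim%d; ' "$i"
    printf 'isoseqsim -g input/genome.fasta -a input/gene_annotation.gtf '
    printf -- '--c5 utilities/5_end_completeness.PacBio-P6-C4.tab '
    printf -- '--c3 utilities/3_end_completeness.PacBio-P6-C4.tab '
    printf -- '-o simulated_reads_normal.fa -t simulated_transcipt_normal.gpd '
    printf -- '--tempdir /lscratch/$SLURM_JOB_ID/temp%d\n' "$i"
  done
}

# Example: write a three-line swarmfile
#   make_swarm 3 > isoseqsim.swarm
```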
Submit this job using the swarm command.
swarm -f isoseqsim.swarm [--gres=lscratch:#] [-g #] [-t #] --module isoseqsim
where
--gres=lscratch:# | Number of gigabytes of local disk space allocated per process (1 line in the swarm command file) |
-g # | Number of gigabytes of memory required for each process (1 line in the swarm command file) |
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file) |
--module isoseqsim | Loads the isoseqsim module for each subjob in the swarm |