pirs on Biowulf
pIRS is a profile-based Illumina paired-end Reads Simulator. It can simulate reads from haploid or diploid genomes using precomputed profiles from real data. Alternatively, users can generate their own profiles.
References:
- X. Hu et al. pIRS: Profile-based Illumina pair-end reads simulator. Bioinformatics 28, 1533-1535 (2012).
Documentation
- pirs on GitHub
Important Notes
- Module Name: pirs (see the modules page for more information)
- Read simulation is multithreaded. Please match the number of threads to the number of allocated CPUs.
- Example files in $PIRS_TEST_DATA
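One way to keep the thread count in step with the allocation is to read it from SLURM_CPUS_PER_TASK, which Slurm sets when --cpus-per-task is given. The following is a minimal sketch: the helper name pirs_threads and the fallback to a single thread are assumptions made for illustration and are not part of pirs or the Biowulf environment.

```shell
#!/bin/bash
# Hypothetical helper: print the thread count to pass to pirs.
# Uses the Slurm allocation when it is set, otherwise falls back to 1
# (an assumed default for runs outside of a Slurm job).
pirs_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Build the --threads option used in the examples on this page.
echo "--threads=$(pirs_threads)"
```

Inside a job started with --cpus-per-task=4 this should print --threads=4, so the simulation does not use more threads than CPUs were allocated.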
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --cpus-per-task=4 --mem=6g
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144]$ module load pirs
[user@cn3144]$ cp $PIRS_TEST_DATA/chr22.fa .
Haplotype data can be simulated directly using the built-in profile.
[user@cn3144]$ pirs simulate -l 100 -x 10 -m 200 -z -S 139347 \
    --threads=$SLURM_CPUS_PER_TASK -o haplo_reads chr22.fa
[pIRS] Program: pirs (Profile-based Illumina pair-end Reads Simulator)
[pIRS] Version: 2.0.0
[pIRS] Author: Jianying Yuan (BGI-Shenzhen)
[pIRS] Contact: yuanjianying@genomics.org.cn
[pIRS] Compile Date: Oct 25 2019 time: 14:47:51
[pIRS] Current time: Fri Oct 25 17:56:04 2019
[pIRS] Command line: pirs simulate -l 100 -x 10 -m 200 -z -S 139347 --threads=4 -o haplo_reads chr22.fa
[pIRS]
[pIRS] Loading base-calling profile /usr/local/apps/pirs/2.0.2/share/pirs/Base-Calling_Profiles/humNew.PE100.matrix.gz ...
To get reads for a simulated diploid genome, first generate a second haplotype, then simulate reads for both haplotypes together.
[user@cn3144]$ pirs diploid -s 0.001 -R 2 -d 0.00001 -v 0.000001 \
    -o chr22 chr22.fa
[user@cn3144]$ ls -lh chr22*
-rw-r--r-- 1 user staff  50M Oct 25 18:01 chr22.snp.indel.inversion.fa
-rw-r--r-- 1 user staff  50M Oct 25 17:42 chr22.fa
-rw-r--r-- 1 user staff  24K Oct 25 18:01 chr22.indel.lst
-rw-r--r-- 1 user staff 1.4K Oct 25 18:01 chr22.inversion.lst
-rw-r--r-- 1 user staff  50M Oct 25 18:01 chr22.snp.indel.inversion.fa
-rw-r--r-- 1 user staff 951K Oct 25 18:01 chr22.snp.lst
[user@cn3144]$ pirs simulate --diploid -l 100 -x 10 -m 800 -z -S 139347 \
    --threads=$SLURM_CPUS_PER_TASK -o diplo_reads \
    chr22.fa chr22.snp.indel.inversion.fa

[user@cn3144]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf]$
Batch job
Most jobs should be run as batch jobs.
Create a batch script (e.g. pirs.sh). For example:
#!/bin/bash
module load pirs/2.0.2
pirs simulate -l 100 -x 10 -m 200 -z -S 139347 \
    --threads=$SLURM_CPUS_PER_TASK -o haplo_reads chr22.fa
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=6 --mem=8g pirs.sh
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.
Create a swarmfile (e.g. pirs.swarm). For example:
pirs simulate -l 100 -x 10 -m 200 -z --threads=$SLURM_CPUS_PER_TASK -o haplo_reads_chr1 chr1.fa
pirs simulate -l 100 -x 10 -m 200 -z --threads=$SLURM_CPUS_PER_TASK -o haplo_reads_chr2 chr2.fa
pirs simulate -l 100 -x 10 -m 200 -z --threads=$SLURM_CPUS_PER_TASK -o haplo_reads_chr3 chr3.fa
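For a longer list of chromosomes, a swarmfile of this shape could be generated with a loop instead of being written by hand. The following is a minimal sketch: the function name make_pirs_swarm is hypothetical, and it assumes each input is named after its chromosome (chr1.fa, chr2.fa, ...) as in the example above.

```shell
#!/bin/bash
# Hypothetical helper: write one pirs command per chromosome to a swarmfile.
# $1 is the swarmfile to create; the remaining arguments are chromosome names.
make_pirs_swarm() {
    local swarmfile=$1
    shift
    : > "$swarmfile"    # start from an empty file
    local chr
    for chr in "$@"; do
        # The single-quoted format keeps $SLURM_CPUS_PER_TASK literal, so it
        # is expanded inside each subjob, not when the swarmfile is written.
        printf 'pirs simulate -l 100 -x 10 -m 200 -z --threads=$SLURM_CPUS_PER_TASK -o haplo_reads_%s %s.fa\n' \
            "$chr" "$chr" >> "$swarmfile"
    done
}

# Recreate the three-line example swarmfile.
make_pirs_swarm pirs.swarm chr1 chr2 chr3
```

Each chromosome gets its own output prefix, so the independent subjobs do not overwrite each other's reads.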
Submit this job using the swarm command.
swarm -f pirs.swarm -g 8 -t 6 --module pirs
where
-g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file). |
--module pirs | Loads the pirs module for each subjob in the swarm |