Mandalorion is a pipeline to identify isoforms from full-length cDNA sequencing data.
It takes R2C2/C3POa and/or PacBio/ccs/lima data and defines high-confidence isoform consensus sequences and alignments. You can mix and match R2C2 and PacBio reads, in FASTA or FASTQ format (quality scores are ignored). Mandalorion is not tested for use with regular ONT reads.
This application requires a graphical connection using NX.
Allocate an interactive session and run the program.
Sample session (user input follows each $ prompt):
[user@biowulf]$ sinteractive --cpus-per-task 6
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load mandalorion
[user@cn3144 ~]$ Mando.py -p ./ -g /fdb/igenomes/Homo_sapiens/NCBI/GRCh38/Annotation/Genes.gencode/genes.gtf -G /fdb/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa -f input.fa -t $SLURM_CPUS_PER_TASK
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Create a batch input file (e.g. mandalorion.sh). For example:
#!/bin/bash
set -e
module load mandalorion
Mando.py \
    -p ./ \
    -g /fdb/igenomes/Homo_sapiens/NCBI/GRCh38/Annotation/Genes.gencode/genes.gtf \
    -G /fdb/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa \
    -f input.fa \
    -t $SLURM_CPUS_PER_TASK
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=# [--mem=#] mandalorion.sh
Create a swarmfile (e.g. mandalorion.swarm). For example:
Mando.py -f input1.fa -p sample1 -g /fdb/igenomes/Homo_sapiens/NCBI/GRCh38/Annotation/Genes.gencode/genes.gtf -G /fdb/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa -t $SLURM_CPUS_PER_TASK
Mando.py -f input2.fa -p sample2 -g /fdb/igenomes/Homo_sapiens/NCBI/GRCh38/Annotation/Genes.gencode/genes.gtf -G /fdb/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa -t $SLURM_CPUS_PER_TASK
Mando.py -f input3.fa -p sample3 -g /fdb/igenomes/Homo_sapiens/NCBI/GRCh38/Annotation/Genes.gencode/genes.gtf -G /fdb/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa -t $SLURM_CPUS_PER_TASK
Mando.py -f input4.fa -p sample4 -g /fdb/igenomes/Homo_sapiens/NCBI/GRCh38/Annotation/Genes.gencode/genes.gtf -G /fdb/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa -t $SLURM_CPUS_PER_TASK
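With many samples, a swarmfile like the one above can be generated rather than typed by hand. The following is a minimal sketch, not part of Mandalorion or swarm: the helper name make_swarm is hypothetical, and it assumes each output directory (-p) should be named after the input file with its .fa suffix removed, which differs from the sample1..sample4 names used above. The single-quoted printf format keeps $SLURM_CPUS_PER_TASK literal so that it is expanded inside each subjob rather than when the file is written.

```shell
#!/bin/bash
set -e

# Reference files, taken from the examples on this page.
gtf=/fdb/igenomes/Homo_sapiens/NCBI/GRCh38/Annotation/Genes.gencode/genes.gtf
genome=/fdb/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa

# make_swarm (hypothetical helper): print one Mando.py command per input file.
# Assumption: the output directory is the input file name without ".fa".
make_swarm() {
    local fa sample
    for fa in "$@"; do
        sample=$(basename "$fa" .fa)
        # Single quotes: $SLURM_CPUS_PER_TASK is written literally, not expanded here.
        printf 'Mando.py -f %s -p %s -g %s -G %s -t $SLURM_CPUS_PER_TASK\n' \
            "$fa" "$sample" "$gtf" "$genome"
    done
}

# Write one line per sample into the swarmfile.
make_swarm input1.fa input2.fa input3.fa input4.fa > mandalorion.swarm
```

In practice the explicit file list could be replaced with a glob such as *.fa, provided every matching file is meant to be processed.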
Submit this job using the swarm command.
swarm -f mandalorion.swarm [-g #] [-t #] --module mandalorion
where
-g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file). |
--module mandalorion | Loads the Mandalorion module for each subjob in the swarm |