Arriba is a command-line tool for detecting gene fusions in RNA-Seq data. It can also detect other clinically relevant structural variations, such as exon duplications or truncations of genes (i.e., breakpoints in introns and intergenic regions).
Allocate an interactive session and run the program.
Sample session (user input follows the $ prompt):
[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load arriba
[+] Loading arriba 1.2.0 on cn3113
[+] Loading singularity 3.5.3 on cn3113

[user@cn3144 ~]$ run_arriba.sh
Usage: run_arriba.sh STAR_genomeDir/ annotation.gtf assembly.fa blacklist.tsv read1.fastq.gz read2.fastq.gz threads

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
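The usage message in the session lists seven positional arguments. The sketch below shows one way to map them to named variables so the order is easy to verify before running. All paths are hypothetical placeholders, and `run_or_print` and `DRY_RUN` are illustrative names that are not part of Arriba. The thread count is taken from `SLURM_CPUS_PER_TASK`, which Slurm sets when CPUs per task are requested; the fallback of 2 is an assumption.

```shell
#!/bin/bash
# Placeholder inputs (hypothetical paths; replace with your own files).
STAR_INDEX="/path/to/STAR_genomeDir/"
ANNOTATION="/path/to/annotation.gtf"
ASSEMBLY="/path/to/assembly.fa"
BLACKLIST="/path/to/blacklist.tsv"
READ1="/path/to/read1.fastq.gz"
READ2="/path/to/read2.fastq.gz"
# Use the CPU count Slurm allocated, falling back to 2 outside a job.
THREADS="${SLURM_CPUS_PER_TASK:-2}"

# Print the command by default; set DRY_RUN=0 to actually run it.
run_or_print() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Arguments follow the order given in the usage message.
run_or_print run_arriba.sh "$STAR_INDEX" "$ANNOTATION" "$ASSEMBLY" \
    "$BLACKLIST" "$READ1" "$READ2" "$THREADS"
```

With the default dry run, the script only prints the assembled command, which makes it safe to try before the arriba module is loaded.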
Create a batch input file (e.g. Arriba.sh). For example:
#!/bin/bash
set -e
module load arriba
run_arriba.sh > arriba.out
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] Arriba.sh
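Because a batch job may wait in the queue before it starts, it can help to check the input files at the top of the batch script so that a mistyped path fails immediately with a clear message. The following is a minimal sketch; `check_inputs` is a hypothetical helper, not something provided by the arriba module, and the file names in the commented usage are the placeholders from the usage message.

```shell
#!/bin/bash
# check_inputs PATH...
# Report every missing input on stderr; return 1 if any are missing.
check_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "Missing input: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible use inside Arriba.sh, after "module load arriba"
# (file names are placeholders):
#   check_inputs STAR_genomeDir/ annotation.gtf assembly.fa blacklist.tsv \
#       read1.fastq.gz read2.fastq.gz || exit 1
#   run_arriba.sh STAR_genomeDir/ annotation.gtf assembly.fa blacklist.tsv \
#       read1.fastq.gz read2.fastq.gz "$SLURM_CPUS_PER_TASK" > arriba.out
```

Passing `$SLURM_CPUS_PER_TASK` as the last argument keeps the thread count in step with the value given to `--cpus-per-task` at submission, assuming that option was specified.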
Create a swarmfile (e.g. Arriba.swarm). For example:
run_arriba.sh > Arriba_1.out
run_arriba.sh > Arriba_2.out
run_arriba.sh > Arriba_3.out
run_arriba.sh > Arriba_4.out
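For more than a handful of samples, the swarmfile can be generated rather than typed by hand. The sketch below prints one command line per sample. `make_swarm_lines` is a hypothetical helper, the reference file names are the placeholders from the usage message, and the `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming is an assumed convention that may not match your data.

```shell
#!/bin/bash
# make_swarm_lines THREADS SAMPLE...
# Print one run_arriba.sh command per sample, each with its own output file.
make_swarm_lines() {
    threads="$1"
    shift
    for s in "$@"; do
        echo "run_arriba.sh STAR_genomeDir/ annotation.gtf assembly.fa blacklist.tsv ${s}_R1.fastq.gz ${s}_R2.fastq.gz ${threads} > Arriba_${s}.out"
    done
}

# Example with three hypothetical sample names and 8 threads per command.
make_swarm_lines 8 sampleA sampleB sampleC
```

Redirecting the output to Arriba.swarm produces a file in the same one-command-per-line form as the example. The thread count written into each line should agree with the -t value given to swarm.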
Submit this job using the swarm command.
swarm -f Arriba.swarm [-g #] [-t #] --module arriba
where
-g # | Number of gigabytes of memory required for each process (1 line in the swarm command file). |
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file). |
--module arriba | Loads the arriba module for each subjob in the swarm. |