xengsort is a fast xenograft read sorter based on space-efficient k-mer hashing.
Allocate an interactive session and run the program.
Sample session (user input follows the $ prompt):
[user@biowulf ~]$ sinteractive --cpus-per-task=8 --mem=32g
salloc.exe: Pending job allocation 61524097
salloc.exe: job 61524097 queued and waiting for resources
salloc.exe: job 61524097 has been allocated resources
salloc.exe: Granted job allocation 61524097
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3137 are ready for job
srun: error: x11: no local DISPLAY defined, skipping

[user@cn3137 ~]$ cd /data/$USER

[user@cn3137 user]$ git clone https://gitlab.com/genomeinformatics/xengsort.git
Cloning into 'xengsort'...
remote: Enumerating objects: 226, done.
remote: Counting objects: 100% (226/226), done.
remote: Compressing objects: 100% (103/103), done.
remote: Total 226 (delta 139), reused 183 (delta 117), pack-reused 0
Receiving objects: 100% (226/226), 108.80 KiB | 0 bytes/s, done.
Resolving deltas: 100% (139/139), done.

[user@cn3137 user]$ cd xengsort/

[user@cn3137 xengsort]$ module load xengsort snakemake
[+] Loading xengsort 28762aac on cn3137
[+] Loading singularity 3.5.3 on cn3137
[+] Loading snakemake 5.19.3

[user@cn3137 xengsort]$ snakemake -j 8
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 8
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1       all
    1       build_index
    1       classify_mouse_exomes
    2       download_mouse_exomes
    1       download_refs
    6

[Mon Jul 20 14:15:43 2020]
rule download_refs:
    output: ref/Homo_sapiens.GRCh38.dna.toplevel.fa.gz, ref/Mus_musculus.GRCm38.dna.toplevel.fa.gz, ref/Homo_sapiens.GRCh38.cdna.all.fa.gz, ref/Mus_musculus.GRCm38.cdna.all.fa.gz
    jobid: 5

[Mon Jul 20 14:15:43 2020]
rule download_mouse_exomes:
    output: raw/BALBc-M1-normal_1.fq.gz.1
    jobid: 3
    wildcards: filename=BALBc-M1-normal_1.fq.gz.1

[Mon Jul 20 14:15:43 2020]
rule download_mouse_exomes:
    output: raw/BALBc-M1-normal_2.fq.gz.1
    jobid: 4
    wildcards: filename=BALBc-M1-normal_2.fq.gz.1

--2020-07-20 14:15:43-- https://sra-pub-src-1.s3.amazonaws.com/SRR9130497/BALBc-M1-normal_2.fq.gz.1
--2020-07-20 14:15:43-- https://sra-pub-src-1.s3.amazonaws.com/SRR9130497/BALBc-M1-normal_1.fq.gz.1
--2020-07-20 14:15:43-- ftp://ftp.ensembl.org/pub/release-98/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz
Resolving dtn05-e0 (dtn05-e0)... 10.1.200.241
Connecting to dtn05-e0 (dtn05-e0)|10.1.200.241|:3128... connected.
Proxy request sent, awaiting response... Resolving dtn05-e0 (dtn05-e0)... Resolving dtn05-e0 (dtn05-e0)... 10.1.200.24110.1.200.241
Connecting to dtn05-e0 (dtn05-e0)|10.1.200.241|:3128... Connecting to dtn05-e0 (dtn05-e0)|10.1.200.241|:3128... connected.
connected.
Proxy request sent, awaiting response... Proxy request sent, awaiting response... 200 OK
Length: 5034316763 (4.7G) [application/x-troff-man]
Saving to: ‘raw/BALBc-M1-normal_2.fq.gz.1’
 0% [ ] 0 --.-K/s 200 OK
Length: 4783409755 (4.5G) [application/x-troff-man]
Saving to: ‘raw/BALBc-M1-normal_1.fq.gz.1’
 0% [ ] 33,550,787 32.2MB/s 200 Gatewaying
Length: 1107654500 (1.0G) [text/plain]
Saving to: ‘ref/Homo_sapiens.GRCh38.dna.toplevel.fa.gz’
[...snip...]
Create a batch input file (e.g. xengsort.sh). For example:
#!/bin/bash
set -e
cd /data/${USER}
git clone https://gitlab.com/genomeinformatics/xengsort.git
cd xengsort/
module load xengsort snakemake
snakemake -j 8
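Because the script runs under set -e, resubmitting it would stop at git clone once the xengsort directory already exists. One way around this is sketched below; clone_if_missing is a hypothetical helper introduced here for illustration, not part of xengsort or the cluster tooling.

```shell
#!/bin/bash
set -e

# Hypothetical helper: clone a repository only when the target directory is absent,
# so that rerunning the batch script does not abort under `set -e`.
clone_if_missing() {
    local url="$1" dest="$2"
    if [ -d "$dest" ]; then
        echo "reusing existing checkout: $dest"
    else
        git clone "$url" "$dest"
    fi
}

# In xengsort.sh the plain `git clone` line could then become:
#   clone_if_missing https://gitlab.com/genomeinformatics/xengsort.git xengsort
```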
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] xengsort.sh
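The example script hard-codes 8 snakemake jobs, whatever value is given to --cpus-per-task at submission. A small sketch of keeping the two in step, assuming the job is submitted with --cpus-per-task so that Slurm sets SLURM_CPUS_PER_TASK; the variable name njobs and the fallback of 8 are illustrative choices:

```shell
# Take the CPU count from Slurm when available; fall back to 8 otherwise.
njobs="${SLURM_CPUS_PER_TASK:-8}"
echo "snakemake will be given ${njobs} jobs"

# In xengsort.sh this value would replace the hard-coded 8:
#   snakemake -j "${njobs}"
```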
Create a swarmfile (e.g. xengsort.swarm). For example:
cd /path/to/snakefile1 && snakemake -j 8
cd /path/to/snakefile2 && snakemake -j 8
cd /path/to/snakefile3 && snakemake -j 8
cd /path/to/snakefile4 && snakemake -j 8
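When there are many workflow directories, the swarmfile can be generated instead of typed by hand. The following is a minimal sketch, assuming every workflow lives in its own subdirectory of a common parent and contains a file named Snakefile; make_swarmfile is a hypothetical helper, and paths containing spaces are not handled.

```shell
#!/bin/bash

# Hypothetical helper: write one swarm command line for each subdirectory
# of "$1" that contains a Snakefile; the result is stored in "$2".
make_swarmfile() {
    local root="$1" out="$2" dir
    : > "$out"    # start with an empty swarmfile
    for dir in "$root"/*/; do
        if [ -f "${dir}Snakefile" ]; then
            echo "cd ${dir%/} && snakemake -j 8" >> "$out"
        fi
    done
}
```

For example, make_swarmfile /path/to/workflows xengsort.swarm would write one line per qualifying subdirectory of /path/to/workflows.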
Submit this job using the swarm command.
swarm -f xengsort.swarm [-g #] [-t #] --module xengsort,snakemake
where
-g # | Number of gigabytes of memory required for each process (1 line in the swarm command file) |
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file) |
--module xengsort,snakemake | Loads the xengsort and snakemake modules for each subjob in the swarm |