Scramble: a tool for mobile element insertion detection
Scramble is a mobile element insertion (MEI) detection tool. It identifies clusters of soft-clipped reads in a BAM file, builds consensus sequences, aligns them to representative L1Ta, AluYa5, and SVA-E sequences, and outputs MEI calls.
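To make "soft-clipped reads" concrete: in SAM/BAM records the sixth column is the CIGAR string, and an S operation marks bases that were excluded from the alignment but kept in the read. The sketch below counts such records from SAM text on standard input. It only illustrates the input signal and is not Scramble's own cluster-identification code; the function name count_softclipped is hypothetical.

```shell
# count_softclipped: hypothetical helper, not part of Scramble.
# Reads SAM text on stdin and prints the number of alignment records
# whose CIGAR string (column 6) contains a soft-clip (S) operation.
count_softclipped() {
    awk -F'\t' '
        /^@/ { next }              # skip SAM header lines
        $6 ~ /[0-9]+S/ { n++ }     # CIGAR has at least one soft clip
        END { print n + 0 }        # print 0 when nothing matched
    '
}

# Example (assumes samtools is on your PATH; not executed here):
#   samtools view sample.bam | count_softclipped
```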
References:
- Rebecca I. Torene, Kevin Galens, Shuxi Liu, Kevin Arvai, Carlos Borroto, Julie Scuffins,
Zhancheng Zhang, Bethany Friedman, Hana Sroka, Jennifer Heeley, Erin Beaver, Lorne Clarke,
Sarah Neil, Jagdeep Walia, Danna Hull, Jane Juusola, and Kyle Retterer.
Mobile element insertion detection in 89,874 clinical exomes
Genetics in Medicine (2020).
Documentation
Important Notes
- Module Name: Scramble (see the modules page for more information)
- Environment variables set
- SCRAMBLE_HOME Scramble installation directory
- SCRAMBLE_BIN Scramble executable directory
- SCRAMBLE_DATA Scramble sample data directory
- If you are using your own reference file, make sure you generate the *.nhr, *.nin, and *.nsq files with makeblastdb as follows:
module load ncbi-toolkit
makeblastdb -in file.fasta -input_type fasta -dbtype nucl
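Before pointing --ref at your own FASTA file, it can help to confirm that the three index files are actually present. The following is a minimal sketch, assuming makeblastdb was run with its default output name (so the index files sit next to the FASTA as file.fasta.nhr, file.fasta.nin, and file.fasta.nsq) and produced a single-volume database. The function name blast_index_ready is hypothetical.

```shell
# blast_index_ready: hypothetical helper, not part of Scramble or BLAST.
# Returns 0 if <fasta>.nhr, <fasta>.nin and <fasta>.nsq all exist;
# otherwise prints each missing file to stderr and returns 1.
blast_index_ready() {
    fasta=$1
    status=0
    for ext in nhr nin nsq; do
        if [ ! -e "${fasta}.${ext}" ]; then
            echo "missing: ${fasta}.${ext}" >&2
            status=1
        fi
    done
    return "$status"
}

# Example (not executed here):
#   blast_index_ready file.fasta && echo "reference is indexed"
```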
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program. Sample session:
[user@biowulf ~]$ sinteractive --mem=4g
salloc.exe: Pending job allocation 56730292
salloc.exe: job 56730292 queued and waiting for resources
salloc.exe: job 56730292 has been allocated resources
salloc.exe: Granted job allocation 56730292
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3148 are ready for job
[user@cn3148 ~]$ module load Scramble
[+] Loading singularity 3.5.3 on cn3148
[+] Loading Scramble 0.0.20190211.82c78b9 ...
Copy sample data into your current directory:
[user@cn3148 ~]$ cp $SCRAMBLE_DATA/* .
You can run Scramble in two different ways.
1) When you execute the scramble command without arguments, a new shell will be opened for you within a Singularity container:
[user@cn3148 ~]$ scramble
Singularity>
Your environment will change and you will have access to a different set of commands and executables. For example, you can run the command:
Singularity> Rscript --vanilla /app/cluster_analysis/bin/SCRAMble.R \
    --out-name ${PWD}/sample.mei.txt \
    --cluster-file ${PWD}/sample_cluster.txt \
    --install-dir /app/cluster_analysis/bin \
    --mei-refs /app/cluster_analysis/resources/MEI_consensus_seqs.fa \
    --ref /app/validation/test.fa \
    --eval-dels \
    --eval-meis \
    --no-vcf
Running sample: /gpfs/gsfs8/users/apptest2/SCRAMBLE_TEST/sample_cluster.txt
Running scramble with options:
blastRef : /app/validation/test.fa
clusterFile : /gpfs/gsfs8/users/apptest2/SCRAMBLE_TEST/sample_cluster.txt
deletions : TRUE
indelScore : 80
INSTALL.DIR : /app/cluster_analysis/bin
mei.refs : /app/cluster_analysis/resources/MEI_consensus_seqs.fa
meis : TRUE
meiScore : 50
minDelLen : 50
nCluster : 5
no.vcf : TRUE
outFilePrefix : /gpfs/gsfs8/users/apptest2/SCRAMBLE_TEST/sample.mei.txt
pctAlign : 90
polyAdist : 100
polyAFrac : 0.75
Useful Functions Loaded
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from ‘package:stats’:
    IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which, which.max, which.min
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: ‘S4Vectors’
The following object is masked from ‘package:base’:
    expand.grid
Loading required package: IRanges
Loading required package: XVector
Attaching package: ‘Biostrings’
The following object is masked from ‘package:base’:
    strsplit
Done analyzing l1
Done analyzing sva
Done analyzing alu
Done analyzing l1
Done analyzing sva
Done analyzing alu
Sample had 0 MEI(s)
Done analyzing MEIs
532 clusters out of 927 were removed due to simple sequence
Number of alignments meeting thresholds: 395
Number of best alignments: 0
[1] "Two-End-Deletions: Working on contig 1"
[1] "Two-End-Deletions: Working on contig 10"
[1] "Two-End-Deletions: Working on contig 11"
[1] "Two-End-Deletions: Working on contig 12"
[1] "Two-End-Deletions: Working on contig 13"
[1] "Two-End-Deletions: Working on contig 14"
[1] "Two-End-Deletions: Working on contig 15"
[1] "Two-End-Deletions: Working on contig 16"
[1] "Two-End-Deletions: Working on contig 17"
[1] "Two-End-Deletions: Working on contig 18"
[1] "Two-End-Deletions: Working on contig 19"
[1] "Two-End-Deletions: Working on contig 2"
[1] "Two-End-Deletions: Working on contig 20"
[1] "Two-End-Deletions: Working on contig 22"
[1] "Two-End-Deletions: Working on contig 3"
[1] "Two-End-Deletions: Working on contig 4"
[1] "Two-End-Deletions: Working on contig 5"
[1] "Two-End-Deletions: Working on contig 6"
[1] "Two-End-Deletions: Working on contig 7"
[1] "Two-End-Deletions: Working on contig 8"
[1] "Two-End-Deletions: Working on contig GL000220.1"
[1] "Two-End-Deletions: Working on contig hs37d5"
[1] "Two-End-Deletions: Working on contig X"
[1] "finished one end dels"
Sample had 0 deletions
Done analyzing deletions
Warning message:
In predict.BLAST(bl, seq, BLAST_args = "-dust no") :
  BLAST did not return a match!
Please remember to exit this new shell when you are finished with your session.
Singularity> exit
exit
[user@cn3148]$
2) Alternatively, you can run the Rscript or other command(s) directly from the Linux shell, but in this case the command(s) must be preceded by scramble. For example:
[user@cn3148 ~]$ scramble Rscript --vanilla /app/cluster_analysis/bin/SCRAMble.R \
    --out-name ${PWD}/sample.mei.txt \
    --cluster-file ${PWD}/sample_cluster.txt \
    --install-dir /app/cluster_analysis/bin \
    --mei-refs /app/cluster_analysis/resources/MEI_consensus_seqs.fa \
    --ref /app/validation/test.fa \
    --eval-dels \
    --eval-meis \
    --no-vcf
Exit the interactive shell:
[user@cn3148 ~]$ exit
exit
salloc.exe: Relinquishing job allocation 49998864
[user@biowulf ~]$
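The second invocation style (commands prefixed by scramble) is convenient when several cluster files need to be processed in one session. The following is a minimal sketch under stated assumptions: the helper names mei_out_name and run_scramble_on are hypothetical, the input files are assumed to follow the sample_cluster.txt naming seen in the sample data, and the paths under /app and the flags are copied unchanged from the sample command.

```shell
# mei_out_name: hypothetical helper that maps a cluster file name to an
# output name, following the sample_cluster.txt -> sample.mei.txt pattern
# from the sample session. Names without the _cluster.txt suffix simply
# get .mei.txt appended.
mei_out_name() {
    printf '%s\n' "${1%_cluster.txt}.mei.txt"
}

# run_scramble_on: hypothetical wrapper around the sample command.
# Replace --ref with your own indexed reference as needed.
run_scramble_on() {
    cluster_file=$1
    scramble Rscript --vanilla /app/cluster_analysis/bin/SCRAMble.R \
        --out-name "$(mei_out_name "$cluster_file")" \
        --cluster-file "$cluster_file" \
        --install-dir /app/cluster_analysis/bin \
        --mei-refs /app/cluster_analysis/resources/MEI_consensus_seqs.fa \
        --ref /app/validation/test.fa \
        --eval-dels \
        --eval-meis \
        --no-vcf
}

# Example (run after "module load Scramble"; not executed here):
#   for f in "${PWD}"/*_cluster.txt; do run_scramble_on "$f"; done
```

Full paths are passed to the wrapper because the sample command itself uses ${PWD}-based paths for its input and output files.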