MosaicForecast is a machine learning method that leverages read-based phasing and read-level features to accurately detect mosaic SNVs (SNPs, small indels) from NGS data. It builds on existing algorithms to achieve a multifold increase in specificity.
Phase.py
ReadLevel_Features_extraction.py
Prediction.R
Train_RFmodel.R
PhasingRefine.R
MuTect2-PoN_filter.py
Allocate an interactive session and run the program.
Sample session:
[user@biowulf]$ sinteractive --cpus-per-task=2 --mem=2G
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load mosaicforecast
[user@cn3144 ~]$ Phase.py
Usage: python Phase.py bam_dir output_dir ref_fasta input_positions(file format:chr pos-1 pos ref alt sample, sep=\t) min_dp_inforSNPs(int) Umap_mappability(bigWig file,k=24) n_threads_parallel sequencing_file_format(bam/cram)
Note:
1. Name of bam files should be "sample.bam" under the bam_dir, and there should be corresponding index files.
2. There should be a fai file under the same dir of the fasta file (samtools faidx input.fa).
3. The "min_dp_inforSNPs" is the minimum depth of coverage of trustworthy neaby het SNPs.
4. Bam file is preferred than cram file, as the program would run much more slowly if using cram format.

[user@cn3144 ~]$ mkdir mosaicforecast_test && cd mosaicforecast_test
[user@cn3144 ~]$ cp -r ${MOSAIC_TESTDATA:-none}/* .
[user@cn3144 ~]$ Phase.py ./demo/ test_out \
    /fdb/GATK_resource_bundle/b37-2.8/human_g1k_v37_decoy.fasta \
    ./demo/test.input 20 k24.umap.wg.bw 2 bam
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
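The Phase.py usage message asks for a tab-separated positions file (chr, pos-1, pos, ref, alt, sample) and for index files next to the BAM and the reference FASTA. The following is a minimal sketch of how those inputs might be prepared and checked before running Phase.py. Both function names are hypothetical helpers, not part of MosaicForecast, and the accepted BAM index naming (sample.bam.bai or sample.bai) is an assumption.

```shell
#!/bin/bash
# Hypothetical helper: convert whitespace-separated "chr pos ref alt" lines
# read from stdin into the six-column, tab-separated layout from the usage:
#   chr  pos-1  pos  ref  alt  sample
make_positions() {
    local sample="$1"
    awk -v s="$sample" 'BEGIN { OFS = "\t" }
        NF >= 4 { print $1, $2 - 1, $2, $3, $4, s }'
}

# Hypothetical helper: check the prerequisites from notes 1 and 2 of the
# usage message. Returns 0 if everything is present, 1 otherwise.
check_inputs() {
    local bam_dir="$1" sample="$2" ref="$3" ok=0
    local bam="$bam_dir/$sample.bam"
    [ -f "$bam" ] || { echo "missing $bam" >&2; ok=1; }
    # Assumption: either common index name is acceptable.
    [ -f "$bam.bai" ] || [ -f "$bam_dir/$sample.bai" ] || {
        echo "missing index for $bam" >&2; ok=1; }
    # samtools faidx input.fa writes input.fa.fai next to the FASTA.
    [ -f "$ref.fai" ] || { echo "missing $ref.fai" >&2; ok=1; }
    return "$ok"
}

# Usage (file and sample names are placeholders):
#   make_positions mysample < variants.txt > mysample.input
#   check_inputs ./bams mysample ref.fasta && echo "inputs look complete"
```

Running the check first avoids spending an allocation on a job that stops at the first missing index.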
Create a batch input file (e.g. mosaicforecast.sh). For example:
#!/bin/bash
#SBATCH --cpus-per-task=2
#SBATCH --mem=2G
#SBATCH --time=2:00:00
#SBATCH --partition=norm

set -e
module load mosaicforecast
cp -r ${MOSAIC_TESTDATA:-none}/* .
cp -r ${MOSAIC_MODEL:-none}/* .
Prediction.R demo/test.SNP.features models_trained/250xRFmodel_addRMSK_Refine.rds Refine test.SNP.predictions
Submit the job:
sbatch mosaicforecast.sh
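The batch script copies from ${MOSAIC_TESTDATA:-none} and ${MOSAIC_MODEL:-none}. If either variable is unset or empty, the expansion falls back to the literal string none, and cp then fails on a path that does not exist. A small guard, sketched below, could report the actual cause instead. The function name require_dir is hypothetical, and the sketch assumes both variables are set by the mosaicforecast module.

```shell
#!/bin/bash
# Hypothetical helper: fail early with a clear message when a variable that
# should name a directory is unset, empty, or points somewhere invalid.
require_dir() {
    local name="$1" value
    # Read the variable whose name is stored in $name (empty if unset).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "error: $name is not set; was the module loaded?" >&2
        return 1
    fi
    if [ ! -d "$value" ]; then
        echo "error: $name='$value' is not a directory" >&2
        return 1
    fi
}

# Usage inside the batch script, after "module load mosaicforecast":
#   require_dir MOSAIC_TESTDATA
#   require_dir MOSAIC_MODEL
```

Because the batch script runs with set -e, a failing require_dir call stops the job before any copy is attempted.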
Create a swarmfile (e.g. job.swarm). For example:
Prediction.R demo/test.SNP.features models_trained/250xRFmodel_addRMSK_Refine.rds Refine SNP.predictions
Prediction.R demo/test.DEL.features models_trained/deletions_250x.RF.rds Phase DEL.predictions
Submit this job using the swarm command.
swarm -f job.swarm [-g #] --module mosaicforecast
where
-g # | Number of gigabytes of memory required for each process (1 line in the swarm command file) |
--module | Loads the module for each subjob in the swarm |
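With many samples, writing the swarmfile by hand becomes tedious. The sketch below prints one Prediction.R command per *.SNP.features file in a directory, using the same argument order as the swarmfile example (features file, model, model type, output file). The function name make_swarm and the output naming scheme are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper: emit one swarm line per SNP feature file.
#   $1 = directory containing *.SNP.features files
#   $2 = path to the trained model (.rds)
#   $3 = model type as passed to Prediction.R (e.g. Refine or Phase)
make_swarm() {
    local feature_dir="$1" model="$2" model_type="$3"
    local f base
    for f in "$feature_dir"/*.SNP.features; do
        [ -e "$f" ] || continue      # the glob matched nothing
        base=$(basename "$f" .SNP.features)
        printf 'Prediction.R %s %s %s %s.SNP.predictions\n' \
            "$f" "$model" "$model_type" "$base"
    done
}

# Usage:
#   make_swarm demo models_trained/250xRFmodel_addRMSK_Refine.rds Refine > job.swarm
#   swarm -f job.swarm --module mosaicforecast
```

Each generated line is an independent subjob, so the memory value given to -g applies to a single Prediction.R run.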