MosaicForecast is a machine learning method that uses read-based phasing and read-level features to detect mosaic variants (SNVs and small indels) in NGS data. It builds on existing algorithms to achieve a multifold increase in specificity.
Phase.py
ReadLevel_Features_extraction.py
Prediction.R
Train_RFmodel.R
PhasingRefine.R
MuTect2-PoN_filter.py
Allocate an interactive session and run the program.
Sample session:
[user@biowulf]$ sinteractive --cpus-per-task=2 --mem=2G
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144 ~]$ module load mosaicforecast
[user@cn3144 ~]$ Phase.py
Usage: python Phase.py bam_dir output_dir ref_fasta input_positions(file format:chr pos-1 pos ref alt sample, sep=\t) min_dp_inforSNPs(int) Umap_mappability(bigWig file,k=24) n_threads_parallel sequencing_file_format(bam/cram)
Note:
1. Name of bam files should be "sample.bam" under the bam_dir, and there should be corresponding index files.
2. There should be a fai file under the same dir of the fasta file (samtools faidx input.fa).
3. The "min_dp_inforSNPs" is the minimum depth of coverage of trustworthy neaby het SNPs.
4. Bam file is preferred than cram file, as the program would run much more slowly if using cram format.
[user@cn3144 ~]$ mkdir mosaicforecast_test && cd mosaicforecast_test
[user@cn3144 ~]$ cp -r ${MOSAIC_TESTDATA:-none}/* .
[user@cn3144 ~]$ Phase.py ./demo/ test_out \
/fdb/GATK_resource_bundle/b37-2.8/human_g1k_v37_decoy.fasta \
./demo/test.input 20 k24.umap.wg.bw 2 bam
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
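Notes 1 and 2 in the Phase.py usage message can be checked before starting a run. The following is a minimal sketch, not part of MosaicForecast: the function name check_phase_inputs is hypothetical, it assumes BAM input only, and it assumes the BAM index is named either sample.bam.bai or sample.bai. Sample names are read from column 6 of the tab-separated input positions file.

```shell
# Hypothetical helper (not shipped with MosaicForecast): report missing Phase.py inputs.
# Usage: check_phase_inputs bam_dir ref_fasta input_positions
# The body runs in a subshell so its variables do not leak into the caller.
check_phase_inputs() (
    bam_dir=$1
    ref_fasta=$2
    positions=$3
    missing=0

    # Note 2: the .fai index must sit next to the FASTA file.
    if [ ! -f "${ref_fasta}.fai" ]; then
        echo "missing: ${ref_fasta}.fai (run: samtools faidx ${ref_fasta})"
        missing=1
    fi

    # Note 1: each sample named in column 6 needs sample.bam plus an index.
    # Assumption: the index is called sample.bam.bai or sample.bai.
    for sample in $(cut -f6 "$positions" | sort -u); do
        if [ ! -f "${bam_dir}/${sample}.bam" ]; then
            echo "missing: ${bam_dir}/${sample}.bam"
            missing=1
        elif [ ! -f "${bam_dir}/${sample}.bam.bai" ] && [ ! -f "${bam_dir}/${sample}.bai" ]; then
            echo "missing: index for ${bam_dir}/${sample}.bam"
            missing=1
        fi
    done

    # Exit status is 0 only when nothing was reported missing.
    [ "$missing" -eq 0 ]
)
```

With the demo data from the session above, this could be called as check_phase_inputs ./demo /fdb/GATK_resource_bundle/b37-2.8/human_g1k_v37_decoy.fasta ./demo/test.input before running Phase.py.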
Create a batch input file (e.g. mosaicforecast.sh). For example:
#!/bin/bash
#SBATCH --cpus-per-task=2
#SBATCH --mem=2G
#SBATCH --time=2:00:00
#SBATCH --partition=norm
set -e
module load mosaicforecast
cp -r ${MOSAIC_TESTDATA:-none}/* .
cp -r ${MOSAIC_MODEL:-none}/* .
Prediction.R demo/test.SNP.features models_trained/250xRFmodel_addRMSK_Refine.rds Refine test.SNP.predictions
Submit the job:
sbatch mosaicforecast.sh
Create a swarmfile (e.g. job.swarm). For example:
Prediction.R demo/test.SNP.features models_trained/250xRFmodel_addRMSK_Refine.rds Refine SNP.predictions
Prediction.R demo/test.DEL.features models_trained/deletions_250x.RF.rds Phase DEL.predictions
Submit this job using the swarm command.
swarm -f job.swarm [-g #] --module mosaicforecast
where
| -g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
| --module | Loads the module for each subjob in the swarm |
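When there are many feature files, the swarmfile can be generated rather than written by hand. The sketch below is illustrative only: the function name make_swarm and the *.SNP.features / *.DEL.features naming pattern are assumptions taken from the demo file names above, and each output file is named by replacing the .features suffix with .predictions.

```shell
# Hypothetical helper: print one Prediction.R command per matching feature file.
# Usage: make_swarm features_dir variant_tag model model_type
#   variant_tag is the middle part of the file name, e.g. SNP or DEL.
#   model_type is passed through unchanged (Phase or Refine in the examples above).
# The body runs in a subshell so its variables do not leak into the caller.
make_swarm() (
    features_dir=$1
    variant_tag=$2
    model=$3
    model_type=$4

    for f in "$features_dir"/*."$variant_tag".features; do
        # Skip the unexpanded pattern when no file matches.
        [ -e "$f" ] || continue
        name=$(basename "$f" .features)
        echo "Prediction.R $f $model $model_type $name.predictions"
    done
)

# Example (writes job.swarm in the current directory):
#   make_swarm demo SNP models_trained/250xRFmodel_addRMSK_Refine.rds Refine >  job.swarm
#   make_swarm demo DEL models_trained/deletions_250x.RF.rds Phase         >> job.swarm
```

Each emitted line is an independent command, which matches the one-process-per-line layout that swarm expects.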