PyLOH integrates somatic copy number alterations and loss of heterozygosity in tumor/normal sample pairs to deconvolve the contributions of tumor purity and tumor ploidy. In addition to the tumor and normal BAM files, it optionally takes a genome segmentation in BED format produced by one of the automated segmentation packages.
PyLOH contains three modules:
- preprocess: Preprocesses the tumor-normal BAM pair to create paired count files and processed segmentation files. This uses either whole chromosomes as segments or takes a segmentation of the genome produced by an external tool.
- run_model: Using the output of the preprocessing step, estimates tumor purity and the copy number of each segment.
- BAF_heatmap: Produces heatmap visualizations of the B allele frequencies (BAF) in tumor and normal samples for each segment.
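The external segmentation passed to preprocess via --segments_bed is a plain BED file (chromosome, 0-based start, end). The following sketch creates a minimal example; the coordinates are purely illustrative and not from any real segmentation:

```shell
# Illustrative only: a minimal tab-separated 3-column BED segmentation
# (chromosome, 0-based start, end) of the kind --segments_bed expects.
# These coordinates are made up for demonstration purposes.
cat > segments.bed <<'EOF'
chr1	0	121500000
chr1	125000000	249250621
chr2	0	243199373
EOF
```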
References:
- Yi Li and Xiaohui Xie. Deconvolving tumor purity and ploidy by integrating copy number alterations and loss of heterozygosity. Bioinformatics 2014, 30:2121-2129.
- Module Name: pyloh (see the modules page for more information)
- PyLOH preprocessing can be run with multiple processes. Make sure the number of CPUs requested matches the number of processes passed to --process_num.
- Example files in $PYLOH_TEST_DATA
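One way to keep the CPU allocation and the process count in sync is to derive --process_num from the environment rather than hard-coding it. A minimal sketch, assuming the script runs inside a Slurm job (Slurm sets SLURM_CPUS_PER_TASK to the value given via --cpus-per-task; the fallback of 1 is for testing outside a job):

```shell
# Sketch only: derive the PyLOH process count from the Slurm allocation
# so the two values cannot drift apart. Outside a job, where
# SLURM_CPUS_PER_TASK is unset, fall back to a single process.
NPROC=${SLURM_CPUS_PER_TASK:-1}
echo "--process_num ${NPROC}"
```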
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=6g --cpus-per-task=6 --gres=lscratch:20
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144]$ module load pyloh
[user@cn3144]$ cp -a ${PYLOH_TEST_DATA:-none}/aln .
[user@cn3144]$ REFERENCE_GENOME=/fdb/GATK_resource_bundle/hg18/Homo_sapiens_assembly18.fasta
[user@cn3144]$ PyLOH.py preprocess \
    --segments_bed aln/segments.bed \
    --min_depth 20 \
    --min_base_qual 10 \
    --min_map_qual 10 \
    --process_num $SLURM_CPUS_PER_TASK \
    $REFERENCE_GENOME aln/normal.bam aln/tumor.bam TvN
[...snip...]
[user@cn3144]$ PyLOH.py run_model --allelenumber_max 2 --max_iters 100 --stop_value 1e-7 TvN
[...snip...]
[user@cn3144]$ # set the matplotlib backend to a non-interactive backend
[user@cn3144]$ MPLBACKEND=agg PyLOH.py BAF_heatmap TvN
Below are two examples of BAF heatmaps for segments with a copy number of 1 (left) and 2 (right), respectively.
The test data used here was simulated with 90% tumor purity. The purity estimated by PyLOH is:
[user@cn3144]$ cat TvN.PyLOH.purity
Optimum baseline copy number : 2
Maximum copy number of each allele : 2
Tumor purity : 0.901
[user@cn3144]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf]$
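For downstream scripting it can be useful to pull the numeric estimate out of the purity file. A minimal sketch, which first recreates the sample TvN.PyLOH.purity output shown above and then extracts the value with awk (the field label is matched literally against that sample layout):

```shell
# Recreate the sample summary file shown above (test-data output),
# then extract the numeric purity estimate from the "Tumor purity" line.
cat > TvN.PyLOH.purity <<'EOF'
Optimum baseline copy number : 2
Maximum copy number of each allele : 2
Tumor purity : 0.901
EOF

# Split on ':' and strip whitespace from the value field.
purity=$(awk -F: '/Tumor purity/ {gsub(/[[:space:]]/, "", $2); print $2}' TvN.PyLOH.purity)
echo "$purity"   # 0.901
```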
Preprocessing
Create a batch script similar to the following example to run the
preprocessing step. Note that this batch script uses the example data
from $PYLOH_TEST_DATA
#! /bin/bash
#SBATCH --mail-type BEGIN,END,FAIL,TIME_LIMIT_90

module load pyloh/1.4.3 || exit 1
REFERENCE_GENOME=/fdb/GATK_resource_bundle/hg18/Homo_sapiens_assembly18.fasta
if [[ ! -d aln ]]; then
    cp -a $PYLOH_TEST_DATA/aln . || exit 1
fi
PyLOH.py preprocess \
    --segments_bed aln/segments.bed \
    --min_depth 20 \
    --min_base_qual 10 \
    --min_map_qual 10 \
    --process_num $SLURM_CPUS_PER_TASK \
    $REFERENCE_GENOME aln/normal.bam aln/tumor.bam TvN
Submit to the queue with sbatch:
sbatch --mem=4g --cpus-per-task=10 --time=06:00:00 preprocess.sh
Run model
Create a batch script similar to the following example to run the model building step. Note that this script uses the output from the previous script
#! /bin/bash
#SBATCH --mail-type BEGIN,END,FAIL,TIME_LIMIT_90

module load pyloh/1.4.3 || exit 1
PyLOH.py run_model \
    --allelenumber_max 2 \
    --max_iters 100 \
    --stop_value 1e-7 \
    TvN
Submit to the queue with sbatch:
sbatch --mem=4G run_model.sh
Create the BAF heatmap graph
This step makes use of the python package matplotlib to draw heat maps of BAF in tumor vs. normal samples for each segment. Since matplotlib defaults to an interactive backend, we create a matplotlibrc file that switches to a non-interactive backend (Agg) so this script works in batch mode.
Create a batch script similar to the following example to run the BAF heatmap plotting step.
#! /bin/bash
#SBATCH --mail-type BEGIN,END,FAIL,TIME_LIMIT_90

module load pyloh/1.4.3 || exit 1
if [[ ! -e matplotlibrc ]]; then
    echo "backend: Agg" > matplotlibrc
fi
PyLOH.py BAF_heatmap TvN
rm matplotlibrc
Submit to the queue with sbatch:
sbatch --mem=4G baf_heatmap.sh
Create a swarmfile (e.g. PyLOH.swarm). For example:
PyLOH.py preprocess \
    /path/to/genome/reference.fasta \
    aln/normal1.bam \
    aln/tumor1.bam \
    TvN1 \
    --segments_bed aln/segments1.bed \
    --min_depth 20 \
    --min_base_qual 10 \
    --min_map_qual 10 \
    --process_num $SLURM_CPUS_PER_TASK
PyLOH.py preprocess \
    /path/to/genome/reference.fasta \
    aln/normal2.bam \
    aln/tumor2.bam \
    TvN2 \
    --segments_bed aln/segments2.bed \
    --min_depth 20 \
    --min_base_qual 10 \
    --min_map_qual 10 \
    --process_num $SLURM_CPUS_PER_TASK
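The two preprocess commands differ only in the sample index, so the swarm file can also be generated with a loop. A sketch assuming the same hypothetical file layout (aln/normalN.bam, aln/tumorN.bam, aln/segmentsN.bed) as the example above; the $SLURM_CPUS_PER_TASK reference is escaped so it is expanded per subjob, not at generation time:

```shell
# Sketch: write one preprocess command per tumor/normal pair into the
# swarm file. File names follow the hypothetical layout used above.
: > PyLOH.swarm
for i in 1 2; do
    echo "PyLOH.py preprocess" \
         "/path/to/genome/reference.fasta" \
         "aln/normal${i}.bam aln/tumor${i}.bam TvN${i}" \
         "--segments_bed aln/segments${i}.bed" \
         "--min_depth 20 --min_base_qual 10 --min_map_qual 10" \
         "--process_num \$SLURM_CPUS_PER_TASK" >> PyLOH.swarm
done
```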
Submit this job using the swarm command.
swarm -f PyLOH.swarm -g 4 -t 10 --module PyLOH

where
- -g # : Number of Gigabytes of memory required for each process (1 line in the swarm command file)
- -t # : Number of threads/CPUs required for each process (1 line in the swarm command file)
- --module PyLOH : Loads the PyLOH module for each subjob in the swarm