High-Performance Computing at the NIH
PyClone on NIH HPC Systems

PyClone is a statistical model and software tool designed to infer the prevalence of point mutations in heterogeneous cancer samples. The input data for PyClone consists of a set of read counts from a deep sequencing experiment, the copy number of the genomic region containing the mutation, and an estimate of tumour content.

Tutorial files are in the /usr/local/apps/pyclone/tutorial directory.

By default, matplotlib uses an interactive backend and tries to open an X11 window, which sometimes generates a 'can not connect to X server' error. You can change that setting with a matplotlibrc file in either your current directory or your home directory.

Possible locations:
./matplotlibrc
~/.config/matplotlib/matplotlibrc

To change the backend, the rc file has to contain:
backend: BACKEND_NAME
where BACKEND_NAME could be (amongst others) Agg or cairo.

For example,
$ cat ~/.config/matplotlib/matplotlibrc
backend: Agg
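
The rc file described above can also be created from the command line. The following is a minimal sketch, assuming you want the per-directory location (./matplotlibrc) and the Agg backend; the variable name rc_dir is only an illustration and is not part of PyClone or matplotlib. Note that it overwrites any matplotlibrc already present in that directory.

```shell
# rc_dir is a hypothetical variable: the directory PyClone will be run from.
# It defaults to the current directory, matching the ./matplotlibrc location.
rc_dir="${rc_dir:-$PWD}"

# Select the non-interactive Agg backend so no X11 window is needed.
# This overwrites an existing matplotlibrc in that directory.
printf 'backend: Agg\n' > "$rc_dir/matplotlibrc"

# Print the file to confirm its contents.
cat "$rc_dir/matplotlibrc"
```

Because matplotlib looks in the current directory first, a file written this way only affects commands started from that directory.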

On Helix

Sample session:


[susanc@helix ~]$ module load pyclone
[susanc@helix ~]$ PyClone -h
usage: PyClone [-h] [--version]
               {analyse,build_mutations_file,cluster,plot_cellular_frequencies,plot_similarity_matrix,plot_multi_sample,build_table}
               ...

positional arguments:
  {analyse,build_mutations_file,cluster,plot_cellular_frequencies,plot_similarity_matrix,plot_multi_sample,build_table}
    analyse             Start a new PyClone analysis.
    build_mutations_file
                        Build a YAML format file with mutation data and states
                        prior to be used for PyClone analysis.
    cluster             Cluster the results of a PyClone analysis.
    plot_cellular_frequencies
                        Plot the posterior densities of the cellular
                        frequencies of the mutations from a PyClone analysis.
    plot_similarity_matrix
                        Plot a heat map of the posterior similarity matrix
                        from a PyClone analysis.
    plot_multi_sample   Plot a parallel coordinates plot for the variant
                        allelic prevalence or cellular prevalence colour code
                        by cluster membership.
    build_table         Build results table which contains cluster ids and
                        (mean) cellular prevalence estimates.

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit

Batch job on Biowulf

Create a batch input file (e.g. pyclone.sh). For example:

#!/bin/bash
module load pyclone

cd /data/$USER/dir
pyclone commands

Then submit the file on Biowulf:

sbatch pyclone.sh
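
The batch file above can be written and checked from the shell before it is submitted. This is a sketch under two assumptions that are not in the original page: the `cd` line gets an `|| exit 1` guard so the job stops if the directory is missing, and `bash -n` is used to parse the script without running it. The line `pyclone commands` remains a placeholder for your actual commands.

```shell
# Write the batch script; the quoted 'EOF' keeps $USER unexpanded until the job runs.
cat > pyclone.sh <<'EOF'
#!/bin/bash
module load pyclone

# Stop if the working directory is missing, so nothing runs in the wrong place.
cd /data/$USER/dir || exit 1
pyclone commands
EOF

# Parse the script without executing it to catch syntax errors before submitting.
bash -n pyclone.sh && echo "pyclone.sh: syntax OK"
```

The syntax check only confirms that the script parses; it does not verify that the directory exists or that the PyClone commands are valid.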

Swarm of Jobs on Biowulf

Create a swarmfile (e.g. pyclone.swarm). For example:

# this file is called pyclone.swarm
cd dir1;pyclone commands
cd dir2;pyclone commands
cd dir3;pyclone commands
[...]

Submit this job using the swarm command.

swarm -f pyclone.swarm --module pyclone
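
When there are many directories, a swarmfile in the format shown above can be generated with a loop instead of being typed by hand. The following is a minimal sketch; the directory names dir1, dir2 and dir3 are taken from the example, and `pyclone commands` is still a placeholder to be replaced with the commands you actually want to run in each directory.

```shell
# Placeholder directory names; replace them with your own.
dirs="dir1 dir2 dir3"

# Start with an empty swarmfile.
: > pyclone.swarm

# Write one line per directory; each line becomes one swarm subjob.
for d in $dirs; do
    printf 'cd %s;pyclone commands\n' "$d" >> pyclone.swarm
done

# Review the file before submitting it with swarm.
cat pyclone.swarm
```

The resulting file can then be submitted with the swarm command given above.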

Interactive job on Biowulf
Allocate an interactive session and run PyClone. Sample session:
[susanc@biowulf ~]$ sinteractive 
salloc.exe: Pending job allocation 15194042
salloc.exe: job 15194042 queued and waiting for resources
salloc.exe: job 15194042 has been allocated resources
salloc.exe: Granted job allocation 15194042
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn1719 are ready for job

[susanc@cn1719 ~]$ module load pyclone

[susanc@cn1719 ~]$ pyclone commands
Documentation