phASER on Biowulf

phASER stands for phasing and Allele Specific Expression from RNA-seq. It performs haplotype phasing using read alignments in BAM format from both DNA- and RNA-based assays, and provides measures of haplotypic expression for RNA-based assays.

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (user input follows each shell prompt):

[user@biowulf ~]$ sinteractive -c8 --mem=10g --gres=lscratch:20
salloc.exe: Pending job allocation 63023261
salloc.exe: job 63023261 queued and waiting for resources
salloc.exe: job 63023261 has been allocated resources
salloc.exe: Granted job allocation 63023261
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn0873 are ready for job
srun: error: x11: no local DISPLAY defined, skipping

[user@cn0873 ~]$ cd /lscratch/$SLURM_JOB_ID

[user@cn0873 63023261]$ cp -r /usr/local/apps/phaser/1.1.1/testdata .

[user@cn0873 63023261]$ cd testdata/

[user@cn0873 testdata]$ module load phaser
[+] Loading phaser  1.1.1  on cn0873
[+] Loading singularity  3.6.1  on cn0873

[user@cn0873 testdata]$ phaser.py --vcf NA06986.vcf.gz \
    --bam NA06986.2.M_111215_4.bam --paired_end 1 --mapq 255 --baseq 10 \
    --sample NA06986 --blacklist hg19_hla.bed \
    --haplo_count_blacklist hg19_haplo_count_blacklist.bed --threads 4 \
    --o phaser_test_case

##################################################
              Welcome to phASER v1.1.1
  Author: Stephane Castel (scastel@nygenome.org)
  Updated by: Bishwa K. Giri (bkgiri@uncg.edu)
##################################################

Completed the check of dependencies and input files availability...

STARTED "Read backed phasing and ASE/haplotype analyses" ...
    DATE, TIME : 2020-08-14, 14:14:10
[...snip]
     COMPLETED using 1176416 reads in 481 seconds using 4 threads
     PHASED  23919 of 2142443 all variants (= 0.011164) with at least one other variant
     GENOME WIDE PHASE CORRECTED  1 of 2142443 variants (= 0.000000)
     Global maximum memory usage: 2822.19 (mb)

COMPLETED "Read backed phasing" of sample NA06986 in 00:08:31 hh:mm:ss
DATE, TIME : 2020-08-14, 14:22:42

The End.

[user@cn0873 testdata]$ phaser_gene_ae.py \
    --haplotypic_counts phaser_test_case.haplotypic_counts.txt \
    --features gencode.v19.GRCh37.genes.bed --o phaser_test_case_gene_ae.txt

##################################################
          Welcome to phASER Gene AE v1.2.0
  Author: Stephane Castel (scastel@nygenome.org)
##################################################

#1 Loading features...
#2 Loading haplotype counts...
sys:1: DtypeWarning: Columns (5,16,17) have mixed types. Specify dtype option on import or set low_memory=False.
#3 Processing results...
    BAM: NA06986.2.M_111215_4
          generating feature level haplotypic counts...
          outputting feature haplotype counts...

[user@cn0873 testdata]$ exit
exit
srun: error: cn0873: task 0: Exited with exit code 130
salloc.exe: Relinquishing job allocation 63023261
salloc.exe: Job allocation 63023261 has been revoked.

[user@biowulf ~]$
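
Files under /lscratch are removed when the job ends, so results need to be copied to persistent storage before the final exit in the session above. The following is a minimal sketch of one way to do that. The helper name save_results and the destination directory are hypothetical and not part of phASER or Biowulf; only the phaser_test_case output prefix comes from the session above.

```shell
# Hypothetical helper: copy every file whose name starts with PREFIX
# from the current (scratch) directory into the persistent directory DEST.
save_results() {
    prefix=$1
    dest=$2
    # Expand the glob into the function's positional parameters.
    set -- "$prefix"*
    # If nothing matched, $1 is still the literal pattern and does not exist.
    if [ ! -e "$1" ]; then
        echo "no files matching ${prefix}*" >&2
        return 1
    fi
    mkdir -p "$dest"
    cp -- "$@" "$dest"/
}

# Example call (the destination is an assumption; use any directory that
# outlives the job):
# save_results phaser_test_case "/data/$USER/phaser_results"
```

The function returns a non-zero status when nothing matches the prefix, so an empty copy is not mistaken for success.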

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. phaser.sh). For example:

#!/bin/bash
set -e
module load phaser
phaser.py --vcf NA06986.vcf.gz --bam NA06986.2.M_111215_4.bam --paired_end 1 \
    --mapq 255 --baseq 10 --sample NA06986 --blacklist hg19_hla.bed \
    --haplo_count_blacklist hg19_haplo_count_blacklist.bed --threads 4 \
    --o phaser_test_case
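
The script above hard-codes --threads 4, which only matches the allocation if the job is also submitted with 4 CPUs. As a variation, the sketch below writes a phaser.sh that reads the thread count from SLURM_CPUS_PER_TASK, a variable Slurm sets when --cpus-per-task is given. The fallback of 4 and the resource values in the final comment are assumptions taken from the examples on this page, not recommendations.

```shell
# Write phaser.sh; quoting 'EOF' keeps ${...} unexpanded until the job runs.
cat > phaser.sh <<'EOF'
#!/bin/bash
set -e
module load phaser
# Use the CPU count allocated by Slurm; fall back to 4 if it is not set.
phaser.py --vcf NA06986.vcf.gz --bam NA06986.2.M_111215_4.bam --paired_end 1 \
    --mapq 255 --baseq 10 --sample NA06986 --blacklist hg19_hla.bed \
    --haplo_count_blacklist hg19_haplo_count_blacklist.bed \
    --threads "${SLURM_CPUS_PER_TASK:-4}" \
    --o phaser_test_case
EOF

# Example submission (values are assumptions; adjust to your data):
# sbatch --cpus-per-task=4 --mem=10g phaser.sh
```

With this form, changing --cpus-per-task at submission time changes the number of phASER threads without editing the script.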

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#] phaser.sh

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. phaser.swarm). For example:

phaser_gene_ae.py \
    --haplotypic_counts phaser_test_case.haplotypic_counts1.txt \
    --features gencode.v19.GRCh37.genes.bed --o phaser_test_case_gene_ae1.txt
phaser_gene_ae.py \
    --haplotypic_counts phaser_test_case.haplotypic_counts2.txt \
    --features gencode.v19.GRCh37.genes.bed --o phaser_test_case_gene_ae2.txt
phaser_gene_ae.py \
    --haplotypic_counts phaser_test_case.haplotypic_counts3.txt \
    --features gencode.v19.GRCh37.genes.bed --o phaser_test_case_gene_ae3.txt
phaser_gene_ae.py \
    --haplotypic_counts phaser_test_case.haplotypic_counts4.txt \
    --features gencode.v19.GRCh37.genes.bed --o phaser_test_case_gene_ae4.txt
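
For more than a few inputs, a swarmfile like the one above can be generated with a loop instead of being written by hand. The sketch below assumes the numbered file names used in the example (haplotypic_counts1.txt through haplotypic_counts4.txt) and writes each command on a single line. The variable names N and features are illustrative only.

```shell
# Number of haplotypic count files; 4 matches the example above (assumption).
N=4
features=gencode.v19.GRCh37.genes.bed

# Start from an empty swarmfile.
: > phaser.swarm

i=1
while [ "$i" -le "$N" ]; do
    # One command per line: each line becomes one subjob in the swarm.
    printf 'phaser_gene_ae.py --haplotypic_counts %s --features %s --o %s\n' \
        "phaser_test_case.haplotypic_counts${i}.txt" \
        "$features" \
        "phaser_test_case_gene_ae${i}.txt" >> phaser.swarm
    i=$((i + 1))
done
```

The resulting phaser.swarm contains the same four commands as the hand-written file, without the backslash line continuations.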

Submit this job using the swarm command.

swarm -f phaser.swarm [-g #] [-t #] --module phaser
where
    -g #             Number of gigabytes of memory required for each process (one line in the swarm command file)
    -t #             Number of threads/CPUs required for each process (one line in the swarm command file)
    --module phaser  Loads the phaser module for each subjob in the swarm