phASER stands for phasing and Allele Specific Expression from RNA-seq. It performs haplotype phasing using read alignments in BAM format from both DNA and RNA based assays, and provides measures of haplotypic expression for RNA based assays.
Allocate an interactive session and run the program.
Sample session (commands typed by the user follow the $ prompt):
[user@biowulf ~]$ sinteractive -c8 --mem=10g --gres=lscratch:20
salloc.exe: Pending job allocation 63023261
salloc.exe: job 63023261 queued and waiting for resources
salloc.exe: job 63023261 has been allocated resources
salloc.exe: Granted job allocation 63023261
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn0873 are ready for job
srun: error: x11: no local DISPLAY defined, skipping

[user@cn0873 ~]$ cd /lscratch/$SLURM_JOB_ID
[user@cn0873 63023261]$ cp -r /usr/local/apps/phaser/1.1.1/testdata .
[user@cn0873 63023261]$ cd testdata/
[user@cn0873 testdata]$ module load phaser
[+] Loading phaser 1.1.1 on cn0873
[+] Loading singularity 3.6.1 on cn0873

[user@cn0873 testdata]$ phaser.py --vcf NA06986.vcf.gz \
    --bam NA06986.2.M_111215_4.bam --paired_end 1 --mapq 255 --baseq 10 \
    --sample NA06986 --blacklist hg19_hla.bed \
    --haplo_count_blacklist hg19_haplo_count_blacklist.bed --threads 4 \
    --o phaser_test_case

##################################################
Welcome to phASER v1.1.1
Author: Stephane Castel (scastel@nygenome.org)
Updated by: Bishwa K. Giri (bkgiri@uncg.edu)
##################################################

Completed the check of dependencies and input files availability...

STARTED "Read backed phasing and ASE/haplotype analyses" ...
    DATE, TIME : 2020-08-14, 14:14:10
[...snip]
COMPLETED using 1176416 reads in 481 seconds using 4 threads
PHASED 23919 of 2142443 all variants (= 0.011164) with at least one other variant
GENOME WIDE PHASE CORRECTED 1 of 2142443 variants (= 0.000000)
Global maximum memory usage: 2822.19 (mb)

COMPLETED "Read backed phasing" of sample NA06986 in 00:08:31 hh:mm:ss
    DATE, TIME : 2020-08-14, 14:22:42

The End.

[user@cn0873 testdata]$ phaser_gene_ae.py \
    --haplotypic_counts phaser_test_case.haplotypic_counts.txt \
    --features gencode.v19.GRCh37.genes.bed --o phaser_test_case_gene_ae.txt

##################################################
Welcome to phASER Gene AE v1.2.0
Author: Stephane Castel (scastel@nygenome.org)
##################################################

#1 Loading features...
#2 Loading haplotype counts...
sys:1: DtypeWarning: Columns (5,16,17) have mixed types. Specify dtype option on import or set low_memory=False.
#3 Processing results...
    BAM: NA06986.2.M_111215_4
        generating feature level haplotypic counts...
        outputting feature haplotype counts...

[user@cn0873 testdata]$ exit
exit
srun: error: cn0873: task 0: Exited with exit code 130
salloc.exe: Relinquishing job allocation 63023261
salloc.exe: Job allocation 63023261 has been revoked.

[user@biowulf ~]$
Create a batch input file (e.g. phaser.sh). For example:
#!/bin/bash
set -e
module load phaser
phaser.py --vcf NA06986.vcf.gz --bam NA06986.2.M_111215_4.bam --paired_end 1 \
    --mapq 255 --baseq 10 --sample NA06986 --blacklist hg19_hla.bed \
    --haplo_count_blacklist hg19_haplo_count_blacklist.bed --threads 4 \
    --o phaser_test_case
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] phaser.sh
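The batch script above hard-codes --threads 4, so the value passed to --cpus-per-task has to be kept in step by hand. One way to avoid a mismatch is to read the thread count from SLURM_CPUS_PER_TASK, which Slurm sets inside the job when --cpus-per-task is given. The following is a minimal sketch of such a variant, not the documented script: the fallback of 4 threads and the dry-run branch (which only prints the command when phaser.py is not on the PATH) are assumptions added for illustration.

```shell
#!/bin/bash
set -e

# Load the module only where the `module` command exists (as on the cluster).
if command -v module >/dev/null 2>&1; then
    module load phaser
fi

# Thread count granted by Slurm; the fallback of 4 is an assumption
# that mirrors the hard-coded value in the original batch script.
threads="${SLURM_CPUS_PER_TASK:-4}"

# Assumption for illustration: if phaser.py is not on the PATH,
# print the command instead of running it (dry run).
if command -v phaser.py >/dev/null 2>&1; then
    run=""
else
    run="echo"
fi

# Same arguments as the batch example, with --threads taken from Slurm.
$run phaser.py --vcf NA06986.vcf.gz --bam NA06986.2.M_111215_4.bam --paired_end 1 \
    --mapq 255 --baseq 10 --sample NA06986 --blacklist hg19_hla.bed \
    --haplo_count_blacklist hg19_haplo_count_blacklist.bed --threads "$threads" \
    --o phaser_test_case
```

With this variant, a submission such as sbatch --cpus-per-task=4 --mem=10g phaser.sh would run phaser.py with 4 threads. The 10g figure is only a guess based on the interactive session above, where the run reported a peak of about 2.8 GB; adjust it for your own data.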
Create a swarmfile (e.g. phaser.swarm). For example:
phaser_gene_ae.py \
    --haplotypic_counts phaser_test_case.haplotypic_counts1.txt \
    --features gencode.v19.GRCh37.genes.bed --o phaser_test_case_gene_ae1.txt
phaser_gene_ae.py \
    --haplotypic_counts phaser_test_case.haplotypic_counts2.txt \
    --features gencode.v19.GRCh37.genes.bed --o phaser_test_case_gene_ae2.txt
phaser_gene_ae.py \
    --haplotypic_counts phaser_test_case.haplotypic_counts3.txt \
    --features gencode.v19.GRCh37.genes.bed --o phaser_test_case_gene_ae3.txt
phaser_gene_ae.py \
    --haplotypic_counts phaser_test_case.haplotypic_counts4.txt \
    --features gencode.v19.GRCh37.genes.bed --o phaser_test_case_gene_ae4.txt
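For more than a handful of inputs, a swarmfile with this pattern can be generated rather than typed. The sketch below writes one phaser_gene_ae.py command per line, using the same file naming as the example (haplotypic counts files numbered 1 to 4). The loop range and the single-line layout of each command are illustrative choices, so adapt them to your own file names.

```shell
#!/bin/bash
set -e

swarmfile=phaser.swarm
: > "$swarmfile"    # create the swarmfile, or empty it if it already exists

# Append one command per haplotypic counts file (numbered 1-4 here).
for i in 1 2 3 4; do
    printf 'phaser_gene_ae.py --haplotypic_counts %s --features %s --o %s\n' \
        "phaser_test_case.haplotypic_counts${i}.txt" \
        "gencode.v19.GRCh37.genes.bed" \
        "phaser_test_case_gene_ae${i}.txt" >> "$swarmfile"
done

# Show the generated commands for a quick visual check.
cat "$swarmfile"
```

Each line of the generated file is one independent process, which is the unit that the swarm -g and -t options apply to.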
Submit this job using the swarm command.
swarm -f phaser.swarm [-g #] [-t #] --module phaser
where
-g # | Number of gigabytes of memory required for each process (1 line in the swarm command file). |
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file). |
--module phaser | Loads the phaser module for each subjob in the swarm. |