Ascat NGS on Biowulf

AscatNGS contains the Cancer Genome Projects workflow implementation of the ASCAT copy number algorithm for paired end sequencing.

References:

Documentation
Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf ~]$ sinteractive -c4 --mem=8g --gres=lscratch:10
salloc.exe: Pending job allocation 11188180
salloc.exe: job 11188180 queued and waiting for resources
salloc.exe: job 11188180 has been allocated resources
salloc.exe: Granted job allocation 11188180
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn0880 are ready for job
srun: error: x11: no local DISPLAY defined, skipping
error: unable to open file /tmp/slurm-spank-x11.11188180.0
slurmstepd: error: x11: unable to read DISPLAY value

[user@cn0880 ~]$ module load ascatngs
[+] Loading ascatngs  4.5.0  on cn0880
[+] Loading singularity  3.7.2  on cn0880

[user@cn0880 ~]$ ascat.pl

ERROR: Option must be defined.

Usage:
    ascat.pl [options]

      Please define as many of the parameters as possible

      Required parameters

        -outdir       -o    Folder to output result to.
        -tumour       -t    Tumour BAM/CRAM/counts file (counts must be .gz)
        -normal       -n    Normal BAM/CRAM/counts file (counts must be .gz)
        -reference    -r    Reference fasta
        -snp_gc       -sg   Snp GC correction file
        -protocol     -pr   Sequencing protocol (e.g. WGS, WXS)
        -gender       -g    Sample gender (XX, XY, L, FILE)
                              For XX/XY see '-gc'
                              When 'L' see '-l'
                              FILE - matched normal is_male.txt from ascatCounts.pl

      Targeted processing (further detail under OPTIONS):
        -process      -p    Only process this step then exit, optionally set -index
        -index        -i    Optionally restrict '-p' to single job
        -limit        -x    Specifying 2 will balance processing between '-i 1 & 2'
                            Must be paired with '-p allele_count'

      Optional parameters
        -genderChr    -gc   Specify the 'Male' sex chromosome: Y,chrY...
        -species      -rs   Reference species [BAM HEADER]
        -assembly     -ra   Reference assembly [BAM HEADER]
        -platform     -pl   Seqeuncing platform [BAM HEADER]
        -minbasequal  -q    Minimum base quality required before allele is used. [20]
        -cpus         -c    Number of cores to use. [1]
                            - recommend max 2 during 'input' process.
        -locus        -l    Using a list of loci, default when '-L' [share/gender/GRCh37d5_Y.loci]
                            - these are loci that will not be present at all in a female sample
        -force        -f    Force completion - solution not possible
                            - adding this will result in successful completion of analysis even
                              when ASCAT can't generate a solution.  A default copynumber of 5/2
                              (tumour/normal) and contamination of 30% will be set along with a
                              comment in '*.samplestatistics.csv' to indicate this has occurred.
        -purity       -pu   Purity (rho) setting for manual setting of sunrise plot location
        -ploidy       -pi   Ploidy (psi) setting for manual setting of sunrise plot location
        -noclean      -nc   Finalise results but don't clean up the tmp directory.
                            - Useful when including a manual check and restarting ascat with new pu and pi params.
        -nobigwig     -nb   Don't generate BigWig files.
        -t_name       -tn   Tumour name to use when using count files as input
        -n_name       -nn   Noraml name to use when using count files as input

      Other
        -help         -h    Brief help message
        -man          -m    Full documentation.
        -version      -v    Ascat version number


[user@cn0880 ~]$ cd /lscratch/$SLURM_JOB_ID

[user@cn0880 11188180]$ cp /path/to/tumor.bam .

[user@cn0880 11188180]$ cp /path/to/normal.bam .

[user@cn0880 11188180]$ ascat.pl -o output -t tumor.bam -n normal.bam \
     -r /fdb/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa \
     -snp_gc SnpGcCorrections.tsv -pr wgs -g XX -c $SLURM_CPUS_PER_TASK
[snip...]

[user@cn0880 11188180]$ exit
exit
srun: error: cn0880: task 0: Exited with exit code 2
salloc.exe: Relinquishing job allocation 11188180

[user@biowulf ~]$

Note: You need to generate the SnpGcCorrections.tsv (as can be found in: https://github.com/cancerit/ascatNgs/wiki/Convert-SnpPositions.tsv-to-SnpGcCorrections.tsv) or downloaded (https://github.com/cancerit/ascatNgs/wiki/Human-reference-files-from-1000-genomes-VCFs). Generates LogR.txt and BAF.txt, which can be used to generate non-segmented plots (see https://www.crick.ac.uk/peter-van-loo/software/ASCAT)

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. AscatNGS.sh). For example:

#!/bin/bash
module load ascatngs
export GENOME=/fdb/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/
ascat.pl -o output -t tumor.bam -n normal.bam -r $GENOME/genome.fa -snp_gc SnpGcCorrections.tsv -pr wgs -g XX -c $SLURM_CPUS_PER_TASK

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=16 --mem=30g AscatNGS.sh
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. AscatNGS.swarm). For example:

ascat.pl -o output1 -t tumor1.bam -n normal1.bam ... -c $SLURM_CPUS_PER_TASK
ascat.pl -o output2 -t tumor2.bam -n normal2.bam ... -c $SLURM_CPUS_PER_TASK
ascat.pl -o output3 -t tumor3.bam -n normal3.bam ... -c $SLURM_CPUS_PER_TASK

Submit this job using the swarm command.

swarm -f AscatNGS.swarm -g 30 -t 16 --module ascatngs
where
-g # Number of Gigabytes of memory required for each process (1 line in the swarm command file)
-t # Number of threads/CPUs required for each process (1 line in the swarm command file).
--module ascatngs Loads the Ascat NGS module for each subjob in the swarm