Biowulf High Performance Computing at the NIH
Ascat NGS on Biowulf

AscatNGS contains the Cancer Genome Project's workflow implementation of the ASCAT copy number algorithm for paired-end sequencing.


Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load ascatngs

[user@cn3144 ~]$ ascat.pl

Usage:
    ascat.pl [options]

      Please defined as many of the parameters as possible

      Required parameters

        -outdir       -o    Folder to output result to.
        -tumour       -t    Tumour BAM/CRAM file
        -normal       -n    Normal BAM/CRAM file
        -reference    -r    Reference fasta
        -snp_gc       -sg   Snp GC correction file
        -protocol     -pr   Sequencing protocol (e.g. WGS, WXS)
        -gender       -g    Sample gender (XX, XY, L)
                              For XX/XY see '-gc'
                              When 'L' see '-l'

      Targeted processing (further detail under OPTIONS):
        -process      -p    Only process this step then exit, optionally set -index
        -index        -i    Optionally restrict '-p' to single job
        -limit        -x    Specifying 2 will balance processing between '-i 1 & 2'
                            Must be paired with '-p allele_count'

      Optional parameters
        -genderChr    -gc   Specify the 'Male' sex chromosome: Y,chrY...
        -species      -rs   Reference species [BAM HEADER]
        -assembly     -ra   Reference assembly [BAM HEADER]
        -platform     -pl   Seqeuncing platform [BAM HEADER]
        -minbasequal  -q    Minimum base quality required before allele is used. [20]
        -cpus         -c    Number of cores to use. [1]
                            - recommend max 2 during 'input' process.
        -locus        -l    Using a list of loci, default when '-L' [share/gender/GRCh37d5_Y.loci]
                            - these are loci that will not present at all in a female sample
        -force        -f    Force completion - solution not possible
                            - adding this will result in successful completion of analysis even
                              when ASCAT can't generate a solution.  A default copynumber of 5/2
                              (tumour/normal) and contamination of 30% will be set along with a
                              comment in '*.samplestatistics.csv' to indicate this has occurred.
        -purity       -pu   Purity (rho) setting for manual setting of sunrise plot location
        -ploidy       -pi   Ploidy (psi) setting for manual setting of sunrise plot location
        -noclean      -nc   Finalise results but don't clean up the tmp directory.
                            - Useful when including a manual check and restarting ascat with new pu and pi params.

      Other
        -help         -h    Brief help message
        -man          -m    Full documentation.
        -version      -v    Ascat version number


[user@cn3144 ~]$ ascat.pl -o output -t tumor.bam -n normal.bam -r /fdb/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa -snp_gc SnpGcCorrections.tsv -pr wgs -g XX -c $SLURM_CPUS_PER_TASK


[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Note: ascat.pl requires a SnpGcCorrections.tsv file. You can either generate it yourself (see https://github.com/cancerit/ascatNgs/wiki/Convert-SnpPositions.tsv-to-SnpGcCorrections.tsv) or download it (see https://github.com/cancerit/ascatNgs/wiki/Human-reference-files-from-1000-genomes-VCFs). The run generates LogR.txt and BAF.txt, which can be used to generate non-segmented plots (see https://www.crick.ac.uk/peter-van-loo/software/ASCAT).
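
Since a run depends on several input files being in place (tumour and normal BAM/CRAM, reference FASTA, SnpGcCorrections.tsv), it may be worth checking them before starting. The following is only a minimal sketch: check_inputs is a hypothetical helper name, not part of AscatNGS, and the file names in the usage comment are the ones from the sample session above.

```bash
#!/bin/bash
# check_inputs (hypothetical helper, not part of AscatNGS):
# returns 0 if every file given as an argument exists and is non-empty,
# otherwise prints the offending paths to stderr and returns 1.
check_inputs() {
    local missing
    local f
    missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "Missing or empty input: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible use before calling ascat.pl (file names from the sample session):
# check_inputs tumor.bam normal.bam SnpGcCorrections.tsv || exit 1
```

The check only tests that the files are present and non-empty; it does not validate their contents.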

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. AscatNGS.sh). For example:

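#!/bin/bash
# Load the AscatNGS environment
module load ascatngs
# Directory holding the GRCh37 reference FASTA (no trailing slash, so the path below has no double slash)
export GENOME=/fdb/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta
# Use as many cores as were requested with --cpus-per-task
ascat.pl -o output -t tumor.bam -n normal.bam -r "$GENOME/genome.fa" -snp_gc SnpGcCorrections.tsv -pr wgs -g XX -c "$SLURM_CPUS_PER_TASK"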

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=16 --mem=30g AscatNGS.sh
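
The example script passes $SLURM_CPUS_PER_TASK to '-c'. If that variable is unset (for example, when the script is tested outside a Slurm allocation), '-c' would receive no value. One way to guard against this, shown here as a sketch only, is to fall back to 1, which the usage text above lists as the default for '-c'. The name ascat_cpus is a hypothetical helper, not part of AscatNGS or Slurm.

```bash
#!/bin/bash
# ascat_cpus (hypothetical helper): print the core count to pass to 'ascat.pl -c'.
# Uses the Slurm allocation when the variable is set and non-empty, otherwise 1.
ascat_cpus() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# In the batch script, the final option could then be written as:
#   -c "$(ascat_cpus)"
```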

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. AscatNGS.swarm). For example:

ascat.pl -o output1 -t tumor1.bam -n normal1.bam ... -c $SLURM_CPUS_PER_TASK
ascat.pl -o output2 -t tumor2.bam -n normal2.bam ... -c $SLURM_CPUS_PER_TASK
ascat.pl -o output3 -t tumor3.bam -n normal3.bam ... -c $SLURM_CPUS_PER_TASK
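
For more than a handful of samples, the swarmfile lines shown above could be generated rather than typed by hand. The sketch below assumes a sample sheet with three whitespace-separated columns (output directory, tumour BAM, normal BAM); that layout, the function name make_swarm, and the COMMON_OPTS variable are all illustrative assumptions, not Biowulf or AscatNGS conventions. The options shared by every sample are passed in as a single string and are not filled in here.

```bash
#!/bin/bash
# make_swarm (hypothetical helper): write one ascat.pl line per sample to stdout.
#   $1  sample sheet, assumed layout: output_dir  tumour_bam  normal_bam
#   $2  options shared by every sample (reference, -snp_gc, -pr, -g and so on)
# Paths containing whitespace are not supported by this sketch.
make_swarm() {
    local sheet
    local common
    local outdir tumour normal
    sheet=$1
    common=$2
    while read -r outdir tumour normal; do
        # Skip blank lines and lines starting with '#'
        case "$outdir" in
            ''|'#'*) continue ;;
        esac
        # The single-quoted format keeps $SLURM_CPUS_PER_TASK literal,
        # so it is expanded later inside each subjob, not now.
        printf 'ascat.pl -o %s -t %s -n %s %s -c $SLURM_CPUS_PER_TASK\n' \
            "$outdir" "$tumour" "$normal" "$common"
    done < "$sheet"
}

# Possible use (samples.txt and COMMON_OPTS are placeholders you define):
# make_swarm samples.txt "$COMMON_OPTS" > AscatNGS.swarm
```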

Submit this job using the swarm command.

swarm -f AscatNGS.swarm -g 30 -t 16 --module ascatngs

where
    -g #               Number of gigabytes of memory required for each process (1 line in the swarm command file)
    -t #               Number of threads/CPUs required for each process (1 line in the swarm command file)
    --module ascatngs  Loads the Ascat NGS module for each subjob in the swarm