As of January 2024, the nci-dragen partition includes one Dragen server. It is funded by NCI/CBIIT until the end of FY 2027.
Notes:
nci_dragen_turbo QOS or 8h (other users)
/staging disk partition.
/staging/ref/current. See table below for details.

Originally, references were located in /staging/human. These still exist but are deprecated since reference versions are tied to Dragen versions. References are now located in /staging/ref/current. The following unmodified references were obtained from Illumina:
name | alt_aware | alt_masked | cnv | graph | hla | methylation | methylated_combined | rna |
---|---|---|---|---|---|---|---|---|
chm13_v2-cnv.graph.hla.rna-9-r3.0-1 | False | False | True | True | True | False | False | True |
chm13_v2-cnv.hla.rna-9-r3.0-1 | False | False | True | False | True | False | False | True |
hg19-alt_masked.cnv.graph.hla.rna-9-r3.0-1 | False | True | True | True | True | False | False | True |
hg19-alt_masked.cnv.hla.rna-9-r3.0-1 | False | True | True | False | True | False | False | True |
hg38-alt_masked.cnv.graph.hla.rna-9-r3.0-1 | False | True | True | True | True | False | False | True |
hg38-alt_masked.cnv.hla.rna-9-r3.0-1 | False | True | True | False | True | False | False | True |
hs37d5-cnv.graph.hla.rna-9-r3.0-1 | False | False | True | True | True | False | False | True |
hs37d5-cnv.hla.rna-9-r3.0-1 | False | False | True | False | True | False | False | True |
See also /staging/ref/current/README
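The reference names in the table appear to encode their features between the genome build and the version suffix. A minimal sketch of selecting a reference by feature follows, assuming that naming pattern holds for every entry; `ref_has_feature` is a hypothetical helper, not a Dragen or Biowulf tool.

```shell
#!/bin/bash
# Hypothetical helper (not part of Dragen): does a reference name list a feature?
# Assumes the naming pattern seen in the table: <build>-<feature>.<feature>...-<version>
ref_has_feature() {
    _feats=${1#*-}          # strip the leading "<build>-"
    _feats=${_feats%%-*}    # strip the trailing "-<version>"
    case ".${_feats}." in
        *".$2."*) return 0 ;;
        *)        return 1 ;;
    esac
}

# List the hg38 references that include graph support
for ref in /staging/ref/current/hg38-*; do
    [ -e "$ref" ] || continue
    if ref_has_feature "$(basename "$ref")" graph; then
        echo "$ref"
    fi
done
```

The README mentioned above remains the authoritative description of each reference; the name parsing here is only a convenience.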
The Dragen license is metered. If you do not have access to the nci_dragen_turbo QOS, please do not run more than a few test runs without contacting staff@hpc.nih.gov.
License usage can be optimized by generating all needed variant calls for a sample in a single run.
For example, when running with a single bam file, i.e.
--bam-input /staging/${ID}/xxxxx.bam
SNV, CNV, and SV can all be called concurrently in a single run by enabling all three caller flags:
Flag | Calls |
---|---|
--enable-variant-caller true | For Germline SNV |
--enable-cnv true | For Germline CNV |
--enable-sv true | For Germline SV |
Regardless of how many of these three flags are used in a single run, the license will only be charged once.
The same applies to somatic variant calling (i.e. a run that includes a tumor bam with --tumor-bam-input).
However, tumor-only and somatic variant calls cannot be combined into a single run. Therefore, in effect, a full
tumor-normal run will charge the license for 2 samples (tumor/normal + germline).
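As an illustration of the single-run approach described above, the following dry-run sketch assembles one germline command with all three caller flags enabled and prints it for review. The sample ID, output directory, and choice of reference are placeholders (assumptions), and real CNV or SV runs may need additional caller options that are not shown here.

```shell
#!/bin/bash
# Sketch only: build one germline command that enables SNV, CNV, and SV calling.
# ID, outdir, and the reference choice are placeholders, not site defaults.
ID=sample01
genome=/staging/ref/current/hg38-alt_masked.cnv.hla.rna-9-r3.0-1
outdir="/staging/${ID}-out"

# Store the command in the positional parameters so each argument stays intact
set -- dragen -r "$genome" \
    --bam-input "/staging/${ID}/xxxxx.bam" \
    --output-dir "$outdir" \
    --output-file-prefix "$ID" \
    --enable-variant-caller true \
    --enable-cnv true \
    --enable-sv true

# Print the command for review; run "$@" instead of echoing to execute it
dragen_cmd="$*"
echo "$dragen_cmd"
```

Because all three callers run in the same invocation, this counts as a single run for license purposes, as described above.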
Create a batch script similar to the following, which aligns RNA-Seq data. Note that a GTF file is required for fusion detection. The Gencode GTF files appear to be compatible with the hg38 references.
    #! /bin/bash

    # set up paths etc
    source /etc/profile.d/edico.sh
    RUNPATH=/fdb/app_testdata/fastq/Homo_sapiens
    RUNFOLDER=SRR24373805
    ANALYSIS="/staging/${RUNFOLDER}-$(date +%s)"
    METRICS=${ANALYSIS}/Results/MetricsOutput.tsv
    RESULTPATH=${PWD}/${RUNFOLDER}-dragen-results

    # clean up after run
    trap 'rm -rf "/staging/${RUNFOLDER}" "${ANALYSIS}"' EXIT

    cp -r "${RUNPATH}/${RUNFOLDER}" /staging || exit 100
    mkdir -p "${ANALYSIS}" || exit 101

    genome=/staging/ref/current/hg38-alt_masked.cnv.hla.rna-9-r3.0-1
    gtf=/fdb/GENCODE/Gencode_human/release_45/gencode.v45.primary_assembly.annotation.gtf

    # Running a RNA pipeline with dragen
    dragen -r $genome \
        -1 /staging/${RUNFOLDER}/SRR24373805_1.fastq.gz \
        -2 /staging/${RUNFOLDER}/SRR24373805_2.fastq.gz \
        -a $gtf \
        --output-dir ${ANALYSIS} \
        --output-file-prefix RNA_test \
        --enable-rna true \
        --enable-rna-gene-fusion true \
        --RGID rg \
        --RGSM sm \
        --enable-rna-quantification=true

    # copy results back to working directory
    cp -r "${ANALYSIS}" "${RESULTPATH}" || exit 103
And submit with
    [user@biowulf]$ sbatch --mem=0 --cpus-per-task=64 --partition nci-dragen --qos=nci_dragen_turbo dragen.sh
    12345678
Note that the $ANALYSIS folder is larger than the input, with Logs_Intermediates taking up most of the space. The script above could be modified to transfer only a subset of files back to shared storage. Example output file generated:
    [user@biowulf]$ cat ${RESULTPATH}/RNA_test.quant_metrics.csv
    RNA QUANTIFICATION STATISTICS,,Library orientation,IU
    RNA QUANTIFICATION STATISTICS,,Total Genes,63187
    RNA QUANTIFICATION STATISTICS,,Total Transcripts,252930
    RNA QUANTIFICATION STATISTICS,,Coding Genes,21567
    RNA QUANTIFICATION STATISTICS,,Median transcript CV coverage,0.49
    RNA QUANTIFICATION STATISTICS,,Median 5' coverage bias,0.3889
    RNA QUANTIFICATION STATISTICS,,Median 3' coverage bias,0.0844
    RNA QUANTIFICATION STATISTICS,,Number of genes with coverage > 1x,17094,27.05
    RNA QUANTIFICATION STATISTICS,,Number of genes with coverage > 10x,12504,19.79
    RNA QUANTIFICATION STATISTICS,,Number of genes with coverage > 30x,9801,15.51
    RNA QUANTIFICATION STATISTICS,,Number of genes with coverage > 100x,5522,8.74
    RNA QUANTIFICATION STATISTICS,,Transcript fragments,19861578,89.40
    RNA QUANTIFICATION STATISTICS,,Forward transcript fragments,9990254,50.30
    RNA QUANTIFICATION STATISTICS,,Ambiguous strand fragments,209080,0.94
    RNA QUANTIFICATION STATISTICS,,Unknown transcript fragments,1389372,6.25
    RNA QUANTIFICATION STATISTICS,,Intron fragments,596441,2.68
    RNA QUANTIFICATION STATISTICS,,Intergenic fragments,114056,0.51
    RNA QUANTIFICATION STATISTICS,,Fold coverage of all exons,68.68
    RNA QUANTIFICATION STATISTICS,,Fold coverage of introns,0.11
    RNA QUANTIFICATION STATISTICS,,Fold coverage of intergenic regions,0.03
    RNA QUANTIFICATION STATISTICS,,Fold coverage of coding exons,106.68
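To copy back only a subset of the results, one option is to skip the Logs_Intermediates directory during the final transfer. The sketch below is one possible approach, not the site-recommended method; `copy_results` is a hypothetical helper, and it assumes that everything outside Logs_Intermediates should be kept.

```shell
#!/bin/bash
# Hypothetical helper: copy a results tree but skip any Logs_Intermediates directory.
copy_results() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # Stream through tar so the directory layout is preserved;
    # --exclude drops every entry named Logs_Intermediates and its contents.
    (cd "$src" && tar -cf - --exclude=Logs_Intermediates .) | (cd "$dest" && tar -xf -)
}

# In the batch script, this would replace the final "cp -r" line:
# copy_results "${ANALYSIS}" "${RESULTPATH}" || exit 103
```

Check the contents of the copied folder after a test run to confirm that nothing you need was stored under Logs_Intermediates.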
Please send questions and comments to staff@hpc.nih.gov