A practical introduction to GATK 4 on Biowulf (NIH HPC)
This tutorial requires a basic understanding of high-throughput sequencing, genomics, high-performance computing, and bash scripting.
The following tools are used in this tutorial:
- GATK 22.214.171.124
- fastp 0.20.1
- bwa 0.7.17
- samtools 1.11
- mosdepth 0.3.0
All are available on Biowulf as modules.
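As a quick sanity check before starting, something like the following sketch can confirm that the tools are on your PATH. It assumes the tools are provided as environment modules and that the module names match the tool names listed above (an assumption; confirm with `module avail`). The helper `missing_tools` is a hypothetical name introduced here for illustration.

```shell
# Executables for the tools listed above; GATK 4 is run through its `gatk` wrapper.
TOOLS="gatk fastp bwa samtools mosdepth"

# Assumed module names; confirm with `module avail` before relying on them.
MODULES="GATK fastp bwa samtools mosdepth"

# Load the modules only where the `module` command exists.
if command -v module >/dev/null 2>&1; then
    module load $MODULES || echo "module load failed; check the module names" >&2
fi

# Print each given tool that is NOT found on PATH (prints nothing if all are found).
missing_tools() {
    for t in "$@"; do
        command -v "$t" >/dev/null 2>&1 || echo "$t"
    done
}

missing_tools $TOOLS
```

If the last command prints nothing, all five executables were found. To pin specific versions, the usual `name/version` form of `module load` can be used, with the versions from the list above.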
In this tutorial we will analyze a trio from the Coriell CEPH/Utah pedigree 1463. The sequencing data are part of the Illumina Platinum Genomes project (Eberle et al. 2017).
For convenience, data for the three individuals used in this tutorial are available on Biowulf at /fdb/app_testdata/fastq/Homo_sapiens/platinum_genomes, split by flowcell and lane to make assignment of read groups during alignment easier.
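To illustrate why the per-flowcell, per-lane split helps, here is a minimal sketch of building a read group line for `bwa mem -R` from a sample name, flowcell, and lane. The function name `read_group`, the example values, and the choice to reuse the sample name as the library (LB) are assumptions made for illustration; the actual file naming in the directory above is not assumed here.

```shell
# Build an @RG header line for `bwa mem -R` from sample, flowcell and lane.
# ID and PU use the common flowcell.lane convention; LB is set to the
# sample name here, which is an assumption for this sketch.
# The output contains literal "\t" sequences, which bwa converts to tabs.
read_group() {
    sample=$1 flowcell=$2 lane=$3
    printf '@RG\\tID:%s.%s\\tPU:%s.%s\\tSM:%s\\tLB:%s\\tPL:ILLUMINA' \
        "$flowcell" "$lane" "$flowcell" "$lane" "$sample" "$sample"
}

# Hypothetical values; substitute the real sample, flowcell and lane.
read_group SAMPLE1 FLOWCELL1 1
echo

# Possible use during alignment (file names are placeholders):
#   bwa mem -R "$(read_group SAMPLE1 FLOWCELL1 1)" ref.fa r1.fastq.gz r2.fastq.gz
```

Because each input file then corresponds to exactly one flowcell and lane, each alignment run gets a single, unambiguous read group.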
| Individual | EBI accession | Type | Pair count |
To run the whole pipeline, you will need about 700 GB in your /data directory. Please run checkquota to make sure you have enough disk storage. If not, please request a storage increase.
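As a rough complement to checkquota, the sketch below tests whether the filesystem holding a directory has a given amount of free space. Note the limitation: `df` reports free space on the filesystem, not your per-user quota, so checkquota remains the authoritative check. The function name `enough_space` and the `DATA_DIR` default are hypothetical and introduced only for this example.

```shell
# Succeeds if the filesystem holding DIR has at least NEED_GB gigabytes free.
# Caveat: df reports filesystem free space, not a per-user quota.
enough_space() {
    dir=$1 need_gb=$2
    # POSIX output format, 1 KB blocks; column 4 of line 2 is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

# DATA_DIR is a hypothetical default; point it at your own /data directory.
DATA_DIR=${DATA_DIR:-/data/${USER:-}}
if [ -d "$DATA_DIR" ] && ! enough_space "$DATA_DIR" 700; then
    echo "Less than 700 GB free on the filesystem holding $DATA_DIR" >&2
fi
```

The 700 figure comes from the estimate above; the check is skipped silently when the directory does not exist.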