Clair3 is a germline small variant caller for Illumina short reads and for PacBio and ONT long reads. Compared to PEPPER (r0.4), Clair3 (v0.1) shows a better SNP F1-score at ≤30-fold ONT coverage (precisionFDA Truth Challenge V2) and a better Indel F1-score, while generally running four times faster.
Allocate an interactive session and run the program.
Sample session on a GPU node:
[user@biowulf ~]$ sinteractive --gres=gpu:p100:1,lscratch:10 --mem=16g -c4
[user@cn2379 ~]$ module load clair3
[user@cn2379 ~]$ cp -r $CLAIR3_DATA/* .
[user@cn2379 ~]$ THREADS=4
[user@cn2379 ~]$ OUTPUT_VCF_FILE_PATH=merge_output.vcf.gz

Processing sample Illumina data
[user@cn2379 ~]$ PLATFORM='ilmn'
[user@cn2379 ~]$ INPUT_DIR="Illumina"
[user@cn2379 ~]$ cp -r $CLAIR3_DATA/${INPUT_DIR} .
[user@cn2379 ~]$ REF="GRCh38_chr20.fa"
[user@cn2379 ~]$ BAM="HG003_chr20_demo.bam"
[user@cn2379 ~]$ BASELINE_VCF_FILE_PATH="HG003_GRCh38_chr20_v4.2.1_benchmark.vcf.gz"
[user@cn2379 ~]$ BASELINE_BED_FILE_PATH="HG003_GRCh38_chr20_v4.2.1_benchmark_noinconsistent.bed"
[user@cn2379 ~]$ clair3 \
    --bam_fn=${INPUT_DIR}/${BAM} \
    --ref_fn=${INPUT_DIR}/${REF} \
    --model_path=${CLAIR3_MODELS}/${PLATFORM} \
    --threads=${THREADS} \
    --platform=${PLATFORM} \
    --output=./ \
    --bed_fn=${INPUT_DIR}/${BASELINE_BED_FILE_PATH}
...

Processing sample PacBio HiFi data
[user@cn2379 ~]$ PLATFORM='hifi'
[user@cn2379 ~]$ INPUT_DIR="PacBio"
[user@cn2379 ~]$ cp -r $CLAIR3_DATA/${INPUT_DIR} .
[user@cn2379 ~]$ REF="GRCh38_no_alt_chr20.fa"
[user@cn2379 ~]$ BAM="HG003_chr20_demo.bam"
[user@cn2379 ~]$ BASELINE_VCF_FILE_PATH="HG003_GRCh38_chr20_v4.2.1_benchmark.vcf.gz"
[user@cn2379 ~]$ BASELINE_BED_FILE_PATH="HG003_GRCh38_chr20_v4.2.1_benchmark_noinconsistent.bed"
[user@cn2379 ~]$ clair3 \
    --bam_fn=${INPUT_DIR}/${BAM} \
    --ref_fn=${INPUT_DIR}/${REF} \
    --threads=${THREADS} \
    --platform=${PLATFORM} \
    --model_path=${CLAIR3_MODELS}/${PLATFORM} \
    --output=./ \
    --bed_fn=${INPUT_DIR}/${BASELINE_BED_FILE_PATH}
...

Processing sample ONT data
[user@cn2379 ~]$ PLATFORM='ont'
[user@cn2379 ~]$ INPUT_DIR="ONT"
[user@cn2379 ~]$ cp -r $CLAIR3_DATA/${INPUT_DIR} .
[user@cn2379 ~]$ REF="GRCh38_no_alt_chr20.fa"
[user@cn2379 ~]$ BAM="HG003_chr20_demo.bam"
[user@cn2379 ~]$ BASELINE_VCF_FILE_PATH="HG003_GRCh38_chr20_v4.2.1_benchmark.vcf.gz"
[user@cn2379 ~]$ BASELINE_BED_FILE_PATH="HG003_GRCh38_chr20_v4.2.1_benchmark_noinconsistent.bed"
[user@cn2379 ~]$ clair3 \
    --bam_fn=${INPUT_DIR}/${BAM} \
    --ref_fn=${INPUT_DIR}/${REF} \
    --threads=${THREADS} \
    --platform=${PLATFORM} \
    --model_path=${CLAIR3_MODELS}/${PLATFORM} \
    --output=./ \
    --vcf_fn=${INPUT_DIR}/${BASELINE_VCF_FILE_PATH}
...
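The three demo runs above differ mainly in PLATFORM, INPUT_DIR and REF. As a minimal sketch, a helper like the hypothetical set_demo_vars below (not part of Clair3 or the module) could set those per platform so one script can drive any of the three demo datasets. The directory and reference names are copied from the examples above; everything else is an assumption.

```bash
#!/bin/bash
# Hypothetical helper (not part of Clair3): sets the per-platform variables
# used in the demo runs above. Values are copied from those examples.
set_demo_vars() {
    PLATFORM="$1"
    BAM="HG003_chr20_demo.bam"
    case "$PLATFORM" in
        ilmn) INPUT_DIR="Illumina"; REF="GRCh38_chr20.fa" ;;
        hifi) INPUT_DIR="PacBio";   REF="GRCh38_no_alt_chr20.fa" ;;
        ont)  INPUT_DIR="ONT";      REF="GRCh38_no_alt_chr20.fa" ;;
        *)    echo "unknown platform: $PLATFORM (expected ilmn, hifi or ont)" >&2
              return 1 ;;
    esac
}

# Example use (assumes the clair3 module is loaded, THREADS is set and the
# demo data were copied as shown above):
# set_demo_vars hifi
# clair3 --bam_fn="${INPUT_DIR}/${BAM}" --ref_fn="${INPUT_DIR}/${REF}" \
#        --threads="${THREADS}" --platform="${PLATFORM}" \
#        --model_path="${CLAIR3_MODELS}/${PLATFORM}" --output=./
```

Returning a non-zero status on an unknown platform lets a script run with set -e stop before clair3 is called with a model path that does not exist.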
Create a batch input file (e.g. clair3.sh). For example:
#!/bin/bash
set -e

module load clair3
# Set PLATFORM, INPUT_DIR, REF, BAM, THREADS and BASELINE_VCF_FILE_PATH
# as in the interactive example above.
...
clair3 \
    --bam_fn=${INPUT_DIR}/${BAM} \
    --ref_fn=${INPUT_DIR}/${REF} \
    --threads=${THREADS} \
    --platform=${PLATFORM} \
    --model_path=${CLAIR3_MODELS}/${PLATFORM} \
    --output=./ \
    --vcf_fn=${INPUT_DIR}/${BASELINE_VCF_FILE_PATH}
Submit this job using the Slurm sbatch command.
sbatch clair3.sh
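When the job finishes, Clair3 writes its final calls to merge_output.vcf.gz in the --output directory (the file named by OUTPUT_VCF_FILE_PATH in the interactive example). A quick sanity check is to count the records that passed filtering. The function name count_pass is hypothetical; this is a sketch using only gzip and awk, and it counts records rather than measuring accuracy against the HG003 baseline files.

```bash
# Hypothetical helper: prints the number of non-header VCF records whose
# FILTER column (7th tab-separated field) is PASS. gzip -dc also reads
# bgzip-compressed files such as merge_output.vcf.gz.
count_pass() {
    gzip -dc "$1" | awk -F'\t' '!/^#/ && $7 == "PASS" { n++ } END { print n + 0 }'
}

# Example, run in the directory given to --output:
# count_pass merge_output.vcf.gz
```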