Biowulf High Performance Computing at the NIH
merfin on Biowulf

From the documentation:

Improved variant filtering and polishing via k-mer validation
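To make the idea of k-mer validation concrete, here is a toy shell sketch. It is not merfin's algorithm: it only counts how many k-mers of a candidate sequence are absent from the k-mers seen in a handful of reads. The function name `missing_kmers` is hypothetical. Merfin itself works on meryl databases and takes k-mer copy numbers into account, and meryl counts canonical k-mers (a k-mer and its reverse complement together). This presence-only illustration does none of that.

```sh
#!/bin/sh
# Toy illustration only (not merfin): count the k-mers of a candidate
# sequence that never occur in any of the given reads.
#   $1 = candidate sequence, $2 = k, $3 = space-separated read sequences
# Forward strand only; real tools use canonical k-mers and copy numbers.
missing_kmers() {
    awk -v cand="$1" -v k="$2" -v reads="$3" 'BEGIN {
        # collect every k-mer present in the reads
        n = split(reads, r, " ")
        for (j = 1; j <= n; j++)
            for (i = 1; i <= length(r[j]) - k + 1; i++)
                seen[substr(r[j], i, k)] = 1
        # count candidate k-mers with no read support
        miss = 0
        for (i = 1; i <= length(cand) - k + 1; i++)
            if (!(substr(cand, i, k) in seen)) miss++
        print miss
    }'
}
```

For example, `missing_kmers ACGTTC 3 ACGTAC` reports 2 unsupported 3-mers (GTT and TTC). Adding a read that covers them, as in `missing_kmers ACGTTC 3 "ACGTAC GGTTCA"`, brings the count to 0. A variant whose alternative sequence leaves no unsupported k-mers is, in this toy sense, validated by the reads.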


Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --cpus-per-task=16 --mem=36g --gres=lscratch:100
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144]$ module load merfin
[user@cn3144]$ # unpack a meryl k-mer database created from PE 250 bp Illumina reads from the son (HG002) of a trio
[user@cn3144]$ # with: meryl count k=21 reads.fastq.gz output HG002.k21.meryl
[user@cn3144]$ # followed by excluding k-mers with a frequency of 1
[user@cn3144]$ tar -xzf ${MERFIN_TEST_DATA:-none}/HG002.k21.gt1.meryl.tar.gz
[user@cn3144]$ cp ${MERFIN_TEST_DATA:-none}/{chr20.fasta.gz,ill.vcf.gz} .
[user@cn3144]$ merfin -filter -sequence chr20.fasta.gz  \
           -memory 34 \
           -threads $SLURM_CPUS_PER_TASK \
           -readmers HG002.k21.gt1.meryl \
           -vcf ill.vcf.gz               \
           -output test.merfin

[user@cn3144]$ exit
salloc.exe: Relinquishing job allocation 46116226
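The comments in the session above describe how the test k-mer database was built. The following minimal sketch wraps those two steps in a shell function. It assumes `meryl` is on the PATH (for example after `module load merfin`) and uses meryl's `greater-than` action to drop k-mers with a frequency of 1. The function name `build_readmers` is hypothetical, and the output names simply mirror the test data file names.

```sh
#!/bin/sh
# Hypothetical helper: build a k=21 meryl read database, then keep only
# k-mers seen more than once (singletons are largely sequencing errors).
# Assumes 'meryl' is on PATH, e.g. after 'module load merfin'.
build_readmers() {
    reads=${1:?usage: build_readmers READS.fastq.gz PREFIX}
    prefix=${2:?usage: build_readmers READS.fastq.gz PREFIX}
    # count all 21-mers in the reads
    meryl count k=21 "$reads" output "$prefix.k21.meryl" || return 1
    # keep k-mers whose count is strictly greater than 1
    meryl greater-than 1 "$prefix.k21.meryl" output "$prefix.k21.gt1.meryl"
}
```

Running `build_readmers reads.fastq.gz HG002` would produce `HG002.k21.meryl` and the filtered `HG002.k21.gt1.meryl`, which is the database passed to merfin with `-readmers`.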

Batch job
Most jobs should be run as batch jobs.

Create a batch input file. For example:

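#!/bin/bash
set -e

module load merfin
# unpack the test meryl k-mer database and copy the test sequence and variants
tar -xzf ${MERFIN_TEST_DATA:-none}/HG002.k21.gt1.meryl.tar.gz
cp ${MERFIN_TEST_DATA:-none}/{chr20.fasta.gz,ill.vcf.gz} .
# filter the variants in ill.vcf.gz using the read k-mer database
merfin -filter -sequence chr20.fasta.gz \
    -memory 34 \
    -threads $SLURM_CPUS_PER_TASK \
    -readmers HG002.k21.gt1.meryl \
    -vcf ill.vcf.gz \
    -output test.merfin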
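The batch script hard-codes `-memory 34` for a job submitted with `--mem=36g`. If you change `--mem`, a small helper can derive the value instead. This sketch assumes that merfin's `-memory` is given in GB, as the 34 vs. 36g pairing suggests. It relies on Slurm setting `SLURM_MEM_PER_NODE` (in MB) when a job requests `--mem`. The function name `merfin_mem_gb` and the 2 GB default headroom are hypothetical choices.

```sh
#!/bin/sh
# Hypothetical helper: choose a merfin -memory value a little below the
# job's Slurm allocation. SLURM_MEM_PER_NODE is set by Slurm, in MB,
# when the job requests --mem. Assumes merfin's -memory is in GB.
merfin_mem_gb() {
    headroom=${1:-2}   # GB left free for overhead outside merfin
    mem_mb=${SLURM_MEM_PER_NODE:?SLURM_MEM_PER_NODE is not set; request --mem}
    echo $(( mem_mb / 1024 - headroom ))
}
```

Inside the batch script, `-memory "$(merfin_mem_gb)"` would then replace `-memory 34`. With `--mem=36g` it evaluates to the same 34.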

Submit this job using the Slurm sbatch command, giving the name of your batch file:

sbatch --cpus-per-task=16 --mem=36g <batch_file>