High-Performance Computing at the NIH
Bismark

A tool to map bisulfite converted sequence reads and determine cytosine methylation states

Bismark is a program to map bisulfite-treated sequencing reads to a genome of interest and perform methylation calls in a single step. The output can be easily imported into a genome viewer, such as SeqMonk, and enables a researcher to analyse the methylation levels of their samples straight away.

Web site
Reference

Bismark On Helix

The first step in using Bismark is to prepare a genome using the bismark_genome_preparation command. This creates a subdirectory called Bisulfite_Genome in the directory that holds the fasta files. To prevent unnecessary data duplication, the human genome (hg19) has already been processed for use with the bowtie2 aligner. It resides under /fdb/bismark. If you would like other genomes to be prepared, please contact staff@hpc.nih.gov.
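Because genome preparation writes a Bisulfite_Genome subdirectory, a script can check for that subdirectory before repeating the step. The following is a minimal sketch, not part of Bismark: the function names genome_is_prepared and prepare_genome_if_needed are hypothetical, and the check only tests that the subdirectory exists, not that indexing finished successfully.

```shell
#!/bin/bash
# Succeed if the genome folder already contains the Bisulfite_Genome
# subdirectory that bismark_genome_preparation creates.
# (Hypothetical helper; existence of the folder does not prove the
# preparation ran to completion.)
genome_is_prepared() {
    [ -d "$1/Bisulfite_Genome" ]
}

# Run bismark_genome_preparation only when the subdirectory is missing.
# (Hypothetical helper; the skip logic is an illustration.)
prepare_genome_if_needed() {
    local genome_dir="$1"
    if genome_is_prepared "$genome_dir"; then
        echo "Already prepared: $genome_dir"
    else
        bismark_genome_preparation "$genome_dir"
    fi
}
```

For the example in this section, the call would be prepare_genome_if_needed XandY, issued after module load bismark.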

A fastq file with test data is provided by the authors. In this example a user copies this test file, along with X and Y chromosome data from the human genome, to their local space and runs through the basic analysis steps. (User input is the text that follows each $ prompt.)

[user@helix ~]$ module load bismark
[+] Loading bismark 0.16.1 on biowulf.nih.gov
[+] Loading samtools 1.3.1 ...
[user@helix tests]$ bismark --version

          Bismark - Bisulfite Mapper and Methylation Caller.

                       Bismark Version: v0.16.1
        Copyright 2010-15 Felix Krueger, Babraham Bioinformatics
              www.bioinformatics.babraham.ac.uk/projects/

[user@helix tests]$ bismark --help

     This program is free software: you can redistribute it and/or modify
     it under the terms of the GNU General Public License as published by
     the Free Software Foundation, either version 3 of the License, or
     (at your option) any later version.

     This program is distributed in the hope that it will be useful,
     but WITHOUT ANY WARRANTY; without even the implied warranty of
     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
...
[user@helix ~]$ mkdir -p /data/$USER/bismark_test/XandY
[user@helix ~]$ cd /data/$USER/bismark_test
[user@helix bismark_test]$ cp /fdb/genome/human-feb2009/chrX.fa ./XandY
[user@helix bismark_test]$ cp /fdb/genome/human-feb2009/chrY.fa ./XandY
[user@helix bismark_test]$ cp $BISMARK_HOME/test_data.fastq .
[user@helix bismark_test]$ bismark_genome_preparation XandY
Writing bisulfite genomes out into a single MFA (multi FastA) file

Bisulfite Genome Indexer version v0.16.0 (last modified 25 August 2015)


Step I - Prepare genome folders - completed



Total number of conversions performed:

...
[user@helix bismark_test]$ bismark XandY test_data.fastq
Path to Bowtie 2 specified as: bowtie2
Output format is BAM (default)
Alignments will be written out in BAM format. Samtools found here: '/usr/local/apps/samtools/1.3.1/bin/samtools'
Reference genome folder provided is XandY/	(absolute path is '/spin1/users/user/bismark_test/XandY/)'
FastQ format assumed (by default)

Files to be analysed:
test_data.fastq

Library is assumed to be strand-specific (directional), alignments to strands complementary to the original top or bottom strands will be ignored (i.e. not performed!)

...
[user@helix bismark_test]$ bismark_methylation_extractor test_data_bismark_bt2.bam

 *** Bismark methylation extractor version v0.16.0 ***

Trying to determine the type of mapping from the SAM header line of file test_data_bismark_bt2.bam
Treating file(s) as single-end data (as extracted from @PG line)

Setting core usage to single-threaded (default). Consider using --multicore <int> to speed up the extraction process.

Summarising Bismark methylation extractor parameters:
===============================================================

...

Running a single Bismark job on Biowulf

Set up a batch script along the following lines:

#!/bin/bash
# file called myjob.bat

module load bismark
cd /data/$USER/bismark_test
bismark_genome_preparation XandY
bismark XandY test_data.fastq
bismark_methylation_extractor test_data_bismark_bt2.bam

Submit this job with:

[user@biowulf ~]$ sbatch myjob.bat

For more information on submitting jobs to Slurm, see Job Submission in the Biowulf User Guide.
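The methylation extractor output shown earlier suggests --multicore <int> to speed up extraction. One way to use it in a batch job is to pass along the CPU count that Slurm allocated. The sketch below writes such a batch script; the file name myjob_multicore.bat and the choice of 4 CPUs are assumptions for illustration, and the extractor may start helper processes beyond the number given, so check actual CPU usage before settling on an allocation.

```shell
# Write a batch script that hands the Slurm CPU allocation to --multicore.
# The quoted 'EOF' keeps variables unexpanded until the job runs.
cat > myjob_multicore.bat <<'EOF'
#!/bin/bash
# file called myjob_multicore.bat (hypothetical name)
set -e  # stop at the first failing step

module load bismark
cd /data/$USER/bismark_test
# SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is requested;
# fall back to a single core otherwise.
bismark_methylation_extractor --multicore "${SLURM_CPUS_PER_TASK:-1}" test_data_bismark_bt2.bam
EOF

# Submit with, for example:
#   sbatch --cpus-per-task=4 myjob_multicore.bat
```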

Running a swarm of Bismark jobs on Biowulf

One strategy for running a swarm of Bismark jobs is to set up multiple directories, each containing genome sequences. Once you have done that, you can set up a swarm command file containing one line for each of your Bismark runs.

Sample swarm command file

# --------file myjobs.swarm----------
bismark directory1 test_data.fastq
bismark directory2 test_data.fastq
bismark directory3 test_data.fastq
....
bismark directoryN test_data.fastq
# -----------------------------------
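For more than a handful of directories, the command file can be generated rather than typed. The following is a minimal sketch; make_swarm is a hypothetical helper name, and it assumes directory and file names contain no spaces.

```shell
#!/bin/bash
# Write one "bismark <genome_dir> <reads>" line per genome directory.
# Usage: make_swarm <reads.fastq> <output.swarm> <genome_dir>...
# (Hypothetical helper, not part of Bismark or swarm.)
make_swarm() {
    local reads="$1" out="$2"
    shift 2
    : > "$out"    # create or truncate the swarm file
    local dir
    for dir in "$@"; do
        echo "bismark $dir $reads" >> "$out"
    done
}
```

For example, make_swarm test_data.fastq myjobs.swarm directory1 directory2 directory3 writes the first three command lines of the sample file above.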

Submit this set of runs to the batch system by typing:

[user@biowulf ~]$ swarm --module bismark -f myjobs.swarm

You could also write swarm files for the genome preparation and methylation extraction steps and submit the jobs with dependencies. For details on using swarm see Swarm on Biowulf.
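Chaining the steps with dependencies might look like the sketch below. It rests on two assumptions that should be verified in Swarm on Biowulf before use: that swarm prints the job ID of the submitted swarm on standard output, and that it accepts a --dependency option in the Slurm afterok:<jobid> form. The function name submit_chain and the swarm file names are hypothetical.

```shell
#!/bin/bash
# Submit the alignment swarm, then submit the extraction swarm so that it
# starts only after the alignment swarm has finished successfully.
# ASSUMPTIONS: swarm prints the job ID on stdout and accepts --dependency.
submit_chain() {
    local align_swarm="$1" extract_swarm="$2"
    local align_id
    # Capture the job ID of the first swarm; stop if submission fails.
    align_id=$(swarm --module bismark -f "$align_swarm") || return 1
    swarm --module bismark --dependency "afterok:$align_id" -f "$extract_swarm"
}
```

A call such as submit_chain align.swarm extract.swarm would then submit both swarms in order, where extract.swarm holds one bismark_methylation_extractor line per alignment output.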

Documentation

Bismark is extensively documented. To read the help doc, type bismark --help.