High-Performance Computing at the NIH
mageck-vispr on Biowulf & Helix


The mageck-vispr toolkit is used to QC, analyse, and visualize results from CRISPR/Cas9 screens. It includes the mageck analysis workflow and the vispr visualization application.

The mageck workflow is implemented as a snakemake pipeline and runs automatically. Vispr, on the other hand, is a web application: it runs a temporary server on a compute node, and you connect to it from a browser on your own computer through an ssh tunnel.

There may be multiple versions of mageck-vispr available. An easy way of selecting the version is to use modules. To see the modules available, type

module avail mageck-vispr 

To select a module use

module load mageck-vispr/[version]

where [version] is the version of choice.

mageck-vispr includes several other tools that will be added to your path along with the mageck-vispr tools: snakemake, fastqc and cutadapt. Loading this module may therefore override versions of the same tools that were loaded through other environment modules.
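
Because the module places its own copies of these tools on the path, it can be useful to confirm which copy of each tool the shell will actually run. The following is a minimal sketch, assuming a POSIX shell; the helper name report_tools is illustrative and is not part of mageck-vispr:

```shell
# report_tools: print where each named command resolves on the current PATH.
# (Hypothetical helper; not provided by the mageck-vispr module.)
report_tools() {
    for tool in "$@"; do
        if resolved=$(command -v "$tool" 2>/dev/null); then
            printf '%s: %s\n' "$tool" "$resolved"
        else
            printf '%s: not found\n' "$tool"
        fi
    done
}
```

After loading the module, running report_tools mageck-vispr vispr snakemake fastqc cutadapt lists the location each command resolves to, which makes an unexpected override easy to spot.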

Environment variables set


Dependencies are loaded automatically. Because mageck-vispr is implemented as a singularity container, it cannot be used on Helix.



Interactive job on Biowulf

Allocate an interactive session with sinteractive and use it as shown below.

Start interactive session and set up the environment

biowulf$ sinteractive --cpus-per-task=8 --mem=16g
salloc.exe: Pending job allocation 31864544
salloc.exe: job 31864544 queued and waiting for resources
salloc.exe: job 31864544 has been allocated resources
salloc.exe: Granted job allocation 31864544
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn2692 are ready for job
srun: error: x11: no local DISPLAY defined, skipping

node$ module load mageck-vispr
[+] Loading singularity 2.2 on cn2692
[+] Loading mageck-vispr 0.5.3

Copy test data and run mageck pipeline

node$ cp -r /usr/local/apps/${APP}/TEST_DATA/esc-testdata .
node$ cd esc-testdata
node$ mageck-vispr init ./test_workflow --reads \
    reads/ERR376998.subsample.fastq \
    reads/ERR376999.subsample.fastq \
    reads/ERR377000.subsample.fastq
node$ tree test_workflow
  |-- [user   2.9K]  config.yaml
  |-- [user   7.3K]  README.txt
  `-- [user   5.8K]  Snakefile
  0 directories, 3 files

node$ cd test_workflow

Before running the workflow it is necessary to edit the automatically generated config file. The generated file contains many comments. Here is the edited file with the comments stripped for readability:

library: ../yusa_library.csv
species: mus_musculus
assembly: mm10

    genes: true
    update-efficiency: false
    trim-5: 23
    len: 19

        - ../reads/ERR376999.subsample.fastq
        - ../reads/ERR377000.subsample.fastq
        - ../reads/ERR376998.subsample.fastq

        designmatrix: ../designmatrix.txt
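
The read files in the config are given relative to the workflow directory (note the ../ prefixes), so a mistyped path is an easy error to make. The following is a rough sketch of a pre-flight check, assuming a POSIX shell with sed. The helper name check_config_paths is hypothetical, and it scans the text for list items rather than parsing YAML, so it only covers entries written as "- path":

```shell
# check_config_paths: report list entries ("- path") in a config file whose
# target does not exist. Relative paths are resolved against the directory
# containing the config. Returns non-zero if anything is missing.
# (Hypothetical helper; a plain-text scan, not a real YAML parser.)
check_config_paths() {
    config=$1
    dir=$(dirname "$config")
    # Strip the leading "- " from each list item, leaving only the path.
    missing=$(sed -n 's/^[[:space:]]*-[[:space:]]*//p' "$config" |
        while IFS= read -r entry; do
            case $entry in
                /*) target=$entry ;;        # absolute path: use as-is
                *)  target=$dir/$entry ;;   # relative path: resolve against config dir
            esac
            [ -e "$target" ] || echo "missing: $entry"
        done)
    if [ -n "$missing" ]; then
        echo "$missing"
        return 1
    fi
}
```

Running check_config_paths config.yaml from the workflow directory prints nothing when every listed read file exists. The library and designmatrix entries are plain key: value pairs and are not covered by this sketch.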

Once the config file has been modified to reflect the experimental design, run the pipeline. Note that snakemake runs the workflow locally on the allocated node; it does not submit tasks as cluster jobs.

node$ snakemake --cores=$SLURM_CPUS_PER_TASK
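
Before a full run it can be worth doing a dry run, which lists the jobs snakemake plans to execute without running them. The following is a minimal sketch, assuming it is run from the workflow directory. The function names pick_cores and run_workflow are illustrative, and the fallback to a single core outside a Slurm allocation is an assumption made for this sketch, not site policy:

```shell
# pick_cores: use the Slurm allocation when available, otherwise one core.
pick_cores() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# run_workflow: dry run first (-n lists the planned jobs without executing
# them), then start the real run only if the dry run succeeds.
# (Hypothetical wrapper; call it from the workflow directory.)
run_workflow() {
    cores=$(pick_cores)
    snakemake -n --cores "$cores" && snakemake --cores "$cores"
}
```

A failing dry run usually points to a problem in the config file, such as a missing input, before any compute time is spent.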

Next, from within the test_workflow directory, start the vispr server for visualization

node$ vispr server --port 8888 --host=$(hostname) results/ESC-MLE.vispr.yaml
Loading data.
Starting server.

Open:  go to http://cn2692:8888 in your browser.
Note: Safari and Internet Explorer are currently unsupported.
Close: hit Ctrl-C in this terminal.

On your local workstation, create an ssh tunnel via biowulf to the compute node. Please replace 'cn2692' with the name of your compute node and '8888' with the port used.

workstation$ ssh -L 12345:cn2692:8888 -N biowulf.nih.gov
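
The tunnel command follows a fixed pattern: local port, compute node, then the port the vispr server listens on. As a small sketch of that pattern, the hypothetical helper below prints the command for a given node; the defaults of 8888 and 12345 simply mirror the example values used here:

```shell
# tunnel_cmd: print the ssh command that forwards a local port to the vispr
# server on a compute node, going through the Biowulf login node.
# (Hypothetical helper; arguments: node, remote port, local port.)
tunnel_cmd() {
    node=$1
    remote_port=${2:-8888}
    local_port=${3:-12345}
    echo "ssh -L ${local_port}:${node}:${remote_port} -N biowulf.nih.gov"
}
```

For example, tunnel_cmd cn2692 8888 12345 prints the command shown above. The -N option tells ssh not to run a remote command, so the session exists only to carry the forwarded port.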

Then open the browser on your local workstation and point it to http://localhost:12345/. You should see the vispr web application:

[Screenshot: vispr web app]

Batch job on Biowulf

Create a batch script for an existing config file similar to the following example:

#! /bin/bash
# this file is mageck.batch

module load mageck-vispr || exit 1
cd /path/to/workdir || exit 1

snakemake --cores=$SLURM_CPUS_PER_TASK

Submit to the queue with sbatch:

biowulf$ sbatch --cpus-per-task=8 --mem=16g mageck.batch
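
As an alternative to passing the resource requests on the command line, Slurm also reads #SBATCH directives placed at the top of the batch script, before the first command. The following sketch writes such a script; the helper name write_batch is hypothetical, and the resource values are the ones from the example above:

```shell
# write_batch: write a batch script with the resource requests embedded as
# #SBATCH directives, so that a plain "sbatch mageck.batch" is sufficient.
# (Hypothetical helper; arguments: output file, working directory.)
write_batch() {
    cat > "$1" <<EOF
#! /bin/bash
#SBATCH --cpus-per-task=8
#SBATCH --mem=16g

module load mageck-vispr || exit 1
cd "$2" || exit 1

snakemake --cores=\$SLURM_CPUS_PER_TASK
EOF
}
```

After write_batch mageck.batch /path/to/workdir, the job can be submitted with sbatch mageck.batch. Options given on the sbatch command line still take precedence over the directives in the script.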