High-Performance Computing at the NIH
Meme on Biowulf & Helix

The MEME Suite is a collection of tools for discovering and analyzing sequence motifs. It was developed at the University of Queensland and the University of Washington. For details, see the MEME website.

Meme is CPU-intensive for large numbers of sequences or long sequences, and scales well to 128 cores.

Meme motif and GoMo databases are available in /fdb/meme/


Running Meme on Helix

As a rule of thumb, if you run Meme on Helix and get the error:

Dataset too large (> 100000). Rerun with larger -maxsize.
you should run Meme in parallel on Biowulf.

Before running any of the Meme Suite programs, set up the environment with 'module load meme'. By default this loads the latest installed version of the Meme Suite and gives you access to all the programs in the suite. To see which versions are installed, or to run a specific version of Meme, use the module commands. For example:
[user@helix ~]$ module avail meme

------------------------- /usr/local/lmod/modulefiles ------------------------
meme/4.9.1 meme/4.10.0

[user@helix ~]$ module load meme/4.10.0

[user@helix ~]$ module list
Currently Loaded Modulefiles:
  1) meme/4.10.0

Sample Meme session:

[user@helix mydir]$ module load meme

[user@helix mydir]$ meme -text protease-seqs > protease.meme.out
Initializing the motif probability tables for 2 to 7 sites...
nsites = 7
Done initializing

seqs=     7, min= 185, max=  300, total=     1750

motif=1
em: w=  50, psites=   7, iter=   0  

[user@helix mydir]$ mast protease.meme.out -text
Writing to file mast.protease.meme.out

[user@helix mydir]$ more mast.protease.meme.out
********************************************************************************
MAST - Motif Alignment and Search Tool
********************************************************************************
        MAST version 3.5.7 (Release date: 2007-12-17 16:56:19 -0800
	(Mon, 17 Dec 2007))
[...]
********************************************************************************
DATABASE AND MOTIFS
********************************************************************************
        DATABASE protease-seqs (peptide)
        Last updated on Wed Feb 20 09:46:42 2008
        Database contains 7 sequences, 1750 residues

        MOTIFS protease.meme.out (peptide)
        MOTIF WIDTH BEST POSSIBLE MATCH
        ----- ----- -------------------
          1    50   VIRRGSTTGTHSGRVTALNATVNYGGGDVVYGMIQTNVCAEPGDSGGPLY
[...]

For a full explanation of the Meme and Mast output, see the MEME website.

Running a single Meme job on Biowulf

Your input database should consist of a file containing sequences in fasta format. In the example below, the file is 'mini-drosoph.s'.

Maxsize parameter: the maximum dataset size in characters. To determine the number of characters in your dataset, type 'wc -c filename'. For example:

[user@biowulf mydir]$ wc -c mini-drosoph.s 
506016 mini-drosoph.s
For this dataset, the maxsize parameter must be set to a value greater than 506,016, so we will use 600000.
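The size check can be wrapped in a small helper. The following is a minimal sketch and not part of the MEME Suite: the function name `suggest_maxsize` and the choice to round up to the next multiple of 100000 are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of MEME): print a -maxsize value for a
# sequence file by rounding its size in characters up to the next
# multiple of 100000, so the result is always strictly larger than the file.
suggest_maxsize() {
    local file=$1
    local step=100000
    local nchars
    # 'wc -c < file' prints only the count, without the file name
    nchars=$(wc -c < "$file") || return 1
    echo $(( (nchars / step + 1) * step ))
}
```

The result could then be passed on the command line, for example `meme mini-drosoph.s -oc meme_out -maxsize "$(suggest_maxsize mini-drosoph.s)"`.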

Batch script

Create a batch script along the following lines:

----  this file is called meme.batch ---------
#!/bin/bash

module load meme/4.10.0

cd /data/username/mydir

meme mini-drosoph.s  -oc meme_out -maxsize 600000 -p $SLURM_NTASKS

Submit this job with a command along the lines of

sbatch --ntasks=16 --constraint=x2650 --ntasks-per-core=1 --exclusive meme.batch
This command will submit the Meme run to 16 CPUs.

Notes:

The 'meme' executable was built with MPI, and is therefore a parallel program. To run in single-cpu mode, skip the -p parameter.

Meme scales well, and large Meme jobs (maxsize ~500,000) can be submitted on up to 512 cores.

The standard output and standard error from the job will appear in the file slurm-JobNum.out. If the job does not seem to be running correctly, check this file for errors.
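One way to act on the last note is to scan the Slurm output file for common failure words once the job has finished. This is a minimal sketch: the function name `scan_slurm_log` and the list of patterns are assumptions, not an official list of Meme or Slurm messages, and a clean scan does not prove that the run succeeded.

```shell
#!/bin/bash
# Hypothetical helper: print the lines (with line numbers) of a Slurm
# output file that mention common failure words. The pattern list is an
# assumption chosen for illustration.
scan_slurm_log() {
    local log=$1
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    # -i: ignore case, -n: show line numbers, -E: extended regex.
    # grep exits with status 1 when nothing matches.
    grep -inE 'error|too large|not enough slots|killed' "$log"
}
```

A call such as `scan_slurm_log slurm-JobNum.out` (with the real job number substituted) prints any matching lines and returns a non-zero status when none are found.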

Swarm of Meme jobs on Biowulf

Meme is an MPI program which uses OpenMPI libraries. OpenMPI on Biowulf is built with Slurm support. An MPI program runs a specified number of MPI processes or 'tasks'. The user specifies the number of tasks with '--ntasks=#' on the sbatch command line, and the OpenMPI program automatically gets this number from Slurm and starts up the appropriate number of tasks.

Swarm is intended for single-threaded and multi-threaded applications. When you use the '-t #' (threads per process) flag to swarm, it sets up subjobs with $SLURM_CPUS_PER_TASK=# and allocates # cpus on a single node for each subjob. The Meme MPI program sees this as a single 'task' with # threads, and not as # tasks, and will complain that there are not enough slots available for the MPI processes.
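This mismatch can be checked from inside a job before Meme starts. The sketch below only reads the Slurm environment variables SLURM_NTASKS and SLURM_CPUS_PER_TASK; the function name `check_mpi_tasks` and the wording of the warning are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical pre-flight check: warn when CPUs were allocated as threads
# of a single task (as 'swarm -t #' does) rather than as MPI tasks.
# On success, prints the task count, which can be passed to 'meme -p'.
check_mpi_tasks() {
    local ntasks=${SLURM_NTASKS:-1}        # 1 when --ntasks was not given
    local cpus=${SLURM_CPUS_PER_TASK:-1}   # 1 when no threads were requested
    if [ "$ntasks" -le 1 ] && [ "$cpus" -gt 1 ]; then
        echo "warning: $cpus cpus allocated to 1 task; MPI needs --ntasks" >&2
        return 1
    fi
    echo "$ntasks"
}
```

Outside of a Slurm job both variables are unset, so the function prints 1 and returns successfully.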

Thus, it is important to add the flag --sbatch '--ntasks=#' when submitting a swarm of Meme jobs. You should also use '--ntasks-per-core=1', as most MPI applications run more efficiently with only one MPI task on each physical core.

Sample swarm command file:

meme query1.fa -oc query1.out -maxsize 10000000 -p $SLURM_NTASKS
meme query2.fa -oc query2.out -maxsize 10000000 -p $SLURM_NTASKS
meme query3.fa -oc query3.out -maxsize 10000000 -p $SLURM_NTASKS

Submit with:

swarm -f swarm.cmd -g 20 --sbatch '--ntasks=4 --ntasks-per-core=1' --module=meme
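For more than a few query files, the swarm command file can be generated rather than typed. The following is a sketch, assuming each input file name ends in .fa and contains no spaces, and that each output directory is named after its input as in the sample command file; the function name `make_swarm_cmds` is hypothetical.

```shell
#!/bin/bash
# Hypothetical generator: print one meme command line per FASTA file.
# Usage: make_swarm_cmds MAXSIZE file1.fa [file2.fa ...]
make_swarm_cmds() {
    local maxsize=$1
    shift
    local fasta base
    for fasta in "$@"; do
        base=${fasta%.fa}   # strip the .fa suffix: query1.fa -> query1
        # The single-quoted format keeps $SLURM_NTASKS literal, so it is
        # expanded later inside each subjob, not when the file is written.
        printf 'meme %s -oc %s.out -maxsize %s -p $SLURM_NTASKS\n' \
            "$fasta" "$base" "$maxsize"
    done
}
```

Running `make_swarm_cmds 10000000 query*.fa > swarm.cmd` would write a file in the same form as the sample, ready to submit with the swarm command.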

Interactive job on Biowulf

Interactive jobs may be run for debugging purposes. There is a time limit on interactive jobs, so if you have a long Meme run, you will probably want to submit it as a batch job.

Sample session:

[susanc@biowulf ~]$ sinteractive 
salloc.exe: Granted job allocation 1486
slurm stepprolog here!
[susanc@p20 ~]$ cd /data/susanc/meme
[susanc@p20 meme]$ meme mini-drosoph.s -oc meme_out -maxsize 6000000
The output directory 'meme_out' already exists.
Its contents will be overwritten.
Initializing the motif probability tables for 2 to 4 sites...
nsites = 4
Done initializing

seqs=     4, min=12850, max=297266, total=   499297
[...]
[susanc@p20 meme]$ exit
slurm stepepilog here!
salloc.exe: Relinquishing job allocation 1486

[susanc@biowulf ~]$

Benchmarks

Walltime for 'meme mini-drosoph.s -oc meme_out -maxsize 600000'.

The benchmark indicates that the job scales much better with the --ntasks-per-core=1 flag. Based on the scaling from this benchmark, this job could be run on up to 512 cores. In general, Meme scales well.

Documentation