The MEME Suite is a collection of tools for motif-based sequence analysis, including motif discovery (meme, meme-chip) and motif scanning against motif databases.
Meme is CPU-intensive for large numbers of sequences or for long sequences, and scales well to 128 cores.
Meme motif and GoMo databases are available in /fdb/meme/.
[user@biowulf mydir]$ wc -c mini-drosoph.s
506016 mini-drosoph.s
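As a rule of thumb, -maxsize should be set somewhat above the dataset size in bytes, which is why 506016 bytes maps to -maxsize 600000 below. A small shell sketch of that calculation (the round-up-to-100,000 scheme is an illustrative assumption, not a MEME requirement):

```shell
# Sketch: choose a -maxsize a bit larger than the input file, rounded up
# to the next multiple of 100,000 bytes. mini-drosoph.s (506016 bytes)
# yields 600000, matching the value used in the batch example below.
size=$(wc -c < mini-drosoph.s)
maxsize=$(( (size / 100000 + 1) * 100000 ))
echo "suggested flag: -maxsize $maxsize"
```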
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --ntasks=4 --ntasks-per-core=1 --constraint=x2695
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ cd /data/$USER
[user@cn3144 ~]$ module load meme
[user@cn3144]$ cp /usr/local/apps/meme/examples/protease-seqs /data/$USER
[user@cn3144]$ meme -p $SLURM_NTASKS -text protease-seqs > protease.meme.out
Initializing the motif probability tables for 2 to 7 sites...
nsites = 7
Done initializing.
SEEDS: highwater mark: seq 6 pos 300
BALANCE: samples 7 residues 1750 nodes 1 residues/node 1750
seqs= 7, min= 185, max= 300, total= 1750
motif=1
SEED WIDTHS: 8 11 15 21 29 41 50
em: w= 50, psites= 7, iter= 0

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
In the example below, we are using the file 'mini-drosoph.s' as the input. This file can be copied from /usr/local/apps/meme/examples. The -maxsize parameter will be set to 600000 as described in the Important Notes section above.
Create a batch input file (e.g. meme.sh). For example:
#!/bin/bash
set -e
cd /data/$USER/
module load meme
meme -p $SLURM_NTASKS mini-drosoph.s -oc meme_out -maxsize 600000
Submit this job using the Slurm sbatch command.
sbatch --ntasks=28 --ntasks-per-core=1 --constraint=x2695 --exclusive meme.sh
This job will run on the norm (default) partition, which is limited to single-node jobs. It will utilize all the cores on a 28-core norm node. Meme scales well and large meme jobs (maxsize ~500,000) can be submitted on up to 512 cores. A multinode job can be submitted with:
sbatch --partition=multinode --ntasks=512 --ntasks-per-core=1 --constraint=x2695 meme.sh
If you have several input files, you can submit them as a swarm of Meme jobs.
Meme is an MPI program which uses OpenMPI libraries. OpenMPI on Biowulf is built with Slurm support. An MPI program runs a specified number of MPI processes or 'tasks'. The user specifies the number of tasks with '--ntasks=#' on the sbatch command line, and the OpenMPI program automatically gets this number from Slurm and starts up the appropriate number of tasks.
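This handoff can be seen in a minimal batch-script sketch (the default of 1 is only for illustration when run outside Slurm; the commented meme line assumes the module is loaded and the input file exists):

```shell
#!/bin/bash
# Slurm sets SLURM_NTASKS to the value of --ntasks given at submission
# time; outside a Slurm job it is unset, so default to 1 for illustration.
ntasks=${SLURM_NTASKS:-1}
echo "launching meme with $ntasks MPI tasks"
# meme -p "$ntasks" mini-drosoph.s -oc meme_out -maxsize 600000
```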
Swarm is intended for single-threaded and multi-threaded applications. When you use the '-t #' (threads per process) flag to swarm, it sets up subjobs with $SLURM_CPUS_PER_TASK=# and allocates # cpus on a single node for each subjob. The Meme MPI program sees this as a single 'task' with #threads, and not as # tasks, and will complain that there are not enough slots available for the MPI processes.
Thus, it is important to add the flag --sbatch '--ntasks=#' when submitting a swarm of Meme jobs. You should also use '--ntasks-per-core=1', as most MPI applications run more efficiently with only one MPI task on each physical core.
Create a swarmfile (e.g. Meme.swarm). For example:
meme -p $SLURM_NTASKS query1.fa -oc query1.out -maxsize 10000000
meme-chip -meme-p $SLURM_NTASKS query2.fna -oc query2out
Submit this job using the swarm command.
swarm -f Meme.swarm -g 20 --sbatch '--ntasks=4 --ntasks-per-core=1 --constraint=x2695' --module=meme