High-Performance Computing at the NIH
OligoArray on Biowulf & Helix

OligoArray is a program that computes gene-specific oligonucleotides that are free of secondary structure, for genome-scale oligonucleotide microarray construction. Selection is based on three major criteria: oligonucleotide melting temperature; specificity to a single target, or at least to the shortest possible list of targets; and the inability to fold into a stable secondary structure at the hybridization temperature. OligoArray was developed by Jean-Marie Rouillard at the University of Michigan. OligoArray website

On Helix

Sample session on Helix. Note that the 'module load OligoArray' command adds the appropriate executables to your PATH, and also sets up an environment variable $OA which points to the location of the jar files and the sample files.

[susanc@helix ~]$ mkdir /data/susanc/OligoArray

[susanc@helix ~]$ cd /data/susanc/OligoArray/

[susanc@helix OligoArray]$ module load OligoArray
[+] Loading OligoArray 2_1 ...

[susanc@helix OligoArray]$ cp $OA/chr1.fas .

[susanc@helix OligoArray]$ cp $OA/yeast_orf.fas .

[susanc@helix OligoArray]$ formatdb  -i yeast_orf.fas -o T -p F

[susanc@helix OligoArray]$ ls
chr1.fas  formatdb.log	yeast_orf.fas  yeast_orf.fas.nhr  yeast_orf.fas.nin  yeast_orf.fas.nsd	yeast_orf.fas.nsi  yeast_orf.fas.nsq

[susanc@helix OligoArray]$ java -Xmx512m -jar $OA/OligoArray2.jar -i chr1.fas -d yeast_orf.fas -o oligo.txt -n 2 -l 45 -L 47 -D 1500 -t 82 -T 88 -s 65 -x 65 -p 35 -P 50 -m "GGGGG;CCCCC;TTTTT;AAAAA"

	***	OligoArray 2.1.3	***

OligoArray 2.1.3 will start to process sequences from the file chr1.fas using the following parameters :
Blast database: 'yeast_orf.fas'
Oligo data will be saved in: 'oligo.txt'
Sequence without oligo will be saved in: 'rejected.fas'
The log file will be: 'OligoArray.log'
Maximum number of oligo to design per input sequence: '2'
Size range: '45' to '47'
Maximum distance between the 5' end of the oligo and the 3' end of the input sequence: '1500'
Minimum distance between the 5' ends of two adjacent oligos: '69'
Tm range: '82' to '88'
GC range: '35.0' to '50.0'
Threshold to reject secondary structures: '65.0'
Threshold to start to consider cross-hybridizations: '65'
Sequence to avoid in the oligo: 'GGGGG;CCCCC;TTTTT;AAAAA'
Number of sequence to run in parallel: '1'

Can OligoArray read/write specified files?  YES
Data initialization: DONE
Is yeast_orf.fas a valid Blast database?  YES
Is OligoArrayAux installed?  YES


Start Blast parameters initialization (It may take a while depending the value entered for the -D option)
Blast parameters initialized

Start YAL069W
[...]
Start YAR071W
Start YAR073W
No more sequence to dispatch
Start YAR075W
OligoArray has successfully processed all sequences

[susanc@helix OligoArray]$
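
The startup banner echoes each option back, which makes it possible to match the command-line flags to their meaning. The sketch below rebuilds the same sample command from named shell variables. The variable names are illustrative and not part of OligoArray, and the flag descriptions are read off the banner above rather than taken from the OligoArray manual.

```shell
#!/bin/bash
# Sketch: rebuild the sample command from named variables.
# Variable names are hypothetical; flag meanings are inferred from the banner.

input=chr1.fas            # -i  sequences to process
blast_db=yeast_orf.fas    # -d  Blast database (built with formatdb)
output=oligo.txt          # -o  file that receives the oligo data
max_oligos=2              # -n  maximum number of oligos per input sequence
min_len=45; max_len=47    # -l / -L  size range
max_dist=1500             # -D  max distance from oligo 5' end to sequence 3' end
min_tm=82; max_tm=88      # -t / -T  Tm range
ss_threshold=65           # -s  threshold to reject secondary structures
xhyb_threshold=65         # -x  threshold to start considering cross-hybridizations
min_gc=35; max_gc=50      # -p / -P  GC range
avoid="GGGGG;CCCCC;TTTTT;AAAAA"   # -m  sequences to avoid in the oligo

oa_args=(
    -i "$input" -d "$blast_db" -o "$output"
    -n "$max_oligos" -l "$min_len" -L "$max_len" -D "$max_dist"
    -t "$min_tm" -T "$max_tm" -s "$ss_threshold" -x "$xhyb_threshold"
    -p "$min_gc" -P "$max_gc" -m "$avoid"
)

# Print the command for review; remove 'echo' to actually run it
# (this needs 'module load OligoArray' so that $OA is set).
echo java -Xmx512m -jar "$OA/OligoArray2.jar" "${oa_args[@]}"
```

Keeping the options in an array preserves the quoting of the -m value, whose semicolons would otherwise be interpreted by the shell.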

Batch job on Biowulf

The following examples use the sample data that is provided with OligoArray.

Create a batch input file (e.g. oligoarray.sh). For example:

#!/bin/bash
#   this file is called oligoarray.sh

module load OligoArray

cd /data/$USER/OligoArray
java -Xmx2g -jar $OA/OligoArray2.jar -i chr1.fas -d yeast_orf.fas -o oligo.txt -n 2 -l 45 -L 47 -D 1500 -t 82 -T 88 -s 65 -x 65 -p 35 -P 50 -m "GGGGG;CCCCC;TTTTT;AAAAA"

Submit this job using the Slurm sbatch command.

sbatch --mem=2g oligoarray.sh

This job will run on a single node, with 2 CPUs and 2 GB of memory.
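
The batch example requests 2 GB from Slurm and gives the same 2 GB to the Java heap with -Xmx2g. A JVM also uses some memory outside the heap, so one option is to derive the heap size from the allocation and leave some headroom. The sketch below assumes that Slurm sets SLURM_MEM_PER_NODE (in MB) when --mem is given. The 80% fraction and the function name heap_mb are illustrative choices, not Biowulf recommendations.

```shell
#!/bin/bash
# Sketch: size the Java heap from the Slurm memory allocation.
# heap_mb is a hypothetical helper; 80% is an arbitrary headroom choice.

heap_mb() {
    # $1 = allocated memory in MB; prints 80% of it (integer arithmetic)
    local alloc_mb=$1
    echo $(( alloc_mb * 80 / 100 ))
}

# SLURM_MEM_PER_NODE is assumed to be in MB; fall back to 2048 MB outside a job.
alloc_mb=${SLURM_MEM_PER_NODE:-2048}
xmx="-Xmx$(heap_mb "$alloc_mb")m"

echo "Allocation: ${alloc_mb} MB, Java heap option: ${xmx}"
# In oligoarray.sh the java line would then start with:
#   java "$xmx" -jar $OA/OligoArray2.jar ...
```

With this approach, changing the --mem value on the sbatch command line is enough; the script itself does not need to be edited.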

Swarm of jobs on Biowulf

To run OligoArray with a large number of input sequences, it is most convenient to use the Biowulf swarm utility.

Create a swarm command file along the following lines:

# this file is called oa.swarm
java -Xmx2g -jar $OA/OligoArray2.jar -i chr1.fas -d yeast_orf.fas -o oligo1.txt -n 2 -l 45 -L 47 -D 1500 -t 82 -T 88 -s 65 -x 65 -p 35 -P 50 -m "GGGGG;CCCCC;TTTTT;AAAAA"
java -Xmx2g -jar $OA/OligoArray2.jar -i chr2.fas -d yeast_orf.fas -o oligo2.txt -n 2 -l 45 -L 47 -D 1500 -t 82 -T 88 -s 65 -x 65 -p 35 -P 50 -m "GGGGG;CCCCC;TTTTT;AAAAA"
java -Xmx2g -jar $OA/OligoArray2.jar -i chr3.fas -d yeast_orf.fas -o oligo3.txt -n 2 -l 45 -L 47 -D 1500 -t 82 -T 88 -s 65 -x 65 -p 35 -P 50 -m "GGGGG;CCCCC;TTTTT;AAAAA"
[...]

Submit this swarm with:

swarm -g 2 -f oa.swarm

This swarm command will allocate 2 CPUs and 2 GB of memory to each OligoArray run. The example above writes the output files for all the runs into a single directory.
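
Writing one line per input file by hand gets tedious when there are many inputs. As a sketch, a short loop can generate the swarm command file. It assumes the inputs are named chr*.fas in the current directory, and it derives each output name from the input name (oligo_chr1.txt and so on, a naming choice made here for illustration rather than the oligo1.txt style used above). The function name write_swarm is hypothetical.

```shell
#!/bin/bash
# Sketch: write one OligoArray command per input file into a swarm file.
# The file pattern and the output naming scheme are assumptions.

write_swarm() {
    local swarm_file=$1; shift
    local fas base
    echo "# this file is called $swarm_file" > "$swarm_file"
    for fas in "$@"; do
        [ -e "$fas" ] || continue      # skip if the pattern matched nothing
        base=${fas%.fas}               # chr1.fas -> chr1
        # \$OA is escaped so that it is expanded when the swarm runs, not now.
        echo "java -Xmx2g -jar \$OA/OligoArray2.jar -i $fas -d yeast_orf.fas -o oligo_${base}.txt -n 2 -l 45 -L 47 -D 1500 -t 82 -T 88 -s 65 -x 65 -p 35 -P 50 -m \"GGGGG;CCCCC;TTTTT;AAAAA\"" >> "$swarm_file"
    done
}

write_swarm oa.swarm chr*.fas
echo "Wrote $(grep -c '^java' oa.swarm) commands to oa.swarm"
```

The generated file can then be submitted with the same swarm command as above.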

Interactive job on Biowulf

Allocate an interactive session and run OligoArray there. If $OA is not set in the session, run 'module load OligoArray' first.

Sample session:

[susanc@biowulf ~]$ sinteractive --cpus-per-task=8 --mem=60g
salloc.exe: Pending job allocation 15975374
salloc.exe: job 15975374 queued and waiting for resources
salloc.exe: job 15975374 has been allocated resources
salloc.exe: Granted job allocation 15975374
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn1855 are ready for job

[susanc@cn1855 ~]$ cd /data/susanc/OligoArray

[susanc@cn1855 OligoArray]$ java -Xmx512m -jar $OA/OligoArray2.jar -i chr1.fas -d yeast_orf.fas -o oligo.txt -n 2 -l 45 -L 47 -D 1500 -t 82 -T 88 -s 65 -x 65 -p 35 -P 50 -m "GGGGG;CCCCC;TTTTT;AAAAA"

	***	OligoArray 2.1.3	***


[....etc...]


[susanc@cn1855 OligoArray]$ exit
salloc.exe: Relinquishing job allocation 15975374
[susanc@biowulf OligoArray]$
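
Whichever way the job is run, the files named in the startup banner can be given a quick sanity check afterwards. The sketch below counts lines in the oligo file and FASTA headers in rejected.fas. It assumes one oligo record per line in oligo.txt, which should be confirmed against the OligoArray documentation. The function name summarize_run is hypothetical.

```shell
#!/bin/bash
# Sketch: summarize the outputs of an OligoArray run.
# Assumes one record per line in the oligo file (unverified) and
# standard FASTA headers ('>') in the rejected-sequence file.

summarize_run() {
    local oligo_file=${1:-oligo.txt}
    local rejected_file=${2:-rejected.fas}
    local n_oligos=0 n_rejected=0

    if [ -f "$oligo_file" ]; then
        n_oligos=$(wc -l < "$oligo_file")
    fi
    if [ -f "$rejected_file" ]; then
        # grep -c exits non-zero when the count is 0, hence '|| true'
        n_rejected=$(grep -c '^>' "$rejected_file" || true)
    fi

    # $(( )) strips the padding that some wc implementations add
    echo "oligo lines: $((n_oligos)), rejected sequences: $((n_rejected))"
}

summarize_run oligo.txt rejected.fas
```

A large number of rejected sequences relative to the input may indicate that the Tm, GC, or size ranges are too narrow for the data.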

Documentation

OligoArray documentation