Biowulf High Performance Computing at the NIH
OligoArray on Biowulf

OligoArray is a program that computes gene-specific oligonucleotides that are free of secondary structure, for genome-scale oligonucleotide microarray construction. Selection is based on three major criteria: the oligonucleotide melting temperature; specificity to a single target, or at least to the shortest possible list of targets; and the inability to fold into a stable secondary structure at the hybridization temperature.


Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session:

[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load OligoArray

[user@cn3144 ~]$ cp $OA/chr1.fas .

[user@cn3144 ~]$ cp $OA/yeast_orf.fas .

[user@cn3144 ~]$ formatdb -i yeast_orf.fas -o T -p F

[user@cn3144 ~]$ ls
chr1.fas  formatdb.log  yeast_orf.fas  yeast_orf.fas.nhr  yeast_orf.fas.nin  yeast_orf.fas.nsd  yeast_orf.fas.nsi  yeast_orf.fas.nsq

[user@cn3144 ~]$ java -Xmx512m -jar $OA/OligoArray2.jar -i chr1.fas -d yeast_orf.fas \
      -o oligo.txt -n 2 -l 45 -L 47 -D 1500 -t 82 -T 88 -s 65 -x 65 -p 35 -P 50 -m "GGGGG;CCCCC;TTTTT;AAAAA"

	***	OligoArray 2.1.3	***

OligoArray 2.1.3 will start to process sequences from the file chr1.fas using the following parameters :
Blast database: 'yeast_orf.fas'
Oligo data will be saved in: 'oligo.txt'
Sequence without oligo will be saved in: 'rejected.fas'
The log file will be: 'OligoArray.log'
Maximum number of oligo to design per input sequence: '2'
Size range: '45' to '47'
Maximum distance between the 5' end of the oligo and the 3' end of the input sequence: '1500'
Minimum distance between the 5' ends of two adjacent oligos: '69'
Tm range: '82' to '88'
GC range: '35.0' to '50.0'
Threshold to reject secondary structures: '65.0'
Threshold to start to consider cross-hybridizations: '65'
Sequence to avoid in the oligo: 'GGGGG;CCCCC;TTTTT;AAAAA'
Number of sequence to run in parallel: '1'

Can OligoArray read/write specified files?  YES
Data initialization: DONE
Is yeast_orf.fas a valid Blast database?  YES
Is OligoArrayAux installed?  YES

Start Blast parameters initialization (It may take a while depending the value entered for the -D option)
Blast parameters initialized

Start YAL069W
Start YAR071W
Start YAR073W
No more sequence to dispatch
Start YAR075W
OligoArray has successfully processed all sequences

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
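
After a run like the one above, it can be useful to check how many oligos were designed and how many input sequences were rejected. The following is a minimal sketch, not part of OligoArray itself. The helper names count_oligos and count_rejected are hypothetical, and the sketch assumes that the oligo output file holds one oligo per line and that rejected.fas is an ordinary FASTA file.

```shell
# Count designed oligos.
# Assumption: the output file contains one oligo per line.
count_oligos() {
    local n
    n=$(wc -l < "$1")
    echo $((n))          # arithmetic expansion strips any padding from wc
}

# Count sequences for which no oligo was designed.
# FASTA records start with '>', so counting header lines counts sequences.
count_rejected() {
    grep -c '^>' "$1" || true   # grep -c prints 0 but exits non-zero on no match
}

# Report only if the files from the sample session are present.
if [ -f oligo.txt ]; then
    echo "oligos designed:    $(count_oligos oligo.txt)"
fi
if [ -f rejected.fas ]; then
    echo "sequences rejected: $(count_rejected rejected.fas)"
fi
```

The file names oligo.txt and rejected.fas are the ones reported in the program output of the sample session.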

Batch job
Most jobs should be run as batch jobs.

Create a batch input file. For example:

#!/bin/bash
set -e

module load OligoArray

cd /data/$USER/OligoArray
java -Xmx2g -jar $OA/OligoArray2.jar -i chr1.fas -d yeast_orf.fas \
      -o oligo.txt -n 2 -l 45 -L 47 -D 1500 -t 82 -T 88 -s 65 -x 65 -p 35 -P 50 -m "GGGGG;CCCCC;TTTTT;AAAAA"
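
Because a batch job runs unattended, it may help to confirm that the BLAST database has been formatted before OligoArray starts. The sketch below is one possible pre-flight check. The function name blast_db_ready is hypothetical, and the check assumes a single-volume nucleotide database whose index files carry the .nhr, .nin and .nsq extensions seen in the interactive session listing.

```shell
# Return 0 if the nucleotide BLAST index files written by formatdb exist
# next to the FASTA file, otherwise report the first missing file.
# Assumption: a single-volume database (no numbered volume files).
blast_db_ready() {
    local db=$1 ext
    for ext in nhr nin nsq; do
        if [ ! -f "$db.$ext" ]; then
            echo "missing BLAST index file: $db.$ext" >&2
            return 1
        fi
    done
    return 0
}

# In a batch script this could guard the java call, for example:
#   blast_db_ready yeast_orf.fas || exit 1
```

If the check fails, the database can be rebuilt with the formatdb command shown in the interactive session.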

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#] <batch_script>
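
The memory requested from Slurm should be at least as large as the Java heap set with -Xmx, since the JVM also uses memory outside the heap. The sketch below shows one way to derive a --mem value from the heap size. The helper mem_for_heap, the 1 GB of headroom, the CPU count of 2 and the script name OligoArray.sh are all assumptions made for illustration, not site recommendations.

```shell
# Suggest a Slurm --mem value for a given JVM heap size (both in GB).
# Assumption: 1 GB of headroom unless a second argument says otherwise.
mem_for_heap() {
    local heap_gb=$1 headroom_gb=${2:-1}
    echo "$(( heap_gb + headroom_gb ))g"
}

heap_gb=2                 # matches -Xmx2g in the batch script
script=OligoArray.sh      # hypothetical batch script name

# Print the command rather than running it, so it can be reviewed first.
echo "sbatch --cpus-per-task=2 --mem=$(mem_for_heap "$heap_gb") $script"
```

With the values above, the printed command requests 3 GB of memory for a 2 GB heap.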

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. OligoArray.swarm). For example:

# this file is called OligoArray.swarm
java -Xmx2g -jar $OA/OligoArray2.jar -i chr1.fas -d yeast_orf.fas  \
      -o oligo1.txt -n 2 -l 45 -L 47 -D 1500 -t 82 -T 88 -s 65 -x 65 -p 35 -P 50 -m "GGGGG;CCCCC;TTTTT;AAAAA"
java -Xmx2g -jar $OA/OligoArray2.jar -i chr2.fas -d yeast_orf.fas  \
      -o oligo2.txt -n 2 -l 45 -L 47 -D 1500 -t 82 -T 88 -s 65 -x 65 -p 35 -P 50 -m "GGGGG;CCCCC;TTTTT;AAAAA"
java -Xmx2g -jar $OA/OligoArray2.jar -i chr3.fas -d yeast_orf.fas  \
      -o oligo3.txt -n 2 -l 45 -L 47 -D 1500 -t 82 -T 88 -s 65 -x 65 -p 35 -P 50 -m "GGGGG;CCCCC;TTTTT;AAAAA"
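
When there are many input files, the swarmfile can be generated rather than written by hand. The following is a minimal sketch under stated assumptions: the function make_swarm is hypothetical, every input uses the same yeast_orf.fas database and the same design parameters as above, and outputs are numbered in the order the inputs are given.

```shell
# Write one OligoArray command per input FASTA file to a swarmfile.
# Usage: make_swarm SWARMFILE INPUT.fas [INPUT.fas ...]
make_swarm() {
    local out=$1 fas n=0
    shift
    : > "$out"                       # create or truncate the swarmfile
    for fas in "$@"; do
        n=$((n + 1))
        # $OA is deliberately left unexpanded (single quotes) so that it is
        # resolved inside each subjob, after the OligoArray module is loaded.
        printf 'java -Xmx2g -jar $OA/OligoArray2.jar -i %s -d yeast_orf.fas -o oligo%d.txt -n 2 -l 45 -L 47 -D 1500 -t 82 -T 88 -s 65 -x 65 -p 35 -P 50 -m "GGGGG;CCCCC;TTTTT;AAAAA"\n' \
            "$fas" "$n" >> "$out"
    done
}

make_swarm OligoArray.swarm chr1.fas chr2.fas chr3.fas
```

Each command is written on a single line, so no line continuations are needed in the generated file.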

Submit this job using the swarm command.

swarm -f OligoArray.swarm [-g #] [-t #] --module OligoArray

where
-g #                  Number of gigabytes of memory required for each process (1 line in the swarm command file)
-t #                  Number of threads/CPUs required for each process (1 line in the swarm command file)
--module OligoArray   Loads the OligoArray module for each subjob in the swarm