High-Performance Computing at the NIH
Affymetrix Power Tools (APT) on Biowulf & Helix

Affymetrix Power Tools (APT) is a set of cross-platform command-line programs, developed by Affymetrix, that implement algorithms for analyzing and working with Affymetrix GeneChip arrays. APT programs are intended for "power users who prefer programs that can be utilized in scripting environments and are sophisticated enough to handle the complexity of extra features and functionality."

Two of the most popular programs:

apt-probeset-summarize: An application for analyzing expression arrays (e.g., U133 and Exon Arrays).

apt-probeset-genotype: An application for making genotype calls using Mapping Arrays (100K, 500K, Genome-Wide SNP Arrays 5.0 and 6.0).

To run APT on any system, use module load apt.
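
A script can confirm that the module was loaded before it tries to run anything. The following is a minimal sketch; require_tool is a hypothetical helper name, not part of APT or of the module system, and it only checks that a program is on the PATH.

```shell
# Sketch: confirm that an APT program is on PATH after `module load apt`.
# require_tool is a hypothetical helper, not part of APT.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "error: $1 not found; run 'module load apt' first" >&2
    return 1
}

# Typical use in a script:
#   module load apt
#   require_tool apt-probeset-genotype || exit 1
#   require_tool apt-probeset-summarize || exit 1
```

Failing at this point gives a clearer message than a "command not found" error later in the job.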

APT Libraries

A set of APT libraries is installed in /fdb/apt/. The files currently available there were downloaded from http://www.affymetrix.com/support/technical/byproduct.affx?product=huexon-st.

Two sets of sample data are in /usr/local/apps/apt/sample/.

If you need additional data sets, let us know by sending email to staff@hpc.nih.gov.

Running APT on Helix

Sample sessions:

apt-probeset-genotype sample session

apt-probeset-summarize sample session 1

apt-probeset-summarize sample session 2

Running a single APT batch job on Biowulf

Set up a batch script (called apt.bat in the submit command below) along the following lines:

#!/bin/bash

module load apt

# Work in a per-user data directory; -p avoids an error if it already exists
mkdir -p /data/$USER/apt-genotype
cd /data/$USER/apt-genotype

# Unpack the library files and the sample CEL data
tar xvzf /usr/local/apps/apt/sample/CD_Mapping250K_Sty_rev4.tar.gz
tar xvzf /usr/local/apps/apt/sample/500K_data.tar.gz

# Every continued line must end with a backslash
apt-probeset-genotype -o results_dir \
      -c CD_Mapping250K_Sty_rev4/Full/Mapping250K_Sty/LibFiles/Mapping250K_Sty.cdf \
      --chrX-snps CD_Mapping250K_Sty_rev4/Full/Mapping250K_Sty/LibFiles/Mapping250K_Sty.chrx \
      500K_data/chip_data/Sty/Sty*/*.CEL
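
Because the genotype command depends on two library files and a glob of CEL files, a batch script may benefit from checking its inputs before starting the run. The sketch below is one way to do that; check_apt_inputs is a hypothetical helper, not an APT program, and it assumes the arguments are the CDF file, the chrX SNP list, and then the CEL files.

```shell
# Sketch: fail early if a library file is missing or no CEL files match.
# check_apt_inputs is a hypothetical helper; its arguments are the CDF file,
# the chrX SNP list, and then one or more CEL files (typically a glob).
check_apt_inputs() {
    cdf=$1
    chrx=$2
    shift 2
    for f in "$cdf" "$chrx"; do
        if [ ! -r "$f" ]; then
            echo "error: cannot read library file: $f" >&2
            return 1
        fi
    done
    # An unmatched glob is passed through literally, so test each path.
    n=0
    for cel in "$@"; do
        if [ -f "$cel" ]; then
            n=$((n + 1))
        fi
    done
    if [ "$n" -eq 0 ]; then
        echo "error: no CEL files found" >&2
        return 1
    fi
    echo "$n CEL files found"
}
```

In a batch script this would be called with the same paths that are passed to apt-probeset-genotype, followed by "|| exit 1", so that the job stops before the run if anything is missing.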

Submit this job with:

sbatch apt.bat

Previous runs show that this particular job needs very little memory, so the default allocation of 4 GB is sufficient. If your job requires more than 4 GB of memory, submit with

sbatch --mem=#g apt.bat

where # is the number of GB of memory required.
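
As a worked example of the substitution, the sketch below builds the submit command for a given memory request. sbatch_cmd is a hypothetical helper that only prints the command, and the 8 GB value is an arbitrary illustration, not a measured requirement for any APT job.

```shell
# Sketch: build the sbatch command line for a given memory request.
# sbatch_cmd is a hypothetical helper; it prints the command without running it.
sbatch_cmd() {
    mem_gb=$1
    script=$2
    # Reject anything that is not a plain whole number (e.g. "8g" or "").
    case $mem_gb in
        ''|*[!0-9]*)
            echo "error: memory must be a whole number of GB" >&2
            return 1
            ;;
    esac
    echo "sbatch --mem=${mem_gb}g $script"
}

sbatch_cmd 8 apt.bat    # prints: sbatch --mem=8g apt.bat
```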

Running a swarm of APT batch jobs on Biowulf

Set up a swarm command file along the following lines:

apt-probeset-genotype ... parameters for run 1 ...
apt-probeset-genotype ... parameters for run 2 ...
apt-probeset-genotype ... parameters for run 3 ...

Submit with

swarm -f swarmfile

If each command in the swarmfile requires more than 4 GB of memory, submit with:

swarm -f swarmfile -g #

where # is the number of gigabytes of memory required by each command.
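
A swarm command file with many runs is easier to generate than to type. The following sketch writes one apt-probeset-genotype line per directory of CEL files, using only the options shown in the batch example (-o, -c, --chrX-snps). make_swarmfile and the results_NAME output naming are hypothetical, and splitting runs by directory is for illustration only; it also assumes each swarm line is run through a shell, so that the *.CEL glob expands at run time.

```shell
# Sketch: write one apt-probeset-genotype command per CEL directory.
# make_swarmfile is a hypothetical helper. Arguments: output swarmfile,
# CDF file, chrX SNP list, then one or more directories holding *.CEL files.
make_swarmfile() {
    out=$1
    cdf=$2
    chrx=$3
    shift 3
    : > "$out"                      # start with an empty swarmfile
    for dir in "$@"; do
        name=$(basename "$dir")
        # One independent command per line, each with its own output directory.
        # The *.CEL glob is written literally and expands when the command runs.
        echo "apt-probeset-genotype -o results_$name -c $cdf --chrX-snps $chrx $dir/*.CEL" >> "$out"
    done
}

# Typical use (paths are placeholders):
#   make_swarmfile swarmfile Mapping250K_Sty.cdf Mapping250K_Sty.chrx dir1 dir2 dir3
#   swarm -f swarmfile
```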

Running an interactive job on Biowulf

Allocate an interactive node with 'sinteractive', then proceed as in the Helix example above. Sample session:

[susanc@biowulf ~]$ sinteractive
salloc.exe: Granted job allocation 14736
slurm stepprolog here!
Begin slurm taskprolog!
End slurm taskprolog!
[susanc@p23 ~]$ cd /data/susanc/apt
[susanc@p23 apt]$ apt-probeset-genotype -o results_dir \
      -c CD_Mapping250K_Sty_rev4/Full/Mapping250K_Sty/LibFiles/Mapping250K_Sty.cdf \
      --chrX-snps CD_Mapping250K_Sty_rev4/Full/Mapping250K_Sty/LibFiles/Mapping250K_Sty.chrx \
      500K_data/chip_data/Sty/Sty*/*.CEL

Running ProbesetGenotypeEngine...
[...etc...]
[susanc@p23 apt]$ exit
exit
slurm stepepilog here!
salloc.exe: Relinquishing job allocation 14736
[susanc@biowulf ~]$

Documentation

Affymetrix Power Tools docs