High-Performance Computing at the NIH

ARACNE (Algorithm for the Reconstruction of Accurate Cellular Networks) is an algorithm that uses microarray expression profiles to reconstruct regulatory networks. It is specifically designed to scale up to the complexity of regulatory networks in mammalian cells, yet is general enough to address a wider range of network deconvolution problems. The method uses an information-theoretic approach to eliminate the vast majority of indirect interactions typically inferred by pairwise analysis.


On Helix

Sample session:

module load ARACNE
aracne2 -H $ARACNE_HOME -i input.exp -n 0.01 -o output.adj

Batch job on Biowulf

Create a batch input file (e.g. ARACNE.sh), which uses the input file 'input.exp'. For example:

module load ARACNE
aracne2 -H $ARACNE_HOME -i input.exp -n 0.01 -o output.adj

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=1 --gres=lscratch:10 ARACNE.sh

Swarm of Jobs on Biowulf

Create a swarmfile (e.g. ARACNE.swarm). For example:

aracne2 -H $ARACNE_HOME -i input_1.exp -n 0.01 -o output_1.adj
aracne2 -H $ARACNE_HOME -i input_2.exp -n 0.01 -o output_2.adj
aracne2 -H $ARACNE_HOME -i input_3.exp -n 0.01 -o output_3.adj
aracne2 -H $ARACNE_HOME -i input_4.exp -n 0.01 -o output_4.adj

Submit this job using the swarm command.

swarm -f ARACNE.swarm --module ARACNE
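
When there are many input files, the swarmfile does not have to be written by hand. The following is a minimal sketch, assuming the inputs sit in the current directory and follow the input_N.exp naming used above. The function name make_swarmfile is hypothetical, and the aracne2 options are copied unchanged from the example swarmfile.

```shell
# Sketch: write one aracne2 command per input file into a swarmfile.
# make_swarmfile is a hypothetical helper, not part of ARACNE or swarm.
make_swarmfile() {
    swarmfile=$1
    shift
    : > "$swarmfile"                  # create or truncate the swarmfile
    for exp in "$@"; do
        [ -e "$exp" ] || continue     # skip unmatched globs and missing files
        base=${exp%.exp}              # input_1.exp -> input_1
        # $ARACNE_HOME is written literally so that it is expanded inside
        # the job, after swarm has loaded the ARACNE module.
        printf 'aracne2 -H $ARACNE_HOME -i %s -n 0.01 -o %s.adj\n' \
            "$exp" "output${base#input}" >> "$swarmfile"
    done
}
```

For example, running make_swarmfile ARACNE.swarm input_*.exp in a directory holding input_1.exp through input_4.exp would produce the four-line swarmfile shown above.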

Interactive job on Biowulf

Once an interactive node has been allocated, the procedure is the same as on Helix.

Parallelizing ARACNE

When dealing with more than a few hundred interactions, each quantified by a mutual information estimate (MI), the ARACNE procedure can be broken into two steps. The first step constructs the full matrix by calculating each MI individually; the second applies the data processing inequality (DPI) and the thresholds. Because the first step treats each interaction independently, it is embarrassingly parallel and can be accelerated with multiprocessing.
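
For reference, the DPI used in the second step can be stated compactly. This is the inequality as it is generally described for ARACNE, given here without the optional tolerance; the symbols g_1, g_2, g_3 (genes) and I (mutual information) are notation introduced for this note only.

```latex
% If g_1 and g_3 interact only through g_2, then
I(g_1, g_3) \le \min\bigl[\, I(g_1, g_2),\; I(g_2, g_3) \,\bigr]
% so in each fully connected triplet the edge with the smallest MI
% is treated as indirect and removed.
```

Checking a triplet requires all three of its MIs, which is why the DPI step has to wait for the full matrix, whereas each MI in the first step can be computed on its own.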

Below are two scripts that can be used as templates for parallelizing ARACNE runs. The first is meant for running on a single node, and the second involves using swarm.

Parallelization on a single node

The script below uses sample data provided with ARACNE. It automatically determines the hub gene ID for each line of the input data and uses the parallel command to calculate each individual MI. It then aggregates the results and applies the threshold and DPI calculation. Intermediate files are stored in local scratch space.

This script, after editing, is then submitted to the batch system. The number of CPUs and the amount of memory must be specified at submission.

sbatch --cpus-per-task=16 --mem=5g --gres=lscratch:10 script.sh
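
As a rough illustration of the first step described above, the sketch below lists the hub gene IDs in an input file and prints one aracne2 command per gene. It is not the template script itself. Several things are assumptions that this page does not confirm: that the .exp file is tab-delimited with a single header line and the gene ID in the first column, that the IDs are safe to use as file names, and that aracne2 takes the hub gene as -h <id>. The function names are hypothetical.

```shell
# Sketch: emit one aracne2 command per hub gene (MI step only).
# Assumptions to check against the ARACNE documentation:
#   - the .exp file is tab-delimited, with one header line and the
#     gene/probe ID in column 1
#   - gene IDs contain no characters that are unsafe in file names
#   - "-h <id>" selects the hub gene

# Print the ID in column 1 of every non-header, non-empty line.
list_hub_genes() {
    awk -F '\t' 'NR > 1 && $1 != "" { print $1 }' "$1"
}

# Print one command per hub gene; $1 is the input file and
# $2 is the directory that receives the per-gene .adj files.
print_mi_commands() {
    list_hub_genes "$1" | while IFS= read -r gene; do
        # $ARACNE_HOME stays literal and is expanded when the command runs.
        printf 'aracne2 -H $ARACNE_HOME -i %s -h %s -o %s/%s.adj\n' \
            "$1" "$gene" "$2" "$gene"
    done
}
```

On a single node, the printed lines could be piped to GNU parallel, for example print_mi_commands input.exp /lscratch/$SLURM_JOB_ID | parallel -j $SLURM_CPUS_PER_TASK, so that the number of simultaneous aracne2 processes matches the CPUs requested from sbatch.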

Parallelization via swarm

Parallelization by swarm is a bit trickier because the intermediate files must be written to a directory that is visible to all nodes, so local scratch can't be used. The script below, once edited, can be submitted to the batch system with no special requirements. The batch job will launch a swarm, followed by a secondary batch job that automatically aggregates the data.

This script can be submitted using this command:

sbatch script.sh
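
The chaining of a swarm and a follow-up aggregation job could look roughly like the sketch below. It assumes that swarm prints the ID of the submitted job on standard output, and it uses the Slurm --dependency=afterok option so that aggregation starts only after every swarm subjob has finished successfully. The function name submit_chain and the script name aggregate.sh are hypothetical.

```shell
# Sketch: launch the per-gene swarm, then queue an aggregation job that
# waits for the whole swarm to complete successfully.
# Assumptions: swarm prints the job ID on stdout; aggregate.sh is a
# hypothetical script that merges the per-gene files from the shared
# directory and applies the threshold and DPI.
submit_chain() {
    swarmfile=$1
    aggregate_script=$2
    # Capture the swarm job ID; stop if the submission fails.
    jobid=$(swarm -f "$swarmfile" --module ARACNE) || return 1
    # Hold the aggregation job until all swarm subjobs exit with status 0.
    sbatch --dependency=afterok:"$jobid" "$aggregate_script"
}
```

A call such as submit_chain mi.swarm aggregate.sh would then replace two manual submissions. The swarmfile must write its output to a shared directory, since the aggregation job may run on a different node.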