Biowulf High Performance Computing at the NIH
cluster3: the C clustering library

cluster3 is a multipurpose open-source library of C routines, callable from other C and C++ programs. It implements k-means clustering, hierarchical clustering and self-organizing maps and provides several unique analytical approaches.

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --cpus-per-task=16 --mem=32g --gres=lscratch:10
[user@cn3200 ~]$ module load cluster3/1.59
[+] Loading cluster3  1.59
[user@cn3200 ~]$ cluster3 -h
Cluster 3.0, command-line version.
USAGE: cluster [options]
options:
  -v, --version Version information
  -f filename   File loading
  -l            Specifies to log-transform the data before clustering
                (default is no log-transform)
  -cg a|m       Specifies whether to center each row (gene)
                in the data
                a: Subtract the mean of each row
                m: Subtract the median of each row
                (default is no centering)
  -ng           Specifies to normalize each row (gene) in the data
                (default is no normalization)
  -ca a|m       Specifies whether to center each column (microarray)
                in the data
                a: Subtract the mean of each column
                m: Subtract the median of each column
                (default is no centering)
  -na           Specifies to normalize each column (microarray) in the data
                (default is no normalization)
  -u jobname    Allows you to specify a different name for the output files
                (default is derived from the input file name)
  -g [0..8]     Specifies the distance measure for gene clustering
                0: No gene clustering
                1: Uncentered correlation
                2: Pearson correlation
                3: Uncentered correlation, absolute value
                4: Pearson correlation, absolute value
                5: Spearman's rank correlation
                6: Kendall's tau
                7: Euclidean distance
                8: City-block distance
                (default: 0)
  -e [0..8]     Specifies the distance measure for microarray clustering
                0: No clustering
                1: Uncentered correlation
                2: Pearson correlation
                3: Uncentered correlation, absolute value
                4: Pearson correlation, absolute value
                5: Spearman's rank correlation
                6: Kendall's tau
                7: Euclidean distance
                8: City-block distance
                (default: 0)
  -m [msca]     Specifies which hierarchical clustering method to use
                m: Pairwise complete-linkage
                s: Pairwise single-linkage
                c: Pairwise centroid-linkage
                a: Pairwise average-linkage
                (default: m)
  -k number     Specifies whether to run k-means clustering
                instead of hierarchical clustering, and the number
                of clusters k to use
  -r number     For k-means clustering, the number of times the
                k-means clustering algorithm is run
                (default: 1)
  -pg           Specifies to apply Principal Component Analysis to
                genes instead of clustering
  -pa           Specifies to apply Principal Component Analysis to
                arrays instead of clustering
  -s            Specifies to calculate an SOM instead of hierarchical
                clustering
  -x number     Specifies the horizontal dimension of the SOM grid
                (default: 2)
  -y number     Specifies the vertical dimension of the SOM grid
                (default: 1)
[user@cn3200 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
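
The options listed in the help text can be combined on one command line. The following is a minimal sketch rather than a tested recipe: the input file name, the job names, and the data values are hypothetical, and the tab-delimited layout (a header row, then one row per gene with the identifier in the first column) is an assumption about the expected input format. The script only invokes cluster3 when it is on the PATH (for example after `module load cluster3/1.59` inside an interactive session); otherwise it prints the commands it would have run.

```shell
#!/bin/bash
# Sketch only: the file name, job names, and data values below are hypothetical.
input=demo_expression.txt

# Tiny tab-delimited table (assumed layout: header row, then one row per gene).
printf 'GENE\tarray1\tarray2\tarray3\n'  > "$input"
printf 'geneA\t1.0\t2.0\t4.0\n'         >> "$input"
printf 'geneB\t8.0\t4.0\t2.0\n'         >> "$input"
printf 'geneC\t2.0\t2.5\t3.0\n'         >> "$input"

# Option sets built only from flags shown in the help text above.
# Hierarchical gene clustering: log-transform, median-center rows,
# Pearson correlation (-g 2), pairwise average linkage (-m a).
hier_opts="-f $input -l -cg m -g 2 -m a -u demo_hier"
# k-means gene clustering: Euclidean distance (-g 7), k = 2, 10 runs.
kmeans_opts="-f $input -g 7 -k 2 -r 10 -u demo_kmeans"

for opts in "$hier_opts" "$kmeans_opts"; do
    if command -v cluster3 >/dev/null 2>&1; then
        # $opts is intentionally unquoted so it splits into separate arguments.
        cluster3 $opts
    else
        echo "cluster3 not found; would run: cluster3 $opts"
    fi
done
```

The `-u` option gives each run its own job name so that the output files of the two runs do not overwrite each other. On real data, writing the input and output under the local scratch space requested with `--gres=lscratch:10` would be a reasonable choice.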