cluster3 is a multipurpose open-source library of C routines, callable from other C and C++ programs. It implements k-means clustering, hierarchical clustering and self-organizing maps and provides several unique analytical approaches.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --cpus-per-task=16 --mem=32g --gres=lscratch:10
[user@cn3200 ~]$ module load cluster3/1.59
[+] Loading cluster3 1.59
[user@cn3200 ~]$ cluster3 -h
Cluster 3.0, command-line version.
USAGE: cluster [options]
options:
  -v, --version Version information
  -f filename   File loading
  -l            Specifies to log-transform the data before clustering
                (default is no log-transform)
  -cg a|m       Specifies whether to center each row (gene)
                in the data
                a: Subtract the mean of each row
                m: Subtract the median of each row
                (default is no centering)
  -ng           Specifies to normalize each row (gene) in the data
                (default is no normalization)
  -ca a|m       Specifies whether to center each column (microarray)
                in the data
                a: Subtract the mean of each column
                m: Subtract the median of each column
                (default is no centering)
  -na           Specifies to normalize each column (microarray) in the data
                (default is no normalization)
  -u jobname    Allows you to specify a different name for the output files
                (default is derived from the input file name)
  -g [0..8]     Specifies the distance measure for gene clustering
                0: No gene clustering
                1: Uncentered correlation
                2: Pearson correlation
                3: Uncentered correlation, absolute value
                4: Pearson correlation, absolute value
                5: Spearman's rank correlation
                6: Kendall's tau
                7: Euclidean distance
                8: City-block distance
                (default: 0)
  -e [0..8]     Specifies the distance measure for microarray clustering
                0: No clustering
                1: Uncentered correlation
                2: Pearson correlation
                3: Uncentered correlation, absolute value
                4: Pearson correlation, absolute value
                5: Spearman's rank correlation
                6: Kendall's tau
                7: Euclidean distance
                8: City-block distance
                (default: 0)
  -m [msca]     Specifies which hierarchical clustering method to use
                m: Pairwise complete-linkage
                s: Pairwise single-linkage
                c: Pairwise centroid-linkage
                a: Pairwise average-linkage
                (default: m)
  -k number     Specifies whether to run k-means clustering
                instead of hierarchical clustering, and the number
                of clusters k to use
  -r number     For k-means clustering, the number of times the
                k-means clustering algorithm is run
                (default: 1)
  -pg           Specifies to apply Principal Component Analysis to
                genes instead of clustering
  -pa           Specifies to apply Principal Component Analysis to
                arrays instead of clustering
  -s            Specifies to calculate an SOM instead of hierarchical
                clustering
  -x number     Specifies the horizontal dimension of the SOM grid
                (default: 2)
  -y number     Specifies the vertical dimension of the SOM grid
                (default: 1)
[user@cn3200 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
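
The options in the help text above can be combined in many ways. The following is a minimal sketch, not a recommended analysis: the input file name `expression.txt`, the job names passed to `-u`, and the helper function `run_cluster3` are all hypothetical, and the particular choices of distance measure, linkage, and k are only illustrative. The helper prints each command and runs it only when `cluster3` is on the PATH (that is, after `module load cluster3`) and the input file exists.

```shell
# Hypothetical input file; replace with your own data file.
input="expression.txt"

# Hypothetical helper: print the command, then run it only if the
# cluster3 executable and the input file are both available.
run_cluster3() {
    echo "cluster3 $*"
    if command -v cluster3 >/dev/null 2>&1 && [ -f "$input" ]; then
        cluster3 "$@"
    fi
}

# Hierarchical clustering of genes and arrays:
#   -l     log-transform the data
#   -cg a  subtract the mean of each row (gene)
#   -g 2   Pearson correlation for gene clustering
#   -e 2   Pearson correlation for microarray clustering
#   -m a   pairwise average-linkage
run_cluster3 -f "$input" -l -cg a -g 2 -e 2 -m a -u hier_pearson

# k-means clustering of genes:
#   -g 7   Euclidean distance for gene clustering
#   -k 5   five clusters
#   -r 100 run the k-means algorithm 100 times
run_cluster3 -f "$input" -g 7 -k 5 -r 100 -u kmeans_k5
```

Giving each run its own `-u` job name keeps the output files of the two analyses from sharing the default name derived from the input file.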