cluster3: the C clustering library
cluster3 is a multipurpose open-source library of C routines, callable from other C and C++ programs. It implements k-means clustering, hierarchical clustering, and self-organizing maps, and provides several unique analytical approaches.
Documentation
Important Notes
- Module Name: cluster3 (see the modules page for more information)
- Unusual environment variables set
  - CLUSTER3_HOME: installation directory
  - CLUSTER3_BIN: executable directory
  - CLUSTER3_SRC: source code directory
  - CLUSTER3_DATA: sample data directory
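To confirm that the module has set the variables listed above, a small check like the following may help. This is a minimal sketch: the function name `show_cluster3_env` is made up for illustration, and the contents of the sample data directory are not assumed, only listed if the directory exists.

```shell
# Print each variable the cluster3 module is documented to set,
# or a marker if it is missing (for example, the module is not loaded).
show_cluster3_env() {
    echo "CLUSTER3_HOME=${CLUSTER3_HOME:-<not set>}"
    echo "CLUSTER3_BIN=${CLUSTER3_BIN:-<not set>}"
    echo "CLUSTER3_SRC=${CLUSTER3_SRC:-<not set>}"
    echo "CLUSTER3_DATA=${CLUSTER3_DATA:-<not set>}"
}

show_cluster3_env

# List the sample data only if the directory actually exists.
if [ -d "${CLUSTER3_DATA:-}" ]; then
    ls "$CLUSTER3_DATA"
fi
```

Run it after `module load cluster3`; any variable reported as `<not set>` suggests the module is not loaded in the current shell.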
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --cpus-per-task=16 --mem=32g --gres=lscratch:10
[user@cn3200 ~]$ module load cluster3/1.59
[+] Loading cluster3 1.59
[user@cn3200 ~]$ cluster3 -h
Cluster 3.0, command-line version.
USAGE: cluster [options]
options:
  -v, --version Version information
  -f filename   File loading
  -l            Specifies to log-transform the data before clustering
                (default is no log-transform)
  -cg a|m       Specifies whether to center each row (gene) in the data
                a: Subtract the mean of each row
                m: Subtract the median of each row
                (default is no centering)
  -ng           Specifies to normalize each row (gene) in the data
                (default is no normalization)
  -ca a|m       Specifies whether to center each column (microarray) in the data
                a: Subtract the mean of each column
                m: Subtract the median of each column
                (default is no centering)
  -na           Specifies to normalize each column (microarray) in the data
                (default is no normalization)
  -u jobname    Allows you to specify a different name for the output files
                (default is derived from the input file name)
  -g [0..8]     Specifies the distance measure for gene clustering
                0: No gene clustering
                1: Uncentered correlation
                2: Pearson correlation
                3: Uncentered correlation, absolute value
                4: Pearson correlation, absolute value
                5: Spearman's rank correlation
                6: Kendall's tau
                7: Euclidean distance
                8: City-block distance
                (default: 0)
  -e [0..8]     Specifies the distance measure for microarray clustering
                0: No clustering
                1: Uncentered correlation
                2: Pearson correlation
                3: Uncentered correlation, absolute value
                4: Pearson correlation, absolute value
                5: Spearman's rank correlation
                6: Kendall's tau
                7: Euclidean distance
                8: City-block distance
                (default: 0)
  -m [msca]     Specifies which hierarchical clustering method to use
                m: Pairwise complete-linkage
                s: Pairwise single-linkage
                c: Pairwise centroid-linkage
                a: Pairwise average-linkage
                (default: m)
  -k number     Specifies whether to run k-means clustering
                instead of hierarchical clustering, and the number
                of clusters k to use
  -r number     For k-means clustering, the number of times the
                k-means clustering algorithm is run
                (default: 1)
  -pg           Specifies to apply Principal Component Analysis to
                genes instead of clustering
  -pa           Specifies to apply Principal Component Analysis to
                arrays instead of clustering
  -s            Specifies to calculate an SOM instead of hierarchical
                clustering
  -x number     Specifies the horizontal dimension of the SOM grid
                (default: 2)
  -y number     Specifies the vertical dimension of the SOM grid
                (default: 1)
[user@cn3200 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
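The options in the help text can be combined into complete command lines. The sketch below shows two possible invocations, one hierarchical and one k-means. The input file name `expression.txt`, the job names `demo_hclust` and `demo_kmeans`, and the helper `run_if_available` are hypothetical placeholders; only the flags come from the usage message. Each command is printed first and is executed only if `cluster3` is on the PATH and the input file exists.

```shell
# Hypothetical input file; replace with your own data file.
input="expression.txt"

# Run cluster3 with the given options only if the program is on the PATH
# and the input file exists; otherwise this script is a dry run.
run_if_available() {
    if command -v cluster3 >/dev/null 2>&1 && [ -f "$input" ]; then
        cluster3 "$@"
    fi
}

# Hierarchical clustering of genes:
#   -l     log-transform the data
#   -cg m  subtract the median of each row (gene)
#   -g 2   Pearson correlation for gene clustering
#   -e 0   no microarray clustering
#   -m a   pairwise average-linkage
set -- -f "$input" -l -cg m -g 2 -e 0 -m a -u demo_hclust
hclust_preview="cluster3 $*"
echo "$hclust_preview"
run_if_available "$@"

# k-means clustering:
#   -g 7   Euclidean distance for genes
#   -k 5   k-means with 5 clusters
#   -r 100 run the k-means algorithm 100 times
set -- -f "$input" -g 7 -k 5 -r 100 -u demo_kmeans
kmeans_preview="cluster3 $*"
echo "$kmeans_preview"
run_if_available "$@"
```

Giving each run its own `-u` job name keeps the output files of the two analyses from sharing a name derived from the same input file.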