High-Performance Computing at the NIH

GiniClust is a clustering method, implemented in Python and R, for detecting rare cell types in large-scale single-cell gene expression data.

GiniClust can be applied to datasets originating from different platforms, such as multiplex qPCR data, traditional single-cell RNAseq or newly emerging UMI-based single-cell RNAseq, e.g. inDrops and Drop-seq.

GiniClust is developed and maintained by the GC Yuan Lab at Harvard University and the Dana-Farber Cancer Institute. It comes with a graphical user interface for convenience.


There are multiple versions of GiniClust available. An easy way of selecting the version is to use modules. To see the modules available, type

module avail GiniClust

To select a module, type

module load GiniClust/[ver]

where [ver] is the version of choice.

Environment variables set: $GINICLUSTHOME (the GiniClust installation directory)

Interactive job

GiniClust can be run directly from the command line:

[node]$ module load GiniClust
[node]$ ln -s $GINICLUSTHOME/sample_data/Data_GBM.csv .
[node]$ Rscript $GINICLUSTHOME/GiniClust_Main.R -f Data_GBM.csv -t RNA-seq -o GBM_results

GiniClust can also be run via an X11 GUI (see https://hpc.nih.gov/docs/connect.html for details about creating a graphical session on HPC systems):

[node]$ GiniClust.py

Batch job

Create a batch script (e.g. GiniClust.sh) that runs GiniClust on your input file (my_input_data.csv in this example). For example:

#!/bin/bash
module load GiniClust
Rscript $GINICLUSTHOME/GiniClust_Main.R -f my_input_data.csv -t RNA-seq -o GBM_results

Submit this job using the Slurm sbatch command.

[biowulf]$ sbatch --cpus-per-task=1 GiniClust.sh
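
When several input files need the same treatment, the batch script can be generalized with a small loop. The following is a minimal sketch, not part of GiniClust: the helper name giniclust_cmd and the convention of naming each output after its input file are assumptions made for illustration, and only the -f, -t and -o options shown above are used. It prints the commands as a dry run rather than executing them, so the lines can be checked before anything is submitted.

```shell
#!/bin/bash
# Hypothetical helper (not part of GiniClust): print the Rscript command
# for one input CSV, naming the output after the input file.
giniclust_cmd() {
    local input="$1"
    local base
    # Strip any leading directory and the .csv suffix, e.g. data/a.csv -> a
    base="$(basename "$input" .csv)"
    printf 'Rscript $GINICLUSTHOME/GiniClust_Main.R -f %s -t RNA-seq -o %s_results\n' "$input" "$base"
}

# Dry run: print one command per readable CSV passed as an argument.
for csv in "$@"; do
    if [ -r "$csv" ]; then
        giniclust_cmd "$csv"
    else
        echo "skipping unreadable file: $csv" >&2
    fi
done
```

Saved under a name of your choice, the script takes the CSV files as arguments. Once the printed lines look right, they can be pasted into a batch script after the module load line.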