1. ClusterMine

1.1 Description

ClusterMine is a knowledge-integrated clustering approach that clusters samples based on their gene expression profiles. Two inputs are required to run ClusterMine: (1) a list of gene sets, and (2) a gene expression data matrix with genes in rows and samples in columns (row names are RefSeq gene symbols). ClusterMine (1) uses prior gene sets, each of which typically contains functionally related genes, to partition the gene expression data into gene-set-wise subdata, (2) performs clustering analysis on each subdata, and (3) combines the clustering results of all subdata into a final clustering as the output.
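The three steps above can be illustrated with a small base-R sketch. This is not ClusterMine's implementation: the description does not say which clustering algorithm or combination rule the package uses, so hierarchical clustering and a co-association (consensus) matrix are stand-ins chosen for illustration. The gene names, sample names, and gene sets are made up.

```r
set.seed(1)

# Toy expression matrix: 6 genes (rows) x 8 samples (columns).
# Samples S5-S8 are shifted so that two sample groups exist.
genes   <- paste0("GENE", 1:6)   # hypothetical gene symbols
samples <- paste0("S", 1:8)      # hypothetical sample names
expr <- matrix(rnorm(48, sd = 0.2), nrow = 6,
               dimnames = list(genes, samples))
expr[, 5:8] <- expr[, 5:8] + 3

# Gene sets as a named list of gene-symbol vectors (hypothetical sets).
# GENE7 is absent from the matrix and is dropped by intersect() below.
gene_sets <- list(setA = c("GENE1", "GENE2", "GENE3"),
                  setB = c("GENE3", "GENE4", "GENE5", "GENE6", "GENE7"))

k <- 2  # number of clusters

# Steps 1 and 2: build one sub-matrix per gene set and cluster its samples.
# Assumption: hierarchical clustering stands in for the package's method.
per_set <- lapply(gene_sets, function(gs) {
  sub <- expr[intersect(gs, rownames(expr)), , drop = FALSE]
  cutree(hclust(dist(t(sub))), k = k)
})

# Step 3: combine the per-set results.
# Assumption: a co-association matrix (fraction of gene sets in which two
# samples share a cluster) is clustered once more to get the final labels.
coassoc <- Reduce(`+`, lapply(per_set, function(m) outer(m, m, "==") * 1)) /
  length(per_set)
final <- cutree(hclust(as.dist(1 - coassoc)), k = k)

print(final)
```

The sketch also shows the expected input shapes: a numeric matrix whose row names are gene symbols, and gene sets given as vectors of gene symbols.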

1.2 Download

ClusterMine is implemented as an R package, which is freely available for non-commercial use.

Version Changes
ClusterMine_0.9.6.tar.gz added a parameter, varianceRatio (its value was fixed in previous versions).
ClusterMine_0.9.5.tar.gz added the 'generate_geneset.R' function to generate custom gene sets.
ClusterMine_0.9.0.tar.gz

2. Install

Step 1: Download the above ClusterMine package and install it in R (tested on version 3.3.0)

Step 2: Install the "gplots" R package (tested on version 3.0.1), which ClusterMine depends on
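Both steps can be run from the R console. The following is a minimal sketch, assuming the archive ClusterMine_0.9.6.tar.gz has been saved to the current working directory (adjust pkg_file if it is elsewhere) and that the CRAN cloud mirror is reachable. The dependency is installed first so that it is already present when ClusterMine is installed.

```r
# Install the gplots dependency from CRAN if it is not already present.
if (!requireNamespace("gplots", quietly = TRUE)) {
  install.packages("gplots", repos = "https://cloud.r-project.org")
}

# Assumed location of the downloaded archive; change this to your own path.
pkg_file <- "ClusterMine_0.9.6.tar.gz"

if (file.exists(pkg_file)) {
  # repos = NULL makes R install from the local file instead of a repository.
  install.packages(pkg_file, repos = NULL, type = "source")
} else {
  message("Archive not found: ", pkg_file)
}
```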

3. Usage

Note: ClusterMine was tested on Linux, macOS, and Windows and ran without problems on all three.

Using ClusterMine is very simple. Just follow the steps below:

Step 1: Open R or RStudio.
Step 2: In the R command window, run the following command to load the R package:

> library(ClusterMine)

Step 3: In the R command window, run the following command to open the help document for ClusterMine. A help page should then be displayed.

> ?ClusterMine

Step 4: At the end of the help page, there is example code. Copy it into the R command window and run it as follows:

> data(Yeodemo)
> result = ClusterMine(data, kcluster = 3, genesetClass = "c2.all", outputdir = "./tmp_ClusterMine")
> performance = evalcluster(meta$label, result$membership)

'performance' is returned as a vector of three elements: NMI (normalized mutual information), RI (Rand index) and ARI (adjusted Rand index), three commonly used criteria for assessing the quality of clustering results. In this example, you should see NMI=0.975, RI=0.992, ARI=0.984.
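To make the three criteria concrete, here is a base-R sketch that computes them from two label vectors. It is not the package's evalcluster function, and cluster_agreement is a hypothetical name. NMI has several normalizations in the literature; this sketch divides the mutual information by the geometric mean of the two entropies, which may differ from what evalcluster uses.

```r
# Agreement between two labelings of the same samples (base R only).
# Note: ARI and NMI are undefined (NaN) when a labeling has a single cluster.
cluster_agreement <- function(truth, pred) {
  stopifnot(length(truth) == length(pred))
  tab <- table(truth, pred)            # contingency table of the two labelings
  n <- sum(tab)
  choose2 <- function(x) x * (x - 1) / 2

  sum_cells <- sum(choose2(tab))            # pairs grouped together in both
  sum_rows  <- sum(choose2(rowSums(tab)))   # pairs grouped together in truth
  sum_cols  <- sum(choose2(colSums(tab)))   # pairs grouped together in pred
  total     <- choose2(n)                   # all sample pairs

  # Rand index: fraction of pairs on which the two labelings agree.
  ri <- (total + 2 * sum_cells - sum_rows - sum_cols) / total

  # Adjusted Rand index: pair agreement corrected for chance.
  expected  <- sum_rows * sum_cols / total
  max_index <- (sum_rows + sum_cols) / 2
  ari <- (sum_cells - expected) / (max_index - expected)

  # Normalized mutual information (geometric-mean normalization, assumed).
  p_xy <- tab / n
  p_x  <- rowSums(p_xy)
  p_y  <- colSums(p_xy)
  entropy <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }
  nz <- p_xy > 0
  mi <- sum(p_xy[nz] * log(p_xy[nz] / outer(p_x, p_y)[nz]))
  nmi <- mi / sqrt(entropy(p_x) * entropy(p_y))

  c(NMI = nmi, RI = ri, ARI = ari)
}

# Identical partitions with different label names score 1 on all three.
print(cluster_agreement(c(1, 1, 2, 2, 3, 3), c("a", "a", "b", "b", "c", "c")))
```

All three measures depend only on how samples are grouped, not on the label names, so the cluster numbers returned by a clustering method do not need to match the reference labels.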

Note: The run takes a few seconds to finish. When it is done, a folder named 'tmp_ClusterMine' is created, which contains all output files (high-resolution figures and top-scoring gene sets).
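The contents of the output folder can be checked from R. This small sketch assumes the same outputdir value as in the example above and makes no assumption about the names of the files ClusterMine writes.

```r
# Same value as the outputdir argument used in the example call.
out_dir <- "./tmp_ClusterMine"

if (dir.exists(out_dir)) {
  # List every file ClusterMine wrote, including files in subfolders.
  print(list.files(out_dir, recursive = TRUE))
} else {
  message("Output folder not found: ", out_dir)
}
```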

4. Contact

If you have any questions, please do not hesitate to contact us at:
Hongdong Li, hongdong@csu.edu.cn
Jianxin Wang, jxwang@csu.edu.cn

5. How to cite?

If you use this tool, please cite the following work.

Hongdong Li, Yunpei Xu, Xiaoshu Zhu, Quan Liu, Gilbert S. Omenn, Jianxin Wang, ClusterMine: a Knowledge-integrated Clustering Approach based on Expression Profiles of Gene Sets, 2018, submitted




ClusterMine: a Knowledge-integrated Clustering Approach based on Expression Profiles of Gene Sets.
Developed at Center for Bioinformatics, Central South University, Changsha, P.R. China.