ClusterMine is a knowledge-integrated clustering approach that clusters samples based on their gene expression profiles. It requires two inputs: (1) a list of gene sets and (2) a gene expression data matrix with genes in rows and samples in columns (row names are RefSeq gene symbols). ClusterMine (1) uses prior gene sets, each of which typically contains functionally related genes, to partition the gene expression data into gene-set-wise subdata, (2) performs clustering analysis on each subdata, and (3) combines the clustering results of all subdata into a final clustering as the output.
ClusterMine is implemented as an R package, which is freely available for non-commercial use.
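The three-step workflow described above can be illustrated with a minimal base-R sketch. This is not the ClusterMine implementation: the source does not state which clustering algorithm or combination rule the package uses, so average-linkage hierarchical clustering and a co-association (consensus) matrix are assumptions chosen for illustration. The toy matrix `expr` and the list `gene_sets` are made-up data.

```r
# Illustrative sketch only: NOT the ClusterMine implementation.
set.seed(1)

# Toy expression matrix: 6 genes (rows) x 8 samples (columns).
# Samples S5-S8 are shifted so that two sample groups exist.
expr <- matrix(rnorm(6 * 8, sd = 0.1), nrow = 6,
               dimnames = list(paste0("GENE", 1:6), paste0("S", 1:8)))
expr[, 5:8] <- expr[, 5:8] + 5

# Hypothetical gene sets: a named list of gene symbols.
gene_sets <- list(setA = c("GENE1", "GENE2", "GENE3"),
                  setB = c("GENE4", "GENE5", "GENE6"))
k <- 2

# Steps 1 and 2: take the sub-matrix for each gene set, then cluster the samples.
per_set <- lapply(gene_sets, function(genes) {
  sub <- expr[intersect(genes, rownames(expr)), , drop = FALSE]
  cutree(hclust(dist(t(sub)), method = "average"), k = k)
})

# Step 3 (assumed rule): co-association matrix, i.e. the fraction of gene sets
# in which two samples were assigned to the same cluster.
coassoc <- Reduce(`+`, lapply(per_set, function(m) outer(m, m, "==") * 1)) /
  length(per_set)

# Final clustering: cluster the samples again using 1 - co-association as distance.
final <- cutree(hclust(as.dist(1 - coassoc), method = "average"), k = k)
print(final)
```

In this toy case both gene sets separate S1-S4 from S5-S8, so the combined clustering recovers the same two groups. With real data the per-gene-set results would disagree, and the combination step decides the outcome.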
Version | Changes |
---|---|
ClusterMine_0.9.6.tar.gz | added a parameter: varianceRatio (its value was not parameterized in previous versions). |
ClusterMine_0.9.5.tar.gz | added the 'generate_geneset.R' function to generate custom gene sets. |
ClusterMine_0.9.0.tar.gz | |
Step 1: Download the ClusterMine package listed above and install it in R (tested on version 3.3.0).
Step 2: Install the "gplots" R package (tested on version 3.0.1), which ClusterMine depends on.
Notes: ClusterMine was tested on Linux, Mac and Windows, and it ran smoothly on all three systems.
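The two installation steps can be run from the R console roughly as follows. This sketch assumes the downloaded archive is `ClusterMine_0.9.6.tar.gz` and that it sits in the current working directory; adjust the file name or path to match your download. The dependency is installed first so that the source install can find it.

```r
# Install the gplots dependency from CRAN (Step 2 above).
install.packages("gplots")

# Install ClusterMine from the downloaded source archive (Step 1 above).
# Assumption: the archive is in the current working directory.
install.packages("ClusterMine_0.9.6.tar.gz", repos = NULL, type = "source")
```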
Using ClusterMine is simple. Just follow the steps below:
Step 1: Open R or RStudio.
Step 2: In the R console, run the following command to load the package:
> library(ClusterMine)
Step 3: In the R console, run the following command to open the ClusterMine help page:
> ?ClusterMine
Step 4: The help page ends with example code. Copy it into the R console and run it as follows:
> data(Yeodemo)
> result = ClusterMine(data, kcluster = 3, genesetClass = "c2.all", outputdir = "./tmp_ClusterMine")
> performance = evalcluster(meta$label, result$membership)
'performance' is a vector of three elements: NMI (normalized mutual information), RI (Rand index) and ARI (adjusted Rand index), three commonly used criteria for assessing the quality of clustering results. In this example, you should see NMI = 0.975, RI = 0.992 and ARI = 0.984.
Notes: The run takes a few seconds. When it finishes, a folder named 'tmp_ClusterMine' is created, containing all output files (high-resolution figures and top-scored gene sets).
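To make two of the reported criteria concrete, the sketch below computes the Rand index and the adjusted Rand index from two label vectors using the standard pair-counting definitions. The helper name `rand_indices` is hypothetical and is not part of ClusterMine; how `evalcluster` computes its values internally is not documented here. NMI is left out because several normalizations are in common use and the source does not say which one the package applies.

```r
# Illustrative helper (hypothetical name), not ClusterMine's evalcluster().
rand_indices <- function(truth, pred) {
  tab <- table(truth, pred)               # contingency table of the two labelings
  pairs <- function(x) sum(choose(x, 2))  # number of sample pairs within groups
  same_both  <- pairs(tab)                # pairs grouped together in both labelings
  same_truth <- pairs(rowSums(tab))       # pairs grouped together in `truth`
  same_pred  <- pairs(colSums(tab))       # pairs grouped together in `pred`
  total <- choose(sum(tab), 2)            # all sample pairs

  # Rand index: fraction of sample pairs on which the two labelings agree.
  ri <- (total + 2 * same_both - same_truth - same_pred) / total

  # Adjusted Rand index: the same agreement, corrected for chance.
  expected <- same_truth * same_pred / total
  ari <- (same_both - expected) / ((same_truth + same_pred) / 2 - expected)

  c(RI = ri, ARI = ari)
}

# Label names do not need to match; only the grouping matters.
truth <- c(1, 1, 1, 2, 2, 2)
pred  <- c(2, 2, 2, 1, 1, 3)
print(rand_indices(truth, pred))
```

Both indices equal 1 when the two labelings define identical groups. The ARI is undefined in this sketch when each labeling puts all samples into a single cluster, since the denominator is then zero.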
If you have any questions, please do not hesitate to contact us at:
Hongdong Li, hongdong@csu.edu.cn
Jianxin Wang, jxwang@csu.edu.cn