Cluster terms based on their similarity matrix

Usage

cluster_terms(
  mat,
  method = "binary_cut",
  control = list(),
  verbose = se_opt$verbose
)

cluster_by_kmeans(mat, max_k = max(2, min(round(nrow(mat)/5), 100)), ...)

cluster_by_pam(mat, max_k = max(2, min(round(nrow(mat)/10), 100)), ...)

cluster_by_dynamicTreeCut(mat, minClusterSize = 5, ...)

cluster_by_fast_greedy(mat, ...)

cluster_by_leading_eigen(mat, ...)

cluster_by_louvain(mat, ...)

cluster_by_walktrap(mat, ...)

cluster_by_mclust(mat, G = seq_len(max(2, min(round(nrow(mat)/5), 100))), ...)

cluster_by_apcluster(mat, s = apcluster::negDistMat(r = 2), ...)

cluster_by_hdbscan(mat, minPts = 5, ...)

cluster_by_MCL(mat, addLoops = TRUE, ...)

Arguments

mat

A similarity matrix.

method

The clustering method. The value must be one of those returned by all_clustering_methods().

control

A list of parameters passed to the corresponding clustering function.

verbose

Whether to print messages.

max_k

Maximal k for k-means/PAM clustering. K-means/PAM clustering is applied from k = 2 to k = max_k.

...

Other arguments, passed on to the underlying clustering function (see Details).

minClusterSize

Minimal number of objects in a cluster. Passed to the minClusterSize argument in dynamicTreeCut::cutreeDynamic().

G

Passed to the G argument in mclust::Mclust() which is the number of clusters.

s

Passed to the s argument in apcluster::apcluster().

minPts

Passed to the minPts argument in dbscan::hdbscan().

addLoops

Passed to the addLoops argument in MCL::mcl().

Value

A vector of numeric cluster labels.

Details

New clustering methods can be registered by register_clustering_methods().
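As a minimal sketch of registering a custom method, the example below defines a hypothetical function cluster_by_hclust() (not part of the package) and registers it under the illustrative name "hclust_avg". It assumes that register_clustering_methods() takes name = function pairs and that a custom function receives the similarity matrix as its first argument followed by ...; the similarity matrix is synthetic.

```r
library(simplifyEnrichment)

# Synthetic similarity matrix with three blocks of ten terms (illustration only).
set.seed(123)
n <- 30
grp <- rep(1:3, each = 10)
mat <- outer(grp, grp, function(a, b) ifelse(a == b, 0.8, 0.1))
noise <- matrix(runif(n * n, 0, 0.1), n, n)
mat <- mat + (noise + t(noise)) / 2   # keep the matrix symmetric
diag(mat) <- 1
rownames(mat) <- colnames(mat) <- paste0("term", seq_len(n))

# Hypothetical custom method: average-linkage hierarchical clustering,
# cut into k groups. The parameter k is illustrative.
cluster_by_hclust <- function(mat, k = 3, ...) {
  d <- stats::as.dist(1 - mat)        # convert similarity to distance
  hc <- stats::hclust(d, method = "average")
  stats::cutree(hc, k = k)
}

# Assumption: methods are registered as name = function pairs.
register_clustering_methods(hclust_avg = cluster_by_hclust)

# The new name should now be accepted by cluster_terms().
cl <- cluster_terms(mat, method = "hclust_avg", control = list(k = 3))
table(cl)
```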

Note that it is better to call cluster_terms() directly rather than the individual cluster_by_* functions, because cluster_terms() additionally adjusts the cluster labels.

By default, there are the following clustering methods and corresponding clustering functions:

  • kmeans: see cluster_by_kmeans().

  • dynamicTreeCut: see cluster_by_dynamicTreeCut().

  • mclust: see cluster_by_mclust().

  • apcluster: see cluster_by_apcluster().

  • hdbscan: see cluster_by_hdbscan().

  • fast_greedy: see cluster_by_fast_greedy().

  • louvain: see cluster_by_louvain().

  • walktrap: see cluster_by_walktrap().

  • MCL: see cluster_by_MCL().

  • binary_cut: see binary_cut().

Additional arguments of the individual clustering functions can be set through the control argument in cluster_terms().
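For illustration, the sketch below passes max_k to cluster_by_kmeans() through control. The similarity matrix is synthetic and only serves to make the call runnable; with real data, mat would be a term similarity matrix.

```r
library(simplifyEnrichment)

# Synthetic similarity matrix with three blocks of ten terms (illustration only).
set.seed(123)
n <- 30
grp <- rep(1:3, each = 10)
mat <- outer(grp, grp, function(a, b) ifelse(a == b, 0.8, 0.1))
noise <- matrix(runif(n * n, 0, 0.1), n, n)
mat <- mat + (noise + t(noise)) / 2   # keep the matrix symmetric
diag(mat) <- 1
rownames(mat) <- colnames(mat) <- paste0("term", seq_len(n))

# Entries of `control` are forwarded to the function behind `method`,
# here cluster_by_kmeans(mat, max_k = 8).
cl <- cluster_terms(mat, method = "kmeans", control = list(max_k = 8))
table(cl)
```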

cluster_by_kmeans(): The best k for k-means clustering is determined by the "elbow" (or "knee") method applied to the within-cluster sum of squares (WSS) computed for each k. All other arguments are passed from ... to stats::kmeans().
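The idea of selecting k at the elbow of the WSS curve can be sketched in base R as follows. This is not the package's implementation: the elbow rule used here (the point farthest from the straight line joining the first and last points of the rescaled curve) is one common choice and an assumption of this example, as are the synthetic matrix and the range of k.

```r
# Synthetic similarity matrix with three blocks of ten terms (illustration only).
set.seed(123)
n <- 30
grp <- rep(1:3, each = 10)
mat <- outer(grp, grp, function(a, b) ifelse(a == b, 0.8, 0.1))
noise <- matrix(runif(n * n, 0, 0.1), n, n)
mat <- mat + (noise + t(noise)) / 2   # keep the matrix symmetric
diag(mat) <- 1

# Total within-cluster sum of squares for each candidate k.
ks <- 2:6
wss <- sapply(ks, function(k) {
  stats::kmeans(mat, centers = k, iter.max = 50, nstart = 10)$tot.withinss
})

# Rescale both axes to [0, 1] so that neither dominates the distance.
x <- (ks - min(ks)) / (max(ks) - min(ks))
y <- (wss - min(wss)) / (max(wss) - min(wss))

# Perpendicular distance of every point to the chord joining the
# first and the last point of the curve; the elbow maximizes it.
m <- length(ks)
d <- abs((y[m] - y[1]) * x - (x[m] - x[1]) * y + x[m] * y[1] - y[m] * x[1]) /
  sqrt((y[m] - y[1])^2 + (x[m] - x[1])^2)
best_k <- ks[which.max(d)]

# Final clustering with the selected k.
cl <- stats::kmeans(mat, centers = best_k, iter.max = 50, nstart = 10)$cluster
```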

cluster_by_pam(): PAM is applied via fpc::pamk(), which automatically selects the best k. All other arguments are passed from ... to fpc::pamk().

cluster_by_dynamicTreeCut(): All other arguments are passed from ... to dynamicTreeCut::cutreeDynamic().

cluster_by_fast_greedy(): All other arguments are passed from ... to igraph::cluster_fast_greedy().

cluster_by_leading_eigen(): All other arguments are passed from ... to igraph::cluster_leading_eigen().

cluster_by_louvain(): All other arguments are passed from ... to igraph::cluster_louvain().

cluster_by_walktrap(): All other arguments are passed from ... to igraph::cluster_walktrap().
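The igraph-based functions above rely on community detection in a graph derived from the similarity matrix. The sketch below shows one way to do this by hand; how the package itself builds the graph is not stated here, so the conversion (a weighted, undirected graph without self-loops) is an assumption, and the matrix is synthetic.

```r
library(igraph)

# Synthetic similarity matrix with three blocks of ten terms (illustration only).
set.seed(123)
n <- 30
grp <- rep(1:3, each = 10)
mat <- outer(grp, grp, function(a, b) ifelse(a == b, 0.8, 0.1))
noise <- matrix(runif(n * n, 0, 0.1), n, n)
mat <- mat + (noise + t(noise)) / 2   # keep the matrix symmetric
diag(mat) <- 1
rownames(mat) <- colnames(mat) <- paste0("term", seq_len(n))

# Assumption: similarities are used as edge weights of an undirected graph.
g <- graph_from_adjacency_matrix(mat, mode = "undirected",
                                 weighted = TRUE, diag = FALSE)

# Each detection function returns a communities object;
# membership() extracts one cluster label per vertex.
memb_fast_greedy <- as.vector(membership(cluster_fast_greedy(g)))
memb_louvain     <- as.vector(membership(cluster_louvain(g)))
memb_walktrap    <- as.vector(membership(cluster_walktrap(g)))

table(memb_walktrap)
```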

cluster_by_mclust(): All other arguments are passed from ... to mclust::Mclust().

cluster_by_apcluster(): All other arguments are passed from ... to apcluster::apcluster().

cluster_by_hdbscan(): All other arguments are passed from ... to dbscan::hdbscan().

cluster_by_MCL(): All other arguments are passed from ... to MCL::mcl().