Suggest the best number of subgroups

# S4 method for ConsensusPartition
suggest_best_k(object,
    jaccard_index_cutoff = select_jaccard_cutoff(ncol(object)),
    mean_silhouette_cutoff = NULL,
    stable_PAC = 0.1, help = cola_opt$help)

Arguments

object

A ConsensusPartition-class object.

jaccard_index_cutoff

Cutoff for the Jaccard index, which compares the partition at each k to the partition at the previous k.

mean_silhouette_cutoff

Cutoff for mean silhouette scores.

stable_PAC

Cutoff for stable PAC. This argument only takes effect when mean_silhouette_cutoff is set to NULL.

help

Whether to print help message.

Details

The best k is selected according to the following rules:

  • All k with Jaccard index larger than jaccard_index_cutoff are removed, because increasing k does not provide enough extra information. If all k are removed, the result is marked as "no subgroup detected".

  • If all k have a Jaccard index larger than 0.75, the k with the highest mean silhouette score is taken as the best k.

  • For all k with mean silhouette score larger than mean_silhouette_cutoff, the maximal k is taken as the best k, and other k are marked as optional best k.

  • If the argument mean_silhouette_cutoff is set to NULL, k is filtered by 1-PAC scores instead of mean silhouette scores. Similarly, the k with the highest 1-PAC is taken as the best k, and the other k are marked as optional best k.

  • If the second rule does not apply, the k that receives the most votes among the highest 1-PAC score, the highest mean silhouette score, and the highest concordance is taken as the best k.
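
The cutoff rule and the voting rule above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's implementation: the statistics table, its column names, and the cutoff value of 0.9 are all hypothetical, and ties in the vote are resolved here in favour of the smallest k, which the rules above do not specify.

```r
# Hypothetical per-k statistics; the numbers are made up for illustration.
stats <- data.frame(
    k = 2:5,
    one_minus_PAC = c(0.95, 0.80, 0.85, 0.70),
    mean_silhouette = c(0.95, 0.96, 0.75, 0.60),
    concordance = c(0.97, 0.93, 0.88, 0.80)
)

# Cutoff rule: among all k whose mean silhouette exceeds the cutoff,
# take the largest k as the best k and keep the rest as optional best k.
cutoff <- 0.9  # assumed value, not a documented default
passing <- stats$k[stats$mean_silhouette > cutoff]
best_k_by_cutoff <- max(passing)
attr(best_k_by_cutoff, "optional") <- setdiff(passing, max(passing))

# Voting rule: each metric votes for the k at which it is highest,
# and the k with the most votes wins.
votes <- c(
    stats$k[which.max(stats$one_minus_PAC)],
    stats$k[which.max(stats$mean_silhouette)],
    stats$k[which.max(stats$concordance)]
)
vote_table <- table(votes)
# which.max() returns the first maximum, so ties go to the smallest k
# (an assumption of this sketch).
best_k_by_vote <- as.integer(names(vote_table)[which.max(vote_table)])
```

With these made-up numbers the two rules disagree (k = 3 by cutoff, k = 2 by vote), which illustrates why comparing the results for all k is encouraged.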

Note that it is difficult to find the best k deterministically. We encourage users to compare the results for all k and choose the one that best explains their study.

See

The selection of the best k can be visualized by select_partition_number.

Value

The best k. As the example output shows, optional best k values are attached as the "optional" attribute.

Author

Zuguang Gu <z.gu@dkfz.de>

Examples

data(golub_cola)
obj = golub_cola["ATC", "skmeans"]
suggest_best_k(obj)
#> The best k suggested by this function might not reflect the real
#> subgroups in the data (especially when you expect a large best k). It
#> is recommended to directly look at the plots from
#> select_partition_number() or other related plotting functions.
#> [1] 3
#> attr(,"optional")
#> [1] 2
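
The printed value above is a number that carries an "optional" attribute. A minimal sketch of how the two parts can be separated with base R follows. To stay runnable without the package, it uses a stand-in object built with structure() that mimics the printed output; with cola loaded, the stand-in would be replaced by the result of suggest_best_k(obj).

```r
# Stand-in mimicking the printed result above (best k = 3, optional k = 2).
# With cola available, use instead: best <- suggest_best_k(obj)
best <- structure(3, optional = 2)

# The optional best k values live in the "optional" attribute.
optional_k <- attr(best, "optional")

# as.numeric() drops the attribute and leaves the plain best k.
best_k <- as.numeric(best)
```

The documented help argument controls whether the help message seen in the example is printed.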