Consensus partitioning only with a subset of columns

consensus_partition_by_down_sampling(data,
    top_value_method = "ATC",
    top_n = NULL,
    partition_method = "skmeans",
    max_k = 6, k = NULL,
    subset = min(round(ncol(data)*0.2), 250), pre_select = TRUE,
    verbose = TRUE, prefix = "", anno = NULL, anno_col = NULL,
    predict_method = "centroid",
    dist_method = c("euclidean", "correlation", "cosine"),
    .env = NULL, .predict = TRUE, mc.cores = 1, cores = mc.cores, ...)

Arguments

data: A numeric matrix where subgroups are found by columns.
top_value_method: A single top-value method. Available methods are in all_top_value_methods. Use register_top_value_methods to add a new top-value method.
top_n: Number of rows with top values. The value can be a vector with length > 1. When n > 5000, the function randomly samples only 5000 rows from the top n rows. If top_n is a vector, partitioning is applied to every value in top_n and the consensus partition is summarized from all partitions.
partition_method: A single partitioning method. Available methods are in all_partition_methods. Use register_partition_methods to add a new partition method.
max_k: Maximal number of subgroups to try. The function will try 2:max_k subgroups.
k: Alternatively, a vector of the numbers of subgroups to try.
subset: Number of columns to randomly sample, or a vector of selected indices.
pre_select: Whether to pre-select by k-means.
verbose: Whether to print messages.
prefix: Internally used.
anno: Annotation data frame.
anno_col: Annotation colors.
predict_method: Method for predicting class labels. Possible values are "centroid", "svm" and "randomForest".
dist_method: Distance method used when predicting the classes of the other columns. Possible values are "euclidean", "correlation" and "cosine".
.env: An environment, internally used.
.predict: Internally used.
mc.cores: Number of cores. This argument will be removed in future versions.
cores: Number of cores, or a cluster object returned by makeCluster.
...: All passed to consensus_partition.
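
The default of subset and the range implied by max_k can be checked with base R alone. The sketch below is only an illustration: the toy matrices and the helper default_subset() are hypothetical and not part of the package; the expression inside the helper is copied from the usage above.

```r
# Hypothetical toy matrices; only the number of columns matters here.
small <- matrix(0, nrow = 10, ncol = 72)
large <- matrix(0, nrow = 10, ncol = 2000)

# Hypothetical helper wrapping the default expression of `subset`.
default_subset <- function(data) min(round(ncol(data) * 0.2), 250)

n_small <- default_subset(small)  # 14: 20% of 72 is 14.4, rounded to 14
n_large <- default_subset(large)  # 250: 20% of 2000 is 400, capped at 250

# With max_k = 6, the numbers of subgroups tried are 2:max_k.
max_k <- 6
k_tried <- 2:max_k                # 2 3 4 5 6
```

So by default about 20% of the columns are sampled, with at most 250 columns.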

Details

The function performs consensus partitioning on only a small subset of columns; the classes of the other columns are predicted by predict_classes,ConsensusPartition-method.
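
The idea of partitioning a subset and predicting the rest can be sketched in base R. This is a conceptual illustration under loud assumptions, not the package's implementation: plain kmeans() stands in for consensus partitioning, a nearest-centroid rule on all rows with Euclidean distance stands in for predict_classes, and the toy data and all variable names are hypothetical.

```r
set.seed(1)

# Toy matrix: 20 rows, 40 columns forming two well-separated groups.
m <- cbind(matrix(rnorm(20 * 20, mean = 0), nrow = 20),
           matrix(rnorm(20 * 20, mean = 5), nrow = 20))

# 1. Pick a subset of columns (here given as a vector of indices)
#    and partition only those columns.
ind <- c(1:5, 21:25)
cl <- kmeans(t(m[, ind]), centers = 2, nstart = 10)$cluster

# 2. Compute one centroid per class from the partitioned columns.
centroids <- sapply(1:2, function(k) {
  rowMeans(m[, ind[cl == k], drop = FALSE])
})

# 3. Assign every remaining column to the class of its nearest centroid.
rest <- setdiff(seq_len(ncol(m)), ind)
pred <- apply(m[, rest, drop = FALSE], 2, function(x) {
  which.min(colSums((centroids - x)^2))
})

# Predicted classes of the remaining columns, split by true group.
table(pred, group = ifelse(rest <= 20, "A", "B"))
```

Only the 10 sampled columns go through the clustering step; the 30 remaining columns each need just one distance computation per class, which is where the time saving on large data sets comes from.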

Examples

# \dontrun{
data(golub_cola)
m = get_matrix(golub_cola)

set.seed(123)
golub_cola_ds = consensus_partition_by_down_sampling(m, subset = 50,
  anno = get_anno(golub_cola), anno_col = get_anno_col(golub_cola),
  top_value_method = "SD", partition_method = "kmeans")
#> * apply consensus_partition_by_down_sampling() with 50 columns.
#> * run SD:kmeans on a 4116x50 matrix.
#> * calculating SD values.
#> * rows are scaled before sent to partition, method: 'z-score' (x - mean)/sd
#> * get top 368 rows by SD method
#> * wrap results for k = 2
#> * wrap results for k = 3
#> * wrap results for k = 4
#> * wrap results for k = 5
#> * wrap results for k = 6
#> * adjust class labels between different k.
#> * SD:kmeans used 1.28 secs.
#> * predict class for 72 samples with k = 2
#>   * take top 500/689 most significant signatures for prediction.
#> * predict class for 72 samples with k = 3
#>   * take top 500/787 most significant signatures for prediction.
#> * predict class for 72 samples with k = 4
#>   * take top 500/1018 most significant signatures for prediction.
#> * predict class for 72 samples with k = 5
#>   * take top 500/1005 most significant signatures for prediction.
#> * predict class for 72 samples with k = 6
#>   * take top 500/887 most significant signatures for prediction.
# }