Consensus partitioning for all combinations of methods

run_all_consensus_partition_methods(data,
    top_value_method = all_top_value_methods(),
    partition_method = all_partition_methods(),
    max_k = 6, k = NULL,
    top_n = NULL,
    mc.cores = 1, cores = mc.cores, anno = NULL, anno_col = NULL,
    sample_by = "row", p_sampling = 0.8, partition_repeat = 50,
    scale_rows = NULL, verbose = TRUE, help = cola_opt$help)

Arguments

data

A numeric matrix where subgroups are found on columns, i.e. columns are the samples to be partitioned and rows are the features.

top_value_method

Methods used to extract the top n rows. Allowed methods are listed in all_top_value_methods and new methods can be added by register_top_value_methods.

partition_method

Methods used to partition samples. Allowed methods are listed in all_partition_methods and new methods can be added by register_partition_methods.

max_k

Maximal number of subgroups to try. The function will try 2:max_k subgroups.

k

Alternatively, you can specify a vector of k (numbers of subgroups) to try, which overrides max_k.

top_n

Number of rows with top values. The value can be a vector with length > 1. When n > 5000, the function only randomly samples 5000 rows from the top n rows. If top_n is a vector, partitioning is applied to every value in top_n and the consensus partition is summarized from all partitions.

mc.cores

Number of cores to use. This argument will be removed in future versions; use cores instead.

cores

Number of cores, or a cluster object returned by makeCluster.

anno

A data frame with known annotation of columns.

anno_col

A list of colors (color is defined as a named vector) for the annotations. If anno is a data frame, anno_col should be a named list where names correspond to the column names in anno.
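For example, a minimal sketch of how anno and anno_col fit together (the data frame, column name "type" and the color values are hypothetical):

anno = data.frame(type = c(rep("a", 20), rep("b", 20)))
anno_col = list(type = c("a" = "red", "b" = "blue"))
rl = run_all_consensus_partition_methods(m, anno = anno, anno_col = anno_col)

Each element of anno_col is a named color vector whose names match the levels of the corresponding annotation column.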

sample_by

Whether to randomly sample the matrix by rows or by columns.

p_sampling

Proportion of the top n rows to sample.

partition_repeat

Number of repeats for the random sampling.

scale_rows

Whether to scale rows. If it is TRUE, the scaling method defined in register_partition_methods is used.

verbose

Whether to print messages.

help

Whether to print help messages.

Details

The function performs consensus partitioning by consensus_partition for all combinations of top-value methods and partitioning methods.

It also adjusts the subgroup labels across all methods and all k to make them as consistent as possible.
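Conceptually, the function behaves like the following simplified sketch (the real implementation additionally caches top values across methods and adjusts subgroup labels for consistency):

for(tm in all_top_value_methods()) {
    for(pm in all_partition_methods()) {
        # one consensus partitioning run per method combination
        res = consensus_partition(data, top_value_method = tm,
            partition_method = pm, max_k = 6)
    }
}

With the default four top-value methods and five partitioning methods, this gives the 20 combinations seen in the example output below.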

Value

A ConsensusPartitionList-class object. Simply type the object name in an interactive R session to see which functions can be applied to it.
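As a hedged sketch of typical downstream steps (function names are from the cola package; consult their help pages for exact arguments):

rl                       # print a summary of all method combinations
suggest_best_k(rl)       # suggested best number of subgroups per method
res = rl["SD:kmeans"]    # extract a single ConsensusPartition object
get_classes(rl, k = 2)   # consensus subgroup labels across all methods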

Author

Zuguang Gu <z.gu@dkfz.de>

Examples

# \dontrun{
set.seed(123)
m = cbind(rbind(matrix(rnorm(20*20, mean = 1), nr = 20),
                matrix(rnorm(20*20, mean = -1), nr = 20)),
          rbind(matrix(rnorm(20*20, mean = -1), nr = 20),
                matrix(rnorm(20*20, mean = 1), nr = 20))
         ) + matrix(rnorm(40*40), nr = 40)
rl = run_all_consensus_partition_methods(data = m, top_n = c(20, 30, 40))
#> * on a 40x40 matrix.
#> * calculate top-values.
#>   - calculate SD score for 40 rows.
#>   - calculate CV score for 40 rows.
#>   - calculate MAD score for 40 rows.
#>   - calculate ATC score for 40 rows.
#> ------------------------------------------------------------
#> * running partition by SD:skmeans. 1/20
#> * run SD:skmeans on a 40x40 matrix.
#> * SD values have already been calculated. Get from cache.
#> * rows are scaled before sent to partition, method: 'z-score' (x - mean)/sd
#> * get top 20 rows by SD method
#> * get top 30 rows by SD method
#> * get top 40 rows by SD method
#> * wrap results for k = 2
#> * wrap results for k = 3
#> * wrap results for k = 4
#> * wrap results for k = 5
#> * wrap results for k = 6
#> * adjust class labels between different k.
#> * SD:skmeans used 1.007467 mins.
#> ------------------------------------------------------------
#> * running partition by CV:skmeans. 2/20
#> * run CV:skmeans on a 40x40 matrix.
#> * CV values have already been calculated. Get from cache.
#> * rows are scaled before sent to partition, method: 'z-score' (x - mean)/sd
#> * get top 20 rows by CV method
#> * get top 30 rows by CV method
#> * get top 40 rows by CV method
#> * wrap results for k = 2
#> * wrap results for k = 3
#> * wrap results for k = 4
#> * wrap results for k = 5
#> * wrap results for k = 6
#> * adjust class labels between different k.
#> * CV:skmeans used 55.825 secs.
#> ------------------------------------------------------------
#> * running partition by MAD:skmeans. 3/20
#> * run MAD:skmeans on a 40x40 matrix.
#> * MAD values have already been calculated. Get from cache.
#> * rows are scaled before sent to partition, method: 'z-score' (x - mean)/sd
#> * get top 20 rows by MAD method
#> * get top 30 rows by MAD method
#> * get top 40 rows by MAD method
#> * wrap results for k = 2
#> * wrap results for k = 3
#> * wrap results for k = 4
#> * wrap results for k = 5
#> * wrap results for k = 6
#> * adjust class labels between different k.
#> * MAD:skmeans used 57.137 secs.
#> ------------------------------------------------------------
#> * running partition by ATC:skmeans. 4/20
#> * run ATC:skmeans on a 40x40 matrix.
#> * ATC values have already been calculated. Get from cache.
#> * rows are scaled before sent to partition, method: 'z-score' (x - mean)/sd
#> * get top 20 rows by ATC method
#> * get top 30 rows by ATC method
#> * get top 40 rows by ATC method
#> * wrap results for k = 2
#> * wrap results for k = 3
#> * wrap results for k = 4
#> * wrap results for k = 5
#> * wrap results for k = 6
#> * adjust class labels between different k.
#> * ATC:skmeans used 56.117 secs.
#> ------------------------------------------------------------
#> * running partition by SD:mclust. 5/20
#> * run SD:mclust on a 40x40 matrix.
#> * SD values have already been calculated. Get from cache.
#> * rows are scaled before sent to partition, method: 'z-score' (x - mean)/sd
#> * get top 20 rows by SD method
#> * get top 30 rows by SD method
#> * get top 40 rows by SD method
#> * wrap results for k = 2
#> * wrap results for k = 3
#> * wrap results for k = 4
#> * wrap results for k = 5
#> * wrap results for k = 6
#> * adjust class labels between different k.
#> * SD:mclust used 10.391 secs.
#> ------------------------------------------------------------
#> * running partition by CV:mclust. 6/20
#> * run CV:mclust on a 40x40 matrix.
#> * CV values have already been calculated. Get from cache.
#> * rows are scaled before sent to partition, method: 'z-score' (x - mean)/sd
#> * get top 20 rows by CV method
#> * get top 30 rows by CV method
#> * get top 40 rows by CV method
#> * wrap results for k = 2
#> * wrap results for k = 3
#> * wrap results for k = 4
#> * wrap results for k = 5
#> * wrap results for k = 6
#> * adjust class labels between different k.
#> * CV:mclust used 10.819 secs.
#> ------------------------------------------------------------
#> * running partition by MAD:mclust. 7/20
#> * run MAD:mclust on a 40x40 matrix.
#> * MAD values have already been calculated. Get from cache.
#> * rows are scaled before sent to partition, method: 'z-score' (x - mean)/sd
#> * get top 20 rows by MAD method
#> * get top 30 rows by MAD method
#> * get top 40 rows by MAD method
#> * wrap results for k = 2
#> * wrap results for k = 3
#> * wrap results for k = 4
#> * wrap results for k = 5
#> * wrap results for k = 6
#> * adjust class labels between different k.
#> * MAD:mclust used 10.229 secs.
#> ------------------------------------------------------------
#> * running partition by ATC:mclust. 8/20
#> * run ATC:mclust on a 40x40 matrix.
#> * ATC values have already been calculated. Get from cache.
#> * rows are scaled before sent to partition, method: 'z-score' (x - mean)/sd
#> * get top 20 rows by ATC method
#> * get top 30 rows by ATC method
#> * get top 40 rows by ATC method
#> * wrap results for k = 2
#> * wrap results for k = 3
#> * wrap results for k = 4
#> * wrap results for k = 5
#> * wrap results for k = 6
#> * adjust class labels between different k.
#> * ATC:mclust used 10.183 secs.
#> ------------------------------------------------------------
#> * running partition by SD:pam. 9/20
#> * run SD:pam on a 40x40 matrix.
#> * SD values have already been calculated. Get from cache.
#> * rows are scaled before sent to partition, method: 'z-score' (x - mean)/sd
#> * get top 20 rows by SD method
#> * get top 30 rows by SD method
#> * get top 40 rows by SD method
#> * wrap results for k = 2
#> * wrap results for k = 3
#> * wrap results for k = 4
#> * wrap results for k = 5
#> * wrap results for k = 6
#> * adjust class labels between different k.
#> * SD:pam used 2.237 secs.
#> ------------------------------------------------------------
#> * running partition by CV:pam. 10/20
#> * run CV:pam on a 40x40 matrix.
#> * CV values have already been calculated. Get from cache.
#> * rows are scaled before sent to partition, method: 'z-score' (x - mean)/sd
#> * get top 20 rows by CV method
#> * get top 30 rows by CV method
#> * get top 40 rows by CV method
#> * wrap results for k = 2
#> * wrap results for k = 3
#> * wrap results for k = 4
#> * wrap results for k = 5
#> * wrap results for k = 6
#> * adjust class labels between different k.
#> * CV:pam used 2.291 secs.
#> ------------------------------------------------------------
#> * running partition by MAD:pam. 11/20
#> * run MAD:pam on a 40x40 matrix.
#> * MAD values have already been calculated. Get from cache.
#> * rows are scaled before sent to partition, method: 'z-score' (x - mean)/sd
#> * get top 20 rows by MAD method
#> * get top 30 rows by MAD method
#> * get top 40 rows by MAD method
#> * wrap results for k = 2
#> * wrap results for k = 3
#> * wrap results for k = 4
#> * wrap results for k = 5
#> * wrap results for k = 6
#> * adjust class labels between different k.
#> * MAD:pam used 2.295 secs.
#> ------------------------------------------------------------
#> * running partition by ATC:pam. 12/20
#> * run ATC:pam on a 40x40 matrix.
#> * ATC values have already been calculated. Get from cache.
#> * rows are scaled before sent to partition, method: 'z-score' (x - mean)/sd
#> * get top 20 rows by ATC method
#> * get top 30 rows by ATC method
#> * get top 40 rows by ATC method
#> * wrap results for k = 2
#> * wrap results for k = 3
#> * wrap results for k = 4
#> * wrap results for k = 5
#> * wrap results for k = 6
#> * adjust class labels between different k.
#> * ATC:pam used 2.267 secs.
#> ------------------------------------------------------------
#> * running partition by SD:kmeans. 13/20
#> * run SD:kmeans on a 40x40 matrix.
#> * SD values have already been calculated. Get from cache.
#> * rows are scaled before sent to partition, method: 'z-score' (x - mean)/sd
#> * get top 20 rows by SD method
#> * get top 30 rows by SD method
#> * get top 40 rows by SD method
#> * wrap results for k = 2
#> * wrap results for k = 3
#> * wrap results for k = 4
#> * wrap results for k = 5
#> * wrap results for k = 6
#> * adjust class labels between different k.
#> * SD:kmeans used 2.461 secs.
#> ------------------------------------------------------------
#> * running partition by CV:kmeans. 14/20
#> * run CV:kmeans on a 40x40 matrix.
#> * CV values have already been calculated. Get from cache.
#> * rows are scaled before sent to partition, method: 'z-score' (x - mean)/sd
#> * get top 20 rows by CV method
#> * get top 30 rows by CV method
#> * get top 40 rows by CV method
#> * wrap results for k = 2
#> * wrap results for k = 3
#> * wrap results for k = 4
#> * wrap results for k = 5
#> * wrap results for k = 6
#> * adjust class labels between different k.
#> * CV:kmeans used 2.507 secs.
#> ------------------------------------------------------------
#> * running partition by MAD:kmeans. 15/20
#> * run MAD:kmeans on a 40x40 matrix.
#> * MAD values have already been calculated. Get from cache.
#> * rows are scaled before sent to partition, method: 'z-score' (x - mean)/sd
#> * get top 20 rows by MAD method
#> * get top 30 rows by MAD method
#> * get top 40 rows by MAD method
#> * wrap results for k = 2
#> * wrap results for k = 3
#> * wrap results for k = 4
#> * wrap results for k = 5
#> * wrap results for k = 6
#> * adjust class labels between different k.
#> * MAD:kmeans used 2.501 secs.
#> ------------------------------------------------------------
#> * running partition by ATC:kmeans. 16/20
#> * run ATC:kmeans on a 40x40 matrix.
#> * ATC values have already been calculated. Get from cache.
#> * rows are scaled before sent to partition, method: 'z-score' (x - mean)/sd
#> * get top 20 rows by ATC method
#> * get top 30 rows by ATC method
#> * get top 40 rows by ATC method
#> * wrap results for k = 2
#> * wrap results for k = 3
#> * wrap results for k = 4
#> * wrap results for k = 5
#> * wrap results for k = 6
#> * adjust class labels between different k.
#> * ATC:kmeans used 2.504 secs.
#> ------------------------------------------------------------
#> * running partition by SD:hclust. 17/20
#> * run SD:hclust on a 40x40 matrix.
#> * SD values have already been calculated. Get from cache.
#> * rows are scaled before sent to partition, method: 'z-score' (x - mean)/sd
#> * get top 20 rows by SD method
#> * get top 30 rows by SD method
#> * get top 40 rows by SD method
#> * wrap results for k = 2
#> * wrap results for k = 3
#> * wrap results for k = 4
#> * wrap results for k = 5
#> * wrap results for k = 6
#> * adjust class labels between different k.
#> * SD:hclust used 2.143 secs.
#> ------------------------------------------------------------
#> * running partition by CV:hclust. 18/20
#> * run CV:hclust on a 40x40 matrix.
#> * CV values have already been calculated. Get from cache.
#> * rows are scaled before sent to partition, method: 'z-score' (x - mean)/sd
#> * get top 20 rows by CV method
#> * get top 30 rows by CV method
#> * get top 40 rows by CV method
#> * wrap results for k = 2
#> * wrap results for k = 3
#> * wrap results for k = 4
#> * wrap results for k = 5
#> * wrap results for k = 6
#> * adjust class labels between different k.
#> * CV:hclust used 2.128 secs.
#> ------------------------------------------------------------
#> * running partition by MAD:hclust. 19/20
#> * run MAD:hclust on a 40x40 matrix.
#> * MAD values have already been calculated. Get from cache.
#> * rows are scaled before sent to partition, method: 'z-score' (x - mean)/sd
#> * get top 20 rows by MAD method
#> * get top 30 rows by MAD method
#> * get top 40 rows by MAD method
#> * wrap results for k = 2
#> * wrap results for k = 3
#> * wrap results for k = 4
#> * wrap results for k = 5
#> * wrap results for k = 6
#> * adjust class labels between different k.
#> * MAD:hclust used 2.119 secs.
#> ------------------------------------------------------------
#> * running partition by ATC:hclust. 20/20
#> * run ATC:hclust on a 40x40 matrix.
#> * ATC values have already been calculated. Get from cache.
#> * rows are scaled before sent to partition, method: 'z-score' (x - mean)/sd
#> * get top 20 rows by ATC method
#> * get top 30 rows by ATC method
#> * get top 40 rows by ATC method
#> * wrap results for k = 2
#> * wrap results for k = 3
#> * wrap results for k = 4
#> * wrap results for k = 5
#> * wrap results for k = 6
#> * adjust class labels between different k.
#> * ATC:hclust used 2.12 secs.
#> ------------------------------------------------------------
#> * adjust class labels according to the consensus classifications from all methods.
#>   - get reference class labels from all methods, all k.
#>   - adjust class labels for each single method, each single k.
#> ------------------------------------------------------------
# }