Consensus partition

consensus_partition(data,
    top_value_method = "ATC",
    top_n = NULL,
    partition_method = "skmeans",
    max_k = 6,
    k = NULL,
    sample_by = "row",
    p_sampling = 0.8,
    partition_repeat = 50,
    partition_param = list(),
    anno = NULL,
    anno_col = NULL,
    scale_rows = NULL,
    verbose = TRUE,
    mc.cores = 1, cores = mc.cores,
    prefix = "",
    .env = NULL,
    help = cola_opt$help)

Arguments

data

A numeric matrix where subgroups are found by columns.

top_value_method

A single top-value method. Available methods are in all_top_value_methods. Use register_top_value_methods to add a new top-value method.

top_n

Number of rows with top values. The value can be a vector with length > 1. When n > 5000, the function randomly samples only 5000 rows from the top n rows. If top_n is a vector, partitioning is applied to every value in top_n and the consensus partition is summarized from all partitions.

partition_method

A single partitioning method. Available methods are in all_partition_methods. Use register_partition_methods to add a new partition method.

max_k

Maximal number of subgroups to try. The function tries 2:max_k subgroups.

k

Alternatively, you can specify a vector k of subgroup numbers to try.

sample_by

Whether to randomly sample the matrix by rows or by columns.

p_sampling

Proportion of the submatrix which contains the top n rows to sample.
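
To make the interplay of sample_by and p_sampling concrete, here is a minimal base-R sketch of row-wise versus column-wise subsampling. The matrix sub is a hypothetical stand-in for the top-n-row submatrix, and rounding the sample size with round() is an assumption for illustration; the function's internal rounding may differ.

```r
set.seed(42)

# Hypothetical stand-in for the submatrix holding the top n rows
# (10 rows x 6 columns).
sub <- matrix(rnorm(10 * 6), nrow = 10)
p_sampling <- 0.8

# Sampling by rows: keep a proportion p_sampling of the rows.
by_row <- sub[sample(nrow(sub), round(p_sampling * nrow(sub))), , drop = FALSE]

# Sampling by columns: keep a proportion p_sampling of the columns.
by_col <- sub[, sample(ncol(sub), round(p_sampling * ncol(sub))), drop = FALSE]

dim(by_row)  # 8 rows, 6 columns
dim(by_col)  # 10 rows, 5 columns
```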

partition_repeat

Number of repeats for the random sampling.

partition_param

Parameters for the partition method which are passed to ... in a registered partitioning method. See register_partition_methods for details.

anno

A data frame with known annotation of samples. The annotations will be plotted in heatmaps and the correlation to predicted subgroups will be tested.

anno_col

A list of colors (color is defined as a named vector) for the annotations. If anno is a data frame, anno_col should be a named list where names correspond to the column names in anno.
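
As an illustration of how anno and anno_col fit together, the following sketch builds both objects for a hypothetical six-sample data set. The column names, levels and colors are invented for the example. That the names of each color vector should match the annotation levels is an assumption based on the usual heatmap annotation convention, not something this page states.

```r
# Hypothetical sample annotations for a matrix with 6 columns (samples).
anno <- data.frame(
  subtype = c("A", "A", "B", "B", "C", "C"),
  sex     = c("F", "M", "F", "M", "F", "M")
)

# One named color vector per annotation column. The list names match
# colnames(anno); the vector names are assumed to match the annotation levels.
anno_col <- list(
  subtype = c(A = "red", B = "blue", C = "darkgreen"),
  sex     = c(F = "orange", M = "purple")
)

# Sanity checks before passing both objects on, e.g. as
# consensus_partition(data, anno = anno, anno_col = anno_col).
stopifnot(all(names(anno_col) %in% colnames(anno)))
for (nm in names(anno_col)) {
  stopifnot(all(unique(anno[[nm]]) %in% names(anno_col[[nm]])))
}
```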

scale_rows

Whether to scale rows. If TRUE, the scaling method defined in register_partition_methods is used.

verbose

Whether to print messages.

mc.cores

Multiple cores to use. This argument will be removed in future versions.

cores

Number of cores, or a cluster object returned by makeCluster.

prefix

Internally used.

.env

An environment, internally used.

help

Whether to print help messages.

Details

The function performs the analysis in the following steps:

  • calculate scores for rows by top-value method,

  • for each top_n value, take top n rows,

  • randomly sample a proportion p_sampling of rows from the top_n-row matrix and perform partitioning, repeating this partition_repeat times,

  • collect partitions from all individual partitions and summarize a consensus partition.
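
The steps above can be sketched in base R as follows. This is an illustration of the general idea only, not the package's implementation: the row standard deviation stands in for the top-value method, kmeans stands in for the partitioning method, and the final consensus classes are derived by hierarchical clustering of a co-membership matrix, which is just one simple way to summarize the partitions. All variable names and the toy data are hypothetical.

```r
# Illustrative sketch only: base-R stand-ins, not the package internals.
set.seed(1)

# Toy data: 40 rows (features) x 30 columns (samples), two column groups.
mat <- matrix(rnorm(40 * 30, sd = 0.5), nrow = 40)
mat[1:20, 1:15] <- mat[1:20, 1:15] + 2

top_n            <- c(10, 20)  # numbers of top rows to try
p_sampling       <- 0.8        # proportion of top rows kept per repeat
partition_repeat <- 10         # resampling repeats per top_n value
k                <- 2          # number of subgroups

# Step 1: score rows (standard deviation stands in for the top-value method).
score <- apply(mat, 1, sd)
row_order <- order(score, decreasing = TRUE)

# Steps 2-3: for each top_n value, resample rows and partition the columns.
partitions <- list()
for (n in top_n) {
  top_rows <- row_order[seq_len(n)]
  for (i in seq_len(partition_repeat)) {
    sampled <- sample(top_rows, round(p_sampling * n))
    # kmeans clusters the rows of its input, so transpose to cluster samples.
    fit <- kmeans(t(mat[sampled, , drop = FALSE]), centers = k, nstart = 5)
    partitions[[length(partitions) + 1]] <- fit$cluster
  }
}

# Step 4: consensus matrix = fraction of partitions in which two samples
# fall into the same subgroup.
co_membership <- lapply(partitions, function(cl) outer(cl, cl, "==") * 1)
consensus <- Reduce(`+`, co_membership) / length(partitions)

# One simple way to derive final classes from the consensus matrix.
hc <- hclust(as.dist(1 - consensus), method = "average")
consensus_class <- cutree(hc, k = k)
table(consensus_class)
```

Using a co-membership matrix sidesteps the fact that cluster labels are arbitrary across repeats: two samples count as agreeing whenever they share a subgroup, whatever that subgroup happens to be called in a given run.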

Value

A ConsensusPartition-class object. Simply type the object name in an interactive R session to see which functions can be applied to it.

See also

run_all_consensus_partition_methods runs consensus partitioning with multiple top-value methods and multiple partitioning methods.

Author

Zuguang Gu <z.gu@dkfz.de>

Examples

set.seed(123)
m = cbind(rbind(matrix(rnorm(20*20, mean = 1,   sd = 0.5), nrow = 20),
                matrix(rnorm(20*20, mean = 0,   sd = 0.5), nrow = 20),
                matrix(rnorm(20*20, mean = 0,   sd = 0.5), nrow = 20)),
          rbind(matrix(rnorm(20*20, mean = 0,   sd = 0.5), nrow = 20),
                matrix(rnorm(20*20, mean = 1,   sd = 0.5), nrow = 20),
                matrix(rnorm(20*20, mean = 0,   sd = 0.5), nrow = 20)),
          rbind(matrix(rnorm(20*20, mean = 0.5, sd = 0.5), nrow = 20),
                matrix(rnorm(20*20, mean = 0.5, sd = 0.5), nrow = 20),
                matrix(rnorm(20*20, mean = 1,   sd = 0.5), nrow = 20))
         ) + matrix(rnorm(60*60, sd = 0.5), nrow = 60)
res = consensus_partition(m, partition_repeat = 10, top_n = c(10, 20, 50))
#> * run ATC:skmeans on a 60x60 matrix.
#> * calculating ATC values.
#> * rows are scaled before sent to partition, method: 'z-score' (x - mean)/sd
#> * get top 10 rows by ATC method
#> Loading required package: foreach
#> Loading required package: rngtools
#> * get top 20 rows by ATC method
#> * get top 50 rows by ATC method
#> * wrap results for k = 2
#> * wrap results for k = 3
#> * wrap results for k = 4
#> * wrap results for k = 5
#> * wrap results for k = 6
#> * adjust class labels between different k.
#> * ATC:skmeans used 12.832 secs.
res
#> A 'ConsensusPartition' object with k = 2, 3, 4, 5, 6.
#>   On a matrix with 60 rows and 60 columns.
#>   Top rows (10, 20, 50) are extracted by 'ATC' method.
#>   Subgroups are detected by 'skmeans' method.
#>   Performed in total 150 partitions by row resampling.
#>   Best k for subgroups seems to be 2.
#> 
#> Following methods can be applied to this 'ConsensusPartition' object:
#>  [1] "cola_report"             "collect_classes"        
#>  [3] "collect_plots"           "collect_stats"          
#>  [5] "colnames"                "compare_partitions"     
#>  [7] "compare_signatures"      "consensus_heatmap"      
#>  [9] "dimension_reduction"     "functional_enrichment"  
#> [11] "get_anno"                "get_anno_col"           
#> [13] "get_classes"             "get_consensus"          
#> [15] "get_matrix"              "get_membership"         
#> [17] "get_param"               "get_signatures"         
#> [19] "get_stats"               "is_best_k"              
#> [21] "is_stable_k"             "membership_heatmap"     
#> [23] "ncol"                    "nrow"                   
#> [25] "plot_ecdf"               "predict_classes"        
#> [27] "rownames"                "select_partition_number"
#> [29] "show"                    "suggest_best_k"         
#> [31] "test_to_known_factors"   "top_rows_heatmap"