Author: Zuguang Gu ( firstname.lastname@example.org )
Package version: 1.8.0
Assume your matrix is stored in an object called mat. To perform consensus
partitioning with cola, you only need to run the following code:
# code only for demonstration
mat = adjust_matrix(mat) # optional
rl = run_all_consensus_partition_methods(mat, cores = ...)
cola_report(rl, output_dir = ..., cores = ...)
The code above performs three steps:
1. Adjust the matrix (optional). Rows with too many NAs are removed. Rows with very low variance are removed. NA values are imputed if they make up less than 50% of a row. Outliers are adjusted in each row.
2. Run consensus partitioning with multiple methods. The default partitioning methods are hclust (hierarchical clustering with cutree), kmeans (k-means clustering), skmeans::skmeans (spherical k-means clustering), cluster::pam (partitioning around medoids) and mclust::Mclust (model-based clustering). The default methods to extract the top n rows are SD (standard deviation), CV (coefficient of variation), MAD (median absolute deviation) and ATC (ability to correlate to other rows).
3. Generate an HTML report of the complete analysis.
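To make the row-filtering and top-value ideas concrete, here is a minimal base-R sketch on simulated data. It does not call cola and is not cola's implementation: the simulated matrix, the row-median imputation and the names mat2, top_value and top_rows are assumptions made for illustration only. ATC is omitted because it is more involved.

```r
set.seed(123)

# Simulated matrix (illustrative only): 100 rows (features) x 20 columns (samples)
n_row <- 100
n_col <- 20
mat <- matrix(rnorm(n_row * n_col, mean = 10, sd = 1), nrow = n_row)

# Make the first 10 rows differ between the first and second half of the samples
shift <- rep(c(-3, 3), each = n_col / 2)
mat[1:10, ] <- sweep(mat[1:10, ], 2, shift, "+")

# Add some missing values, and one row that is mostly NA
mat[sample(length(mat), 50)] <- NA
mat[100, 1:15] <- NA

# Drop rows with at least 50% NAs, then impute the remaining NAs
# with the row median (assumption: cola may impute differently)
na_frac <- rowMeans(is.na(mat))
mat2 <- mat[na_frac < 0.5, , drop = FALSE]
mat2 <- t(apply(mat2, 1, function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}))

# Per-row scores for three top-value methods
top_value <- list(
  SD  = apply(mat2, 1, sd),
  CV  = apply(mat2, 1, function(x) sd(x) / mean(x)),
  MAD = apply(mat2, 1, mad)
)

# Indices of the top n rows under each method
top_n <- 10
top_rows <- lapply(top_value, function(s) {
  order(s, decreasing = TRUE)[seq_len(top_n)]
})
```

In this simulation the three scores should agree on the ten shifted rows. On real data they can rank rows quite differently, which is one reason to compare several top-value methods.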
run_all_consensus_partition_methods() runs multiple methods in sequence, which might
take a long time for big datasets. Users can also run consensus partitioning with
a specific top-value method (e.g. SD) and a specific partitioning method (e.g. skmeans):
res = consensus_partition(mat, top_value_method = ..., partition_method = ...)
cola_report(res, output_dir = ..., cores = ...)
You can refer to the main vignette for more details.
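For readers new to the idea, the following base-R sketch illustrates the general principle of consensus partitioning: cluster the samples repeatedly on resampled rows, then record how often each pair of samples lands in the same cluster. It is a simplified illustration, not cola's algorithm. The 80% row-sampling fraction, the use of kmeans, the number of repeats and all variable names are assumptions.

```r
set.seed(1)

# Simulated matrix (illustrative only): 200 rows x 30 samples, two sample groups
group <- rep(c(1, 2), each = 15)
mat <- matrix(rnorm(200 * 30), nrow = 200)
mat[1:50, group == 2] <- mat[1:50, group == 2] + 3

k <- 2          # number of subgroups to look for
n_rep <- 50     # number of repeated partitionings
n_sample <- ncol(mat)

# consensus[i, j] will hold the fraction of runs in which
# samples i and j were assigned to the same cluster
consensus <- matrix(0, n_sample, n_sample)
for (i in seq_len(n_rep)) {
  # Resample 80% of the rows, then cluster the samples (columns)
  rows <- sample(nrow(mat), round(0.8 * nrow(mat)))
  cl <- kmeans(t(mat[rows, ]), centers = k, nstart = 5)$cluster
  consensus <- consensus + outer(cl, cl, "==")
}
consensus <- consensus / n_rep

# Derive final classes from the consensus matrix by hierarchical clustering
classes <- cutree(hclust(as.dist(1 - consensus), method = "average"), k = k)
```

A consensus matrix with entries close to 0 or 1 suggests a stable partition for that k. Values near 0.5 suggest that the samples do not separate reliably.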
For extremely large datasets, users can run
consensus_partition_by_down_sampling(), which randomly samples a subset of
samples for classification. The classes of the remaining samples are then
predicted by the signatures of the cola classification. More details
can be found in the vignette “Work with Big Datasets”.
res = consensus_partition_by_down_sampling(mat, subset = ..., top_value_method = ..., partition_method = ...)
cola_report(res, output_dir = ..., cores = ...)
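The down-sampling idea can be sketched in base R as follows: cluster a random subset of samples, then assign each remaining sample to the nearest class. This is only an approximation of the concept. cola predicts classes from signatures, whereas this sketch uses a plain nearest-centroid rule on all rows. The simulated data, the subset size of 40 and the helper predict_class are hypothetical.

```r
set.seed(2)

# Simulated matrix (illustrative only): 100 rows x 200 samples, two sample groups
group <- rep(c(1, 2), each = 100)
mat <- matrix(rnorm(100 * 200), nrow = 100)
mat[1:30, group == 2] <- mat[1:30, group == 2] + 3

# Cluster only a random subset of the samples
subset_idx <- sample(ncol(mat), 40)
km <- kmeans(t(mat[, subset_idx]), centers = 2, nstart = 5)

# Class centroids, one column per class (rows are features)
centroids <- t(km$centers)

# Assign each remaining sample to the class with the closest centroid
# (squared Euclidean distance; a stand-in for signature-based prediction)
predict_class <- function(x) which.min(colSums((centroids - x)^2))
rest_idx <- setdiff(seq_len(ncol(mat)), subset_idx)

classes <- integer(ncol(mat))
classes[subset_idx] <- km$cluster
classes[rest_idx] <- apply(mat[, rest_idx, drop = FALSE], 2, predict_class)
```

Only the 40 sampled columns go through the clustering step, so the expensive part of the analysis no longer scales with the total number of samples.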
Examples of cola analyses on real datasets can be found at https://jokergoo.github.io/cola_collection/.
From version 2.0.0, there is a new function hierarchical_partition() that applies
consensus partitioning in a hierarchical way. Simply use hierarchical_partition()
with the matrix:
rh = hierarchical_partition(mat, cores = ...)
cola_report(rh, output_dir = ..., cores = ...)
With a big matrix, the argument subset can be set so that down-sampling consensus
partitioning is used internally, e.g.:
rh = hierarchical_partition(mat, subset = 500, cores = ...)
Please refer to the vignette “Hierarchical Consensus Partitioning” for more details on this method.
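As a rough illustration of what partitioning "in a hierarchical way" can mean, the base-R sketch below splits the samples into two groups recursively and stops when a split no longer looks convincing. It is not how hierarchical_partition() is implemented. The function hier_split, the kmeans-based split, the stopping rule (ratio of between-cluster to total sum of squares) and the thresholds min_size and min_ratio are all hypothetical.

```r
set.seed(3)

# Simulated matrix (illustrative only): 100 rows x 60 samples in three groups.
# Groups B and C differ from A on rows 1-40; C also differs from B on rows 41-80.
group <- rep(c("A", "B", "C"), each = 20)
mat <- matrix(rnorm(100 * 60), nrow = 100)
mat[1:40, group != "A"] <- mat[1:40, group != "A"] + 4
mat[41:80, group == "C"] <- mat[41:80, group == "C"] + 4

# Recursively split the samples in idx into two groups. Node labels grow by
# one character per level, e.g. "0" -> "01" / "02" -> "021" / "022".
hier_split <- function(mat, idx = seq_len(ncol(mat)), node = "0",
                       min_size = 10, min_ratio = 0.3) {
  labels <- setNames(rep(node, length(idx)), idx)
  # Stop if the node is too small to split
  if (length(idx) < min_size) return(labels)
  km <- kmeans(t(mat[, idx, drop = FALSE]), centers = 2, nstart = 10)
  # Stop if the split explains too little of the total variation
  if (km$betweenss / km$totss < min_ratio) return(labels)
  for (j in 1:2) {
    sub <- idx[km$cluster == j]
    labels[as.character(sub)] <- hier_split(mat, sub, paste0(node, j),
                                            min_size, min_ratio)
  }
  labels
}

classes <- hier_split(mat)
table(classes, group)
```

Compared with a single partition into k groups, a recursive scheme like this can pick up secondary subgroups (here B versus C) that are only separated by rows irrelevant to the first split.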