Get signature rows

# S4 method for ConsensusPartition
get_signatures(object, k,
    col = if(scale_rows) c("green", "white", "red") else c("blue", "white", "red"),
    silhouette_cutoff = 0.5,
    fdr_cutoff = cola_opt$fdr_cutoff,
    top_signatures = NULL,
    group_diff = cola_opt$group_diff,
    scale_rows = object@scale_rows, .scale_mean = NULL, .scale_sd = NULL,
    row_km = NULL,
    diff_method = c("Ftest", "ttest", "samr", "pamr", "one_vs_others", "uniquely_high_in_one_group"),
    anno = get_anno(object),
    anno_col = get_anno_col(object),
    internal = FALSE,
    show_row_dend = FALSE,
    show_column_names = FALSE,
    column_names_gp = gpar(fontsize = 8),
    use_raster = TRUE,
    plot = TRUE, verbose = TRUE, seed = 888,
    left_annotation = NULL, right_annotation = NULL,
    simplify = FALSE, prefix = "", enforce = FALSE, hash = NULL, from_hc = FALSE,
    ...)

Arguments

object

A ConsensusPartition-class object.

k

Number of subgroups.

col

Colors for the main heatmap.

silhouette_cutoff

Cutoff for silhouette scores. Samples with silhouette scores below this cutoff are not used for finding signature rows. For guidance on selecting a suitable cutoff, see https://www.stat.berkeley.edu/~s133/Cluster2a.html#tth_tAb1.

fdr_cutoff

Cutoff for FDR of the difference test between subgroups.

top_signatures

The number of top signatures to keep, ranked by most significant FDR. Because multiple rows can share the same FDR, the final number of signatures might not exactly match the number requested.

group_diff

Cutoff for the maximal difference between group means.

scale_rows

Whether to apply row scaling when making the heatmap.

.scale_mean

Internally used.

.scale_sd

Internally used.

row_km

Number of groups for performing k-means clustering on rows. By default it is automatically selected.

diff_method

Methods to get rows which are significantly different between subgroups, see 'Details' section.

anno

A data frame of annotations for the original matrix columns. By default it uses the annotations specified in consensus_partition or run_all_consensus_partition_methods.

anno_col

A list of colors (each defined as a named vector) for the annotations. If anno is a data frame, anno_col should be a named list whose names correspond to the column names in anno.

internal

Used internally.

show_row_dend

Whether to show the row dendrogram.

show_column_names

Whether to show column names in the heatmap.

column_names_gp

Graphics parameters for column names.

use_raster

Internally used.

plot

Whether to make the plot.

verbose

Whether to print messages.

seed

Random seed.

left_annotation

Annotation put on the left of the heatmap. It should be a HeatmapAnnotation-class object. The number of items should be the same as the number of rows in the original matrix. Subsetting to the significant rows is performed automatically on the annotation object.

right_annotation

Annotation put on the right of the heatmap. Same format as left_annotation.

simplify

Only used internally.

prefix

Only used internally.

enforce

The analysis is cached by default, so results for the same input are retrieved automatically without rerunning the analysis. Set enforce to TRUE to force the function to rerun the analysis.

hash

Used internally.

from_hc

Is the ConsensusPartition-class object a node of a HierarchicalPartition object?

...

Other arguments.
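The format described for anno and anno_col can be illustrated with a small sketch. The sample annotations, level names and colors below are hypothetical and only show the expected shape: a data frame with one row per matrix column, and a named list holding one named color vector per annotation.

```r
# Hypothetical annotations: one row per column (sample) of the original matrix.
anno <- data.frame(
    subtype = c("A", "A", "B", "B"),
    batch   = c("b1", "b2", "b1", "b2"),
    stringsAsFactors = FALSE
)

# One named color vector per annotation; the list names match colnames(anno).
anno_col <- list(
    subtype = c(A = "orange", B = "purple"),
    batch   = c(b1 = "grey30", b2 = "grey70")
)

# Check that every annotation has colors and every level has a color.
stopifnot(all(colnames(anno) %in% names(anno_col)))
for (nm in colnames(anno)) {
    stopifnot(all(unique(anno[[nm]]) %in% names(anno_col[[nm]])))
}
```

With annotations that have one row per sample in the object, these would be passed as anno = anno, anno_col = anno_col.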

Details

The function applies a statistical test for the difference between subgroups to every row. The following methods are available for testing the significance of the difference:

ttest

First, it finds the subgroup with the highest mean value, compares it to each of the other subgroups with a t-test, and takes the maximum p-value. Second, it finds the subgroup with the lowest mean value, compares it to each of the other subgroups with a t-test, and again takes the maximum p-value. The smaller of these two p-values is the final p-value.

samr/pamr

Uses the SAM (from the samr package) or PAM (from the pamr package) method to find rows that differ significantly between subgroups.

Ftest

Uses an F-test to find rows that differ significantly between subgroups.

one_vs_others

For each subgroup i in each row, it uses a t-test to compare samples in the current subgroup to all other samples; the resulting p-value is denoted p_i. The p-value for the row is min(p_i).

uniquely_high_in_one_group

A row is a signature uniquely up-regulated in subgroup A if it meets the following criteria: 1. in a two-group t-test of A ~ other_merged_groups, the statistic is > 0 (high in group A) and the p-value is significant, and 2. among the other groups (excluding A), the t-test for every pair of groups is not significant.
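The p-value rules given for ttest and one_vs_others can be sketched in base R for a single row. This is a minimal illustration, not cola's implementation: the helper names are hypothetical, and the use of t.test() with its default settings (Welch's t-test) is an assumption, since the text only says "t-test".

```r
# `x` is one row of the matrix; `cl` holds the subgroup label of each sample.
# All helper names here are hypothetical and not part of cola.

# p-values from comparing subgroup `ref` to each of the other subgroups.
pairwise_p <- function(x, cl, ref) {
    others <- setdiff(unique(cl), ref)
    sapply(others, function(g) t.test(x[cl == ref], x[cl == g])$p.value)
}

# "ttest": max p-value for the highest-mean subgroup, max p-value for the
# lowest-mean subgroup, then the smaller of the two.
ttest_p <- function(x, cl) {
    means <- tapply(x, cl, mean)
    hi <- names(means)[which.max(means)]
    lo <- names(means)[which.min(means)]
    p_hi <- max(pairwise_p(x, cl, hi))
    p_lo <- max(pairwise_p(x, cl, lo))
    min(p_hi, p_lo)
}

# "one_vs_others": each subgroup against all remaining samples, then min(p_i).
one_vs_others_p <- function(x, cl) {
    p <- sapply(unique(cl), function(g) t.test(x[cl == g], x[cl != g])$p.value)
    min(p)
}

# Simulated row in which subgroup "3" is clearly higher than the others.
set.seed(1)
cl <- rep(c("1", "2", "3"), each = 10)
x <- c(rnorm(10, mean = 0), rnorm(10, mean = 0), rnorm(10, mean = 3))
p_ttest <- ttest_p(x, cl)
p_ovo <- one_vs_others_p(x, cl)
```

Taking the maximum within a reference subgroup requires that subgroup to differ from every other subgroup, while the final minimum accepts a row that is either uniquely high or uniquely low.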

diff_method can also be a user-defined function. The function needs two arguments: the matrix for the analysis and the predicted classes. It should return a vector of FDR values from the difference test.
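A user-defined function of this shape might look like the following sketch. The function name, the choice of a Kruskal-Wallis test and the Benjamini-Hochberg adjustment are all assumptions made for illustration; only the two-argument interface and the FDR vector return value come from the description above.

```r
# Hypothetical user-defined difference test: takes the matrix and the
# predicted classes, and returns one FDR value per row.
kruskal_fdr <- function(mat, class) {
    g <- factor(class)
    p <- apply(mat, 1, function(x) kruskal.test(x, g)$p.value)
    p.adjust(p, method = "BH")  # BH adjustment is an assumption
}

# Simulated data: 20 rows, 12 samples in 3 classes; rows 1-5 are shifted
# upwards in class 3.
set.seed(123)
mat <- matrix(rnorm(20 * 12), nrow = 20)
cl <- rep(c(1, 2, 3), each = 4)
mat[1:5, cl == 3] <- mat[1:5, cl == 3] + 5
fdr <- kruskal_fdr(mat, cl)
length(fdr)
```

Such a function would then be supplied through the diff_method argument, for example diff_method = kruskal_fdr.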

Value

A data frame with more than two columns:

which_row:

row index corresponding to the original matrix.

fdr:

the FDR.

km:

the k-means groups if row_km is set.

other_columns:

the mean value in each subgroup (scaled or unscaled, depending on whether rows are scaled).
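As a sketch of how the returned data frame might be used, the snippet below rebuilds a few columns from the first six rows of the example output on this page and applies some common operations. The matrix mat is a hypothetical stand-in for the original matrix.

```r
# First six rows of the example output on this page (subset of columns).
tb <- data.frame(
    which_row = c(2, 3, 11, 12, 13, 14),
    fdr = c(0.0005480204, 0.0114427741, 0.0035593360,
            0.0143153879, 0.0201369424, 0.0157851334),
    km = c(2, 2, 3, 3, 2, 2)
)

# Hypothetical stand-in for the original matrix (20 rows, 4 samples).
mat <- matrix(seq_len(80), nrow = 20)

# which_row indexes rows of the original matrix.
sig_mat <- mat[tb$which_row, , drop = FALSE]
dim(sig_mat)

# Keep signatures under a stricter FDR cutoff, most significant first.
strict <- tb[tb$fdr < 0.01, ]
strict <- strict[order(strict$fdr), ]
strict$which_row

# Number of signatures in each k-means group.
table(tb$km)
```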

Author

Zuguang Gu <z.gu@dkfz.de>

Examples

data(golub_cola)
res = golub_cola["ATC", "skmeans"]
tb = get_signatures(res, k = 3)
#> * 72/72 samples (in 3 classes) remain after filtering by silhouette (>= 0.5).
#> * cache hash: 87c63a8cabc898f97a024514962787f7 (seed 888).
#> * calculating row difference between subgroups by Ftest.
#> * split rows into 4 groups by k-means clustering.
#> * 2058 signatures (50.0%) under fdr < 0.05, group_diff > 0.
#>   - randomly sample 2000 signatures.
#> * making heatmaps for signatures.

head(tb)
#>   which_row          fdr     p_value   mean_1   mean_2   mean_3 group_diff
#> 1         2 0.0005480204 0.000102654 1.747716 2.081334 2.004512  0.3336177
#> 2         3 0.0114427741 0.004095045 2.182303 2.251489 2.397769  0.2154661
#> 3        11 0.0035593360 0.001009916 2.342387 2.529442 2.380858  0.1870552
#> 4        12 0.0143153879 0.005335230 2.384353 2.549453 2.301059  0.2483942
#> 5        13 0.0201369424 0.008253407 2.737212 2.840578 3.423688  0.6864764
#> 6        14 0.0157851334 0.006036395 3.244110 3.371635 3.638171  0.3940607
#>   scaled_mean_1 scaled_mean_2 scaled_mean_3 group_diff_scaled km
#> 1    -0.4997955    0.53497043     0.2966968         1.0347659  2
#> 2    -0.3093980    0.02013769     0.7168734         1.0262713  2
#> 3    -0.3509730    0.58915719    -0.1576207         0.9401302  3
#> 4    -0.1531394    0.49786781    -0.4815776         0.9794454  3
#> 5    -0.2321462   -0.08925855     0.7167982         0.9489443  2
#> 6    -0.2998939    0.02113042     0.6920903         0.9919842  2

get_signatures(res, k = 3, top_signatures = 100)
#> * 72/72 samples (in 3 classes) remain after filtering by silhouette (>= 0.5).
#> * cache hash: 9e0a60297e9e338085d22adb55750347 (seed 888).
#> * calculating row difference between subgroups by Ftest.
#> * split rows into 3 groups by k-means clustering.
#> * 101 signatures (2.5%) with most significant fdr, group_diff > 0.
#> * making heatmaps for signatures.
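
The last call requests 100 top signatures but reports 101, which is consistent with the note on tied FDR values under top_signatures. The sketch below shows one tie-inclusive selection rule that produces this kind of result. The rule and the toy FDR values are assumptions for illustration and not necessarily how cola selects the rows.

```r
# Toy FDR values with a tie at the cutoff position.
fdr <- c(0.001, 0.002, 0.002, 0.01, 0.03)
n_top <- 2

# Use the n_top-th smallest FDR as the cutoff and keep every row at or
# below it, so tied rows are all retained.
cutoff <- sort(fdr)[n_top]
selected <- which(fdr <= cutoff)
length(selected)  # 3 rows are kept although 2 were requested
```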