This document compares row sampling and column sampling for consensus partitioning on five datasets: the Golub leukemia dataset, the Ritz ALL dataset, the TCGA GBM microarray dataset, the HSMM single cell RNASeq dataset and the MCF10CA single cell RNASeq dataset. For each dataset, four consensus partitioning methods (SD:hclust, SD:skmeans, ATC:hclust and ATC:skmeans) were applied. Each method was run 100 times to capture the variability of the 1-PAC score. Random sampling was performed either by rows or by columns. Each individual cola run used default parameters. The scripts for the analysis can be found here.
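The comparison depends on how the consensus matrix is built under the two sampling schemes. The sketch below is a simplified illustration, not cola's implementation. It makes several assumptions:

- Plain k-means with farthest-first initialisation stands in for the hclust/skmeans partitioning methods.
- The sampling fraction is 80%, with 50 partitions per run.
- The PAC interval is (0.1, 0.9).

Rows are features and columns are samples. With row sampling, every sample is clustered in each partition. With column sampling, a pair of samples only contributes to the consensus matrix when both samples were drawn.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=100):
    """Lloyd's k-means on the rows of X (points); returns integer labels."""
    # Farthest-first initialisation: a random first center, then repeatedly
    # the point farthest from all centers chosen so far.
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[dist.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster becomes empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def consensus_matrix(mat, k, by="rows", n_partitions=50, frac=0.8, seed=0):
    """Consensus matrix for the columns (samples) of a features x samples matrix."""
    if by not in ("rows", "columns"):
        raise ValueError("by must be 'rows' or 'columns'")
    rng = np.random.default_rng(seed)
    n_feat, n_samp = mat.shape
    together = np.zeros((n_samp, n_samp))  # times a pair was put in the same group
    sampled = np.zeros((n_samp, n_samp))   # times a pair was clustered at all
    for _ in range(n_partitions):
        if by == "rows":   # subsample features, keep every sample
            rows = rng.choice(n_feat, size=int(frac * n_feat), replace=False)
            cols = np.arange(n_samp)
        else:              # keep every feature, subsample samples
            rows = np.arange(n_feat)
            cols = rng.choice(n_samp, size=int(frac * n_samp), replace=False)
        labels = kmeans(mat[np.ix_(rows, cols)].T, k, rng)
        same = (labels[:, None] == labels[None, :]).astype(float)
        together[np.ix_(cols, cols)] += same
        sampled[np.ix_(cols, cols)] += 1.0
    # Fraction of co-sampled partitions in which each pair shared a group.
    return np.divide(together, sampled, out=np.zeros_like(together),
                     where=sampled > 0)


def one_minus_pac(cm, lower=0.1, upper=0.9):
    """1 - PAC: one minus the share of sample pairs with ambiguous consensus."""
    vals = cm[np.triu_indices_from(cm, k=1)]
    return 1.0 - np.mean((vals > lower) & (vals < upper))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    mat = rng.normal(size=(20, 30))  # 20 features x 30 samples
    mat[:, 15:] += 5.0               # two clearly separated sample groups
    for by in ("rows", "columns"):
        for k in (2, 3):
            print(by, k, round(one_minus_pac(consensus_matrix(mat, k, by=by)), 3))
```

On toy data with two well-separated sample groups, both schemes give 1-PAC = 1 at k = 2. Differences between the schemes only show up when the subgroup structure is weaker, which is what the real datasets probe.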
For each dataset, there are four plots:
- boxplots that show the distributions of 1-PAC scores at each k (number of subgroups) for each method.
- mean difference of the 1-PAC scores between row sampling and column sampling.
- heatmaps that directly show the partitions from the 100 runs. Each row corresponds to one cola run. The colors in the heatmap only encode the subgroup labels, not the stability of the partitioning in that run.
- barplots that show the concordance of the partitions across the 100 runs, separately for row sampling and for column sampling, as well as the concordance between row sampling and column sampling. Note that the y-axes are transformed as \(1 - \sqrt{1-y}\).
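The per-k summaries behind the last three kinds of plots can be sketched as below. Two parts are assumptions:

- The sign convention for the mean difference is row sampling minus column sampling.
- The concordance statistic is not stated here, so the mean pairwise Rand index is used as a stand-in. It ignores how subgroups are labelled, which matters because labels from different runs are arbitrary.

The axis transform is the one given in the barplot description. It keeps 0 at 0 and 1 at 1 but stretches the upper end: 0.75 maps to 0.5 and 0.99 to about 0.9. This keeps small differences among highly concordant results visible.

```python
import itertools
import math


def mean_difference(scores_row, scores_col):
    """Mean 1-PAC under row sampling minus mean 1-PAC under column sampling."""
    return sum(scores_row) / len(scores_row) - sum(scores_col) / len(scores_col)


def rand_index(a, b):
    """Share of sample pairs on which partitions a and b agree (grouped together
    in both, or apart in both); independent of how subgroups are labelled."""
    pairs = list(itertools.combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def mean_concordance(runs_x, runs_y=None):
    """Mean pairwise Rand index within one set of runs, or between two sets."""
    if runs_y is None:
        pairs = itertools.combinations(runs_x, 2)   # e.g. row vs row sampling
    else:
        pairs = itertools.product(runs_x, runs_y)   # row vs column sampling
    scores = [rand_index(p, q) for p, q in pairs]
    return sum(scores) / len(scores)


def y_transform(y):
    """Axis transform for the concordance barplots: 1 - sqrt(1 - y)."""
    return 1.0 - math.sqrt(1.0 - y)


if __name__ == "__main__":
    row_runs = [[0, 0, 1, 1], [1, 1, 0, 0]]   # same partition, labels swapped
    col_runs = [[0, 0, 1, 1], [0, 1, 0, 1]]
    print(mean_concordance(row_runs))             # within row sampling
    print(mean_concordance(row_runs, col_runs))   # row vs column sampling
    print(y_transform(0.75), y_transform(0.99))
```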
TCGA GBM microarray dataset
HSMM single cell RNASeq dataset
MCF10CA single cell RNASeq dataset