Running times for the six partitioning methods (hclust, kmeans, pam, mclust, skmeans and NMF) on 206 GDS datasets and 223 recount2 datasets are visualized in Figure S8.1. Note that the cola analysis of each dataset was run with four CPU cores.
We also compared the running time of consensus partitioning when the random sampling was applied to matrix rows versus matrix columns. Figure S8.2 shows the results for the TCGA GBM dataset. We set the number of random samplings to 25, 50, 100 and 200 for both row sampling and column sampling. For each parameter combination (i.e. the number of random samplings and the matrix dimension being sampled), cola was run 100 times to capture the variability of the running time. Figure S8.2A illustrates the running time for all six partitioning methods and Figure S8.2B illustrates the mean ratio of running time between column sampling and row sampling.
For the TCGA GBM dataset, column sampling takes on average 1.45 times as long as row sampling.
Similar results for the HSMM single-cell dataset are shown in Figure S8.3. On average, the running time of column sampling is 1.65-fold that of row sampling.
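The timing comparison described above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not the code used for the benchmarks: the helper names time_runs and mean_ratio are hypothetical, the workloads are placeholders rather than actual cola runs, and the "mean ratio" is assumed here to be the ratio of mean running times (the text does not say whether it is that or the mean of per-run ratios).

```python
import time
from statistics import mean


def time_runs(fn, n_runs):
    """Return the wall-clock time in seconds of each of n_runs calls to fn."""
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return times


def mean_ratio(column_times, row_times):
    """Ratio of mean running times, column sampling over row sampling.

    Assumption: ratio of means; the source does not specify the definition.
    """
    if not column_times or not row_times:
        raise ValueError("both timing lists must be non-empty")
    return mean(column_times) / mean(row_times)


if __name__ == "__main__":
    # Placeholder workloads standing in for one consensus partitioning run
    # with column sampling and one with row sampling (hypothetical).
    def column_sampling_run():
        sum(i * i for i in range(30000))

    def row_sampling_run():
        sum(i * i for i in range(20000))

    col = time_runs(column_sampling_run, 100)
    row = time_runs(row_sampling_run, 100)
    print(f"mean ratio (column / row): {mean_ratio(col, row):.2f}")
```

Repeating each configuration many times, as in the 100 runs per parameter combination above, gives a distribution of running times from which both the variability and the ratio can be read.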