Running time for six partitioning methods (hclust, kmeans, pam, mclust, skmeans and NMF) on 206 GDS datasets and 223 recount2 datasets is visualized in Figure S8.1. Note that the cola analysis on each dataset was performed with four CPU cores.

Figure S8.1. Running time of six partitioning methods on GDS and recount2 datasets. Y-axis is in log10 scale.

We also compared the running time of consensus partitioning generated from sampling on matrix rows versus matrix columns. Figure S8.2 shows the results for the TCGA GBM dataset. We set the number of random samplings to 25, 50, 100 and 200, for both row sampling and column sampling. For each parameter set (i.e., the number of random samplings and the matrix dimension to sample from), cola was run 100 times to estimate the variability of the running time. Figure S8.2A illustrates the running time for all six partitioning methods and Figure S8.2B illustrates the mean ratio of running time between column sampling and row sampling.
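The benchmarking procedure above (repeat each configuration many times, then take the mean ratio of column-sampling to row-sampling runtime) can be sketched as follows. This is an illustrative Python sketch only; `run_consensus_partition` is a hypothetical stand-in for a single cola run, not the actual cola API.

```python
import time
import statistics

def run_consensus_partition(n_samplings, sample_by):
    # Hypothetical stand-in for one consensus partitioning run; here it
    # merely simulates work that grows with the number of random samplings.
    acc = 0.0
    for _ in range(n_samplings * 1000):
        acc += 1.0
    return acc

def time_runs(n_samplings, sample_by, repeats=100):
    """Time `repeats` runs of one configuration; return elapsed times."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_consensus_partition(n_samplings, sample_by)
        times.append(time.perf_counter() - start)
    return times

# For each number of random samplings, compare column vs row sampling
# by the ratio of mean running times (as in Figure S8.2B).
for n in (25, 50, 100, 200):
    row_times = time_runs(n, "row", repeats=5)
    col_times = time_runs(n, "column", repeats=5)
    ratio = statistics.mean(col_times) / statistics.mean(row_times)
    print(f"n_samplings={n}: mean column/row runtime ratio = {ratio:.2f}")
```

In the actual analysis, each run is a full cola consensus partitioning and the number of repeats is 100; the sketch only shows the structure of the timing loop and the ratio computation.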

For the TCGA GBM dataset, column sampling takes on average 1.45 times longer than row sampling.

Figure S8.2. Running time of consensus partitioning generated by matrix row sampling and column sampling, on TCGA GBM dataset.

Similar results for the HSMM single-cell dataset are shown in Figure S8.3. On average, column sampling takes 1.65 times longer than row sampling.

Figure S8.3. Running time of consensus partitioning generated by matrix row sampling and column sampling, on HSMM single-cell dataset.