This document contains the results of testing the number of samplings used for consensus partitioning on two datasets (the TCGA GBM microarray dataset and the HSMM single cell RNASeq dataset). We tested 25, 50, 100 and 200 random samplings, with both row sampling and column sampling. For each combination of parameters, cola was run 100 times. The scripts for the analysis can be found here.
For each dataset, there are four plots:
- Scatter plots showing the variability of the consensus partitioning metrics. Three metrics (1-PAC, mean silhouette and the concordance scores) are tested.
- Line plots showing the mean concordance between the 100 cola runs.
- Barplots showing the mean concordance of the consensus partitions between 25 samplings and 200 samplings.
- Scatter plots showing the relation between 1-PAC under 25/200 samplings and the concordance.
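To make two of the metrics in the plots above concrete, here is a minimal Python sketch. It is an illustration only and not cola's implementation (cola is an R package). The function names `one_minus_pac` and `concordance`, the PAC thresholds 0.1 and 0.9, the use of the upper triangle of the consensus matrix, and the choice to compare partitions after the best relabeling are all assumptions made for this example.

```python
from itertools import permutations


def one_minus_pac(consensus, lower=0.1, upper=0.9):
    """Return 1 - PAC for a square consensus matrix (list of lists).

    PAC is taken here as the share of sample pairs whose consensus value
    lies strictly between `lower` and `upper` (assumed thresholds).
    """
    n = len(consensus)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    ambiguous = sum(1 for i, j in pairs if lower < consensus[i][j] < upper)
    return 1.0 - ambiguous / len(pairs)


def concordance(labels_a, labels_b):
    """Return the fraction of samples that agree between two partitions.

    Class labels are arbitrary, so labels_b is relabeled in every possible
    way and the best agreement is kept. This brute-force search is only
    practical for a small number of classes.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must cover the same samples")
    pool = sorted(set(labels_a) | set(labels_b))
    classes_b = sorted(set(labels_b))
    best = 0
    for perm in permutations(pool, len(classes_b)):
        mapping = dict(zip(classes_b, perm))
        hits = sum(1 for a, b in zip(labels_a, labels_b) if mapping[b] == a)
        best = max(best, hits)
    return best / len(labels_a)


# Toy consensus matrix for 4 samples: the last sample is ambiguous.
consensus = [
    [1.0, 1.0, 0.0, 0.5],
    [1.0, 1.0, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.5],
    [0.5, 0.5, 0.5, 1.0],
]
print(one_minus_pac(consensus))

# Two partitions of 6 samples that differ in labeling and in one sample.
print(concordance([1, 1, 2, 2, 3, 3], [2, 2, 1, 1, 3, 1]))
```

In the toy matrix, three of the six sample pairs have an ambiguous consensus value, so 1-PAC is 0.5. The two example partitions agree on five of six samples once the labels are matched.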
TCGA GBM microarray dataset
Mean silhouette, by column
HSMM single cell RNASeq dataset
Mean silhouette, by column