The dataset is from ENA project PRJEB26737 (https://www.ebi.ac.uk/ena/data/view/PRJEB26737). Only samples ERS2487269 (https://www.ebi.ac.uk/ena/data/view/ERS2487269) and ERS2487270 (https://www.ebi.ac.uk/ena/data/view/ERS2487270) are used.
The raw reads were processed with STAR and htseq-count. Genes with low counts and non-protein-coding genes were filtered out. The resulting TPM table is MCF10CA_scRNAseq_tpm.rds.
RDS files generated by cola (use readRDS() to load into R (>= 3.6.0)):
HTML reports for cola analysis:
The following code performs the analysis.
Prepare the input matrix:
library(cola)
tpm = readRDS("MCF10CA_scRNAseq_tpm.rds")
m = log2(tpm + 1)
cell_type = ifelse(grepl("round", colnames(m)), "round", "aberrant")
cell_col = c("aberrant" = "red", "round" = "blue")
m = adjust_matrix(m)
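The log transform and the annotation logic can be checked on a small toy matrix before running them on the real table. The sketch below uses base R only. The matrix values and the column names (round_1, aberrant_1, and so on) are hypothetical and only assume that "round" cells carry the string "round" in their column names, as the grepl() call above implies.

```r
# Toy TPM matrix: 4 genes x 4 cells. Values and column names are made up
# for illustration; the real column names may look different.
tpm_toy = matrix(
    c( 0,  1,  3,   7,
      15, 31, 63, 127,
       0,  0,  1,   1,
       3,  3,  7,   7),
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("gene", 1:4),
                    c("round_1", "round_2", "aberrant_1", "aberrant_2"))
)

# Same transform as in the analysis: log2(TPM + 1) keeps zeros at zero.
m_toy = log2(tpm_toy + 1)

# Derive the cell type from the column names.
cell_type_toy = ifelse(grepl("round", colnames(m_toy)), "round", "aberrant")

# Color mapping: names must cover every level that occurs in cell_type_toy.
cell_col_toy = c("aberrant" = "red", "round" = "blue")
stopifnot(all(cell_type_toy %in% names(cell_col_toy)))
```

Keeping the annotation vector and the color vector in two separate variables matters here: the colors are looked up by name, so every value in the annotation must appear as a name in the color vector.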
Perform the consensus partitioning:
register_NMF()
set.seed(123)
rl = run_all_consensus_partition_methods(
    m,
    mc.cores = 4,
    anno = data.frame(cell_type = cell_type),
    anno_col = list(cell_type = cell_col)
)
saveRDS(rl, file = "MCF10CA_scRNAseq_subgroup.rds")
cola_report(rl, output_dir = "MCF10CA_scRNAseq_subgroup_cola_report", mc.cores = 4)
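Once the result object has been saved, it can be reloaded in a later session and inspected without rerunning the partitioning. The following is a minimal sketch, not part of the original analysis: the helper name inspect_cola_result is hypothetical, and the choice of "ATC:skmeans" with k = 2 is only an assumption for illustration (it presumes that this method combination was part of the run, which is the case with the default method lists).

```r
# Sketch: reload a saved cola result list and look at one method.
# Returns NULL (invisibly) when the file or the cola package is missing,
# so the function is safe to call on a machine without the results.
inspect_cola_result = function(path = "MCF10CA_scRNAseq_subgroup.rds") {
    if (!file.exists(path) || !requireNamespace("cola", quietly = TRUE)) {
        message("Result file or cola package not available; nothing to inspect.")
        return(invisible(NULL))
    }
    library(cola)
    rl = readRDS(path)

    # Suggested number of subgroups for every method combination.
    print(suggest_best_k(rl))

    # Pick one combination (assumed to exist) and return its class labels.
    res = rl["ATC:skmeans"]
    get_classes(res, k = 2)
}
```

Calling inspect_cola_result() in the directory that holds MCF10CA_scRNAseq_subgroup.rds prints the suggested k per method and returns the class assignment for the chosen method.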