
# Which heatmap function is faster?

In this post I compare the running time of four heatmap functions: `gplots::heatmap.2()`, `heatmap()` from base R, `ComplexHeatmap::Heatmap()` and `pheatmap::pheatmap()`.

We generate a 1000x1000 random matrix.

``````library(ComplexHeatmap)
library(pheatmap)
library(gplots)
library(microbenchmark)

set.seed(123)
n = 1000
mat = matrix(rnorm(n*n), nrow = n)``````

First I test drawing the heatmaps together with the dendrograms, i.e. with clustering applied:

``````t1 = microbenchmark(
    # pdf(NULL) opens a null graphics device, so nothing is written to disk
    "heatmap()" = {
        pdf(NULL)
        heatmap(mat)
        dev.off()
    },
    "heatmap.2()" = {
        pdf(NULL)
        heatmap.2(mat, trace = "none")
        dev.off()
    },
    "Heatmap()" = {
        pdf(NULL)
        draw(Heatmap(mat))
        dev.off()
    },
    "pheatmap()" = {
        pdf(NULL)
        pheatmap(mat)
        dev.off()
    },
    times = 5
)
print(t1, unit = "s")``````
``````## Unit: seconds
##         expr   min    lq  mean median    uq   max neval
##    heatmap() 15.93 16.03 17.05  16.13 17.25 19.90     5
##  heatmap.2() 16.15 17.06 17.09  17.19 17.38 17.69     5
##    Heatmap() 20.75 21.55 22.27  21.90 21.96 25.17     5
##   pheatmap() 15.66 15.89 19.77  16.21 16.64 34.44     5``````

The running times of all four heatmap functions are similar, probably because clustering accounts for most of the running time. `Heatmap()` is the slowest, perhaps because it applies additional manipulations to the dendrograms, such as dendrogram reordering.
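
To get a feeling for how much of the total time the clustering takes, the clustering step can be timed on its own. The following is a minimal sketch using only base R; the names `mat_small`, `hc_rows`, `hc_cols`, `t_cluster` and `t_total` are made up for illustration, and a smaller matrix than in the benchmark is assumed so that it runs quickly.

```r
set.seed(123)
n_small = 500   # assumed size, smaller than the 1000 used in the benchmark
mat_small = matrix(rnorm(n_small * n_small), nrow = n_small)

# Time only the clustering step (distance matrix + hierarchical clustering)
# on rows and columns, without drawing anything.
t_cluster = system.time({
    hc_rows = hclust(dist(mat_small))
    hc_cols = hclust(dist(t(mat_small)))
})[["elapsed"]]

# Time a complete heatmap() call, which clusters and then draws.
pdf(NULL)
t_total = system.time(heatmap(mat_small))[["elapsed"]]
invisible(dev.off())

print(c(clustering = t_cluster, total = t_total))
```

If the two numbers are close, most of the time of the full call is spent in `dist()` and `hclust()` rather than in drawing.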

Next I turn off the clustering on both rows and columns, so that no dendrogram is drawn.

``````t2 = microbenchmark(
    "heatmap()" = {
        pdf(NULL)
        heatmap(mat, Rowv = NA, Colv = NA)
        dev.off()
    },
    "heatmap.2()" = {
        pdf(NULL)
        # dendrogram = "none" hides the dendrograms; Rowv and Colv keep their defaults
        heatmap.2(mat, dendrogram = "none", trace = "none")
        dev.off()
    },
    "Heatmap()" = {
        pdf(NULL)
        draw(Heatmap(mat, cluster_rows = FALSE, cluster_columns = FALSE))
        dev.off()
    },
    "pheatmap()" = {
        pdf(NULL)
        pheatmap(mat, cluster_rows = FALSE, cluster_cols = FALSE)
        dev.off()
    },
    times = 5
)
print(t2, unit = "s")``````
``````## Unit: seconds
##         expr     min     lq    mean  median      uq     max neval
##    heatmap()  0.2546  0.266  0.3192  0.2683  0.3141  0.4931     5
##  heatmap.2() 15.0519 15.315 15.3524 15.4163 15.4787 15.5001     5
##    Heatmap()  2.7637  2.841  2.9421  2.9303  2.9693  3.2059     5
##   pheatmap()  1.1940  1.225  4.3730  1.2677  1.3535 16.8250     5``````

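Now `heatmap.2()` is the slowest when only the heatmap bodies are drawn, while `heatmap()` is the fastest. The mean for `pheatmap()` is inflated by a single slow run (maximum 16.8 s against a median of 1.27 s).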
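
One caveat about the `heatmap.2()` call in this benchmark: according to the gplots documentation, `dendrogram = "none"` only controls which dendrograms are drawn, while the reordering of rows and columns is governed by `Rowv` and `Colv`, which are left at their defaults here. The sketch below shows how the reordering can be switched off explicitly. It uses a small made-up matrix `small_mat`, is skipped when gplots is not installed, and was not part of the benchmark, so it says nothing about how the timings above would change.

```r
set.seed(123)
small_mat = matrix(rnorm(50 * 50), nrow = 50)   # illustrative matrix only

if (requireNamespace("gplots", quietly = TRUE)) {
    pdf(NULL)
    # Rowv = FALSE and Colv = FALSE turn off row and column reordering;
    # dendrogram = "none" makes sure that no dendrogram is drawn.
    gplots::heatmap.2(small_mat, Rowv = FALSE, Colv = FALSE,
        dendrogram = "none", trace = "none")
    invisible(dev.off())
}
```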

Next I perform the clustering in advance and pass the clustering objects to the heatmap functions. In this setting, the dendrograms are drawn along with the heatmaps.

``````row_hc = hclust(dist(mat))
col_hc = hclust(dist(t(mat)))``````
``````t3 = microbenchmark(
    "heatmap()" = {
        pdf(NULL)
        # heatmap() expects dendrogram objects, hence as.dendrogram()
        heatmap(mat, Rowv = as.dendrogram(row_hc), Colv = as.dendrogram(col_hc))
        dev.off()
    },
    "heatmap.2()" = {
        pdf(NULL)
        heatmap.2(mat, Rowv = row_hc, Colv = col_hc, trace = "none")
        dev.off()
    },
    "Heatmap()" = {
        pdf(NULL)
        draw(Heatmap(mat, cluster_rows = row_hc, cluster_columns = col_hc))
        dev.off()
    },
    "pheatmap()" = {
        pdf(NULL)
        pheatmap(mat, cluster_rows = row_hc, cluster_cols = col_hc)
        dev.off()
    },
    times = 5
)
print(t3, unit = "s")``````
``````## Unit: seconds
##         expr    min     lq   mean median     uq    max neval
##    heatmap()  1.462  1.473  1.503  1.475  1.506  1.599     5
##  heatmap.2() 15.864 15.888 16.165 16.163 16.327 16.585     5
##    Heatmap()  5.777  5.803  5.956  6.003  6.066  6.130     5
##   pheatmap()  1.308  1.321  4.413  1.488  1.544 16.406     5``````
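
In all three benchmarks `pheatmap()` has one run that is much slower than the others, which pulls its mean well above its median. A small sketch of that comparison, with the numbers copied from the printed outputs of `t1`, `t2` and `t3`; the data frame name `pheatmap_times` is made up for illustration.

```r
# Mean, median and maximum running time (seconds) of pheatmap(),
# copied from the printed microbenchmark outputs.
pheatmap_times = data.frame(
    benchmark = c("t1", "t2", "t3"),
    mean      = c(19.77, 4.3730, 4.413),
    median    = c(16.21, 1.2677, 1.488),
    max       = c(34.44, 16.8250, 16.406)
)

# A ratio well above 1 means that a few slow runs dominate the mean.
pheatmap_times$mean_over_median =
    round(pheatmap_times$mean / pheatmap_times$median, 2)
print(pheatmap_times)
```

With the live benchmark objects, the same columns should be available from `summary(t2)` and its siblings, so the values need not be typed in by hand.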

Finally I put the mean running time into a table for easy comparison:

| | `heatmap()` | `heatmap.2()` | `Heatmap()` | `pheatmap()` |
|---|---|---|---|---|
| do clustering, draw dendrograms | `17.05s` | `17.09s` | `22.27s` | `19.77s` |
| no clustering, no dendrogram | `0.32s` | `15.35s` | `2.94s` | `4.37s` |
| only draw dendrograms | `1.50s` | `16.17s` | `5.96s` | `4.41s` |
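
The same table can be kept as a matrix, which makes it easy to express every time relative to the fastest function in each scenario. This is only a sketch built from the mean values in the table; the names `mean_time` and `slowdown` are made up for illustration.

```r
# Mean running times in seconds, one row per scenario.
mean_time = rbind(
    "do clustering, draw dendrograms" = c(17.05, 17.09, 22.27, 19.77),
    "no clustering, no dendrogram"    = c(0.32, 15.35, 2.94, 4.37),
    "only draw dendrograms"           = c(1.50, 16.17, 5.96, 4.41)
)
colnames(mean_time) = c("heatmap()", "heatmap.2()", "Heatmap()", "pheatmap()")

# Divide each row by its smallest value: 1 marks the fastest function in
# that scenario, larger values say how many times slower the others are.
slowdown = round(mean_time / apply(mean_time, 1, min), 1)
print(slowdown)
```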

The following plots illustrate the mean running time of the four functions for matrices with different dimensions.
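
The code behind that comparison is not shown in this post. A minimal sketch of how such a sweep over matrix sizes could be set up is given below. It is restricted to base R `heatmap()`, times a single run per size instead of a mean over repeated runs, and the helper `time_heatmap()` and the sizes are assumptions made for illustration only.

```r
# Hypothetical helper: time one call of base R heatmap() on an n x n matrix.
time_heatmap = function(n) {
    m = matrix(rnorm(n * n), nrow = n)
    pdf(NULL)                        # null device, nothing is written to disk
    on.exit(invisible(dev.off()))    # close the device when the function exits
    system.time(heatmap(m))[["elapsed"]]
}

set.seed(123)
sizes = c(50, 100, 200)              # assumed sizes, not those used for the plots
timing = data.frame(n = sizes, seconds = sapply(sizes, time_heatmap))
print(timing)
```

The other three functions could be timed the same way by swapping the call inside `system.time()`.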

Session info:

``sessionInfo()``
``````## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.5
##
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
##
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods
## [8] base
##
## other attached packages:
## [1] cowplot_1.1.0             ggplot2_3.3.2
## [3] microbenchmark_1.4-7      gplots_3.1.0
## [5] pheatmap_1.0.12           ComplexHeatmap_2.7.1.1003
## [7] GetoptLong_1.0.4          knitr_1.30
##
## loaded via a namespace (and not attached):
##  [1] circlize_0.4.12.1004 shape_1.4.5          gtools_3.8.2
##  [4] tidyselect_1.1.0     xfun_0.19            purrr_0.3.4
##  [7] colorspace_2.0-0     vctrs_0.3.4          generics_0.1.0
## [10] htmltools_0.5.0      stats4_4.0.2         yaml_2.2.1
## [13] rlang_0.4.8          pillar_1.4.6         withr_2.3.0
## [16] glue_1.4.2           BiocGenerics_0.34.0  RColorBrewer_1.1-2
## [19] matrixStats_0.57.0   lifecycle_0.2.0      stringr_1.4.0
## [22] munsell_0.5.0        blogdown_0.17        gtable_0.3.0
## [25] GlobalOptions_0.1.2  caTools_1.18.0       evaluate_0.14
## [28] labeling_0.4.2       IRanges_2.22.2       Cairo_1.5-12.2
## [31] parallel_4.0.2       highr_0.8            Rcpp_1.0.5
## [34] KernSmooth_2.23-18   scales_1.1.1         S4Vectors_0.26.1
## [37] magick_2.5.2         farver_2.0.3         rjson_0.2.20
## [40] png_0.1-7            digest_0.6.27        stringi_1.5.3
## [43] bookdown_0.21        dplyr_1.0.2          clue_0.3-57
## [46] tools_4.0.2          bitops_1.0-6         magrittr_2.0.1
## [49] tibble_3.0.4         cluster_2.1.0        crayon_1.3.4
## [52] pkgconfig_2.0.3      ellipsis_0.3.1       rmarkdown_2.5
## [55] R6_2.5.0             compiler_4.0.2``````