
Word cloud as heatmap annotation

I have recently been developing a new package, simplifyEnrichment, which clusters GO terms and visualizes a summary of the GO terms in each cluster as a word cloud. The results are visualized with ComplexHeatmap, where the word clouds are heatmap annotations. In this post, I describe how to implement word clouds as heatmap annotations with ComplexHeatmap.

To achieve this, we need two functionalities: one draws the word cloud and the other links the word cloud to the corresponding rows in the heatmap. The former is done with the word_cloud_grob() function, which I describe later, and the latter is done with the anno_link() function, which is already defined in the ComplexHeatmap package.

The word_cloud_grob() function is from the simplifyEnrichment package. It, along with several related functions, can be sourced directly from this Gist with the following command:

source("https://gist.githubusercontent.com/jokergoo/bfb115200df256eeacb7af302d4e508e/raw/14f315c7418f3458d932ad749850fd515dec413b/word_cloud_grob.R")

There are four functions defined in the script:

  • word_cloud_grob(): The main function that constructs the word cloud grob.
  • widthDetails.word_cloud(): Helper method that grid calls when grobWidth() is evaluated; it returns the width of the word cloud grob.
  • heightDetails.word_cloud(): Helper method that grid calls when grobHeight() is evaluated; it returns the height of the word cloud grob.
  • scale_fontsize(): This function maps the word frequency to font size.
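The two *Details methods in the list above follow grid's usual way of giving a custom grob class a size. As a minimal sketch of that pattern (the class name fixed_box and its slots w and h are made up for illustration and are not part of the Gist), a grob that simply stores its own size could look like this:

```r
library(grid)
pdf(NULL)  # off-screen device, needed so that units can be resolved

# A toy grob class (hypothetical, not from the Gist) that stores its own size
gb = grob(w = unit(30, "mm"), h = unit(10, "mm"), cl = "fixed_box")

# grid calls these methods whenever grobWidth()/grobHeight() are evaluated
widthDetails.fixed_box = function(x) x$w
heightDetails.fixed_box = function(x) x$h

box_w = convertWidth(grobWidth(gb), "mm", valueOnly = TRUE)
box_h = convertHeight(grobHeight(gb), "mm", valueOnly = TRUE)
c(box_w, box_h)
invisible(dev.off())
```

The methods are defined at the top level, just as they are when the Gist is sourced, so that grid can find them by S3 dispatch.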

In the rest of this post, I first describe how to set different parameters for constructing the word cloud grob, then demonstrate how to link the word cloud to the heatmap. Finally, I reproduce the GO similarity heatmap generated by the simplifyEnrichment package.

Word cloud grob

To demonstrate the word_cloud_grob() function, I randomly generate a vector of words and their font sizes.

set.seed(123)
words = sapply(1:30, function(x) strrep(sample(letters, 1), sample(3:10, 1)))
fontsize = runif(30, min = 5, max = 30)

The words and the corresponding font sizes are specified as in the following code.

library(grid)
gb = word_cloud_grob(words, fontsize = fontsize, max_width = unit(100, "mm"))
grid.newpage()
grid.draw(gb)
grid.rect(width = grobWidth(gb), height = grobHeight(gb), gp = gpar(fill = NA))

The word cloud is very basic. The words are ordered by font size and are placed from the bottom to the top. Words are assigned random colors. There is a box that contains the word cloud. Here the max_width argument controls the “maximal width” of the box. Note that the final grob width returned by grobWidth(gb) might be a little smaller than the value specified with max_width, because if the next word would exceed the box, it is placed on the next line in the box. I draw the border of the grob explicitly so that you can see the size of the grob.

If the width of the box changes, the height of the box changes accordingly. In the following example, the width is reduced to 60mm and you can see that the box gets taller.
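To see why the final width can fall short of max_width, here is a minimal sketch of greedy line filling on plain numbers. It is not the code from the Gist: assign_lines() is a hypothetical helper, and the word widths and the 2mm word spacing are made-up values.

```r
# Greedy line filling: hypothetical helper, not the code in the Gist.
# `w` holds word widths (say, in mm); the result is the line number of each word.
assign_lines = function(w, max_width, word_space = 2) {
    line = integer(length(w))
    current_line = 1L
    used = 0  # width already occupied on the current line
    for (i in seq_along(w)) {
        needed = if (used == 0) w[i] else used + word_space + w[i]
        if (needed > max_width && used > 0) {
            # the word does not fit: start a new line
            current_line = current_line + 1L
            needed = w[i]
        }
        line[i] = current_line
        used = needed
    }
    line
}

w = c(40, 35, 30, 20, 50)
line = assign_lines(w, max_width = 100)
line

# width actually used by each line, including the spaces between words
line_width = tapply(w, line, function(x) sum(x) + 2 * (length(x) - 1))
max(line_width)
```

In this toy input the widest line uses 77 of the 100 units allowed, because the third word would have pushed the first line to 109.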

gb = word_cloud_grob(words, fontsize = fontsize, max_width = unit(60, "mm"))
grid.newpage()
grid.draw(gb)
grid.rect(width = grobWidth(gb), height = grobHeight(gb), gp = gpar(fill = NA))

The colors of words help to distinguish between the words. The col argument can be set as a vector with the same length as the words.

# color as a vector
gb = word_cloud_grob(words, fontsize = fontsize, max_width = unit(100, "mm"), col = 1:30)
grid.newpage(); grid.draw(gb)
grid.rect(width = grobWidth(gb), height = grobHeight(gb), gp = gpar(fill = NA))

col can also be set as a color mapping function that maps from font sizes to colors. The color mapping function takes the fontsize vector as input and returns the corresponding colors.

# color as a function
library(circlize)
col_fun = colorRamp2(c(5, 17, 30), c("blue", "black", "red"))
gb = word_cloud_grob(words, fontsize = fontsize, max_width = unit(100, "mm"), 
    col = col_fun)
grid.newpage(); grid.draw(gb)
grid.rect(width = grobWidth(gb), height = grobHeight(gb), gp = gpar(fill = NA))
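Going by the description above, the only contract for col is "font sizes in, colors out", so a hand-written function should work as well as colorRamp2(). The following is a sketch under that assumption; col_fun_sketch() and its break points are made up for illustration, and I have not checked it against the Gist.

```r
# Hypothetical alternative to colorRamp2(): bin font sizes into three colors.
col_fun_sketch = function(fontsize) {
    # intervals are (-Inf, 10], (10, 20] and (20, Inf)
    bins = cut(fontsize, breaks = c(-Inf, 10, 20, Inf), labels = FALSE)
    c("blue", "black", "red")[bins]
}

col_fun_sketch(c(5, 12, 30))
```

It would then be passed as col = col_fun_sketch in the same way as col_fun above.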

Other arguments of word_cloud_grob() are line_space, which controls the space between lines, and word_space, which controls the space between words.

As a heatmap annotation

The anno_link() function links a subset of rows in the heatmap to an external viewport. The viewport that links to the heatmap should have a fixed width and height. If the word cloud is used as the annotation, the size of the viewport depends on the size of the word cloud grob. In the next code, I generate a word cloud grob and calculate its height and width.

gb = word_cloud_grob(words, fontsize = fontsize, max_width = unit(100, "mm"))
gb_h = grobHeight(gb)
gb_w = grobWidth(gb)

I generate a random heatmap with 10 rows and 10 columns without row clustering, so that I can link the word cloud to the first three rows in the heatmap.

library(ComplexHeatmap)
m = matrix(rnorm(100), 10)
ht = Heatmap(m, cluster_rows = FALSE)

What to draw in the “linking viewport” should be manually defined. In the case here, I simply draw the word cloud by grid.draw(gb). I additionally draw the background of the word cloud in light grey.

panel_fun = function(index, nm) {
    grid.rect(gp = gpar(fill = "#EEEEEE", col = NA))
    grid.draw(gb)
}

anno_link() is set as follows. Note that I also set the background color of the “linking line” to be the same as the background of the word cloud.

ht + rowAnnotation(word_cloud = anno_link(align_to = 1:3, which = "row", 
        panel_fun = panel_fun, size = gb_h, 
        width = gb_w + unit(5, "mm"), # the link is 5mm
        link_gp = gpar(fill = "#EEEEEE", col = NA)
    ))

Ok, this is the simplest implementation of the word cloud annotation. To make it look nicer, you need to do a lot of manual adjustment. In the next section, I demonstrate how to make a nicer plot.

A real-world example

In this section, I reproduce the plot generated by the simplifyEnrichment package. First I load some data objects:

tmp_file = tempfile()
download.file("https://jokergoo.github.io/word_cloud_annotation_example.RData", 
    destfile = tmp_file, quiet = TRUE)
load(tmp_file); file.remove(tmp_file)

There are the following three objects:

  • mat: A GO similarity matrix from 500 GO terms. Values are between 0 and 1 where 1 means exactly the same and 0 means completely different.
  • cl: The clustering labels of the 500 GO terms.
  • keywords: A list of keywords and their frequencies extracted from the GO names in the corresponding GO cluster.

The data structure or the values of the three objects are as follows:

mat[1:6, 1:6]
cl
keywords

I first define the similarity heatmap. The settings should be clear from the argument names.

ht = Heatmap(mat, col = colorRamp2(c(0, 1), c("white", "red")),
    name = "Similarity",
    show_row_names = FALSE, show_column_names = FALSE,
    show_row_dend = FALSE, show_column_dend = FALSE,
    row_split = cl, column_split = cl, 
    border = "#404040", row_title = NULL, column_title = NULL,
    row_gap = unit(0, "mm"), column_gap = unit(0, "mm"))

There will be word clouds for every GO cluster, and since the GO terms are split by cl, the “alignment variable” is defined as follows. The GO cluster with label “0” is removed.

align_to = split(seq_len(nrow(mat)), cl)
align_to = align_to[names(align_to) != "0"]
align_to = align_to[names(align_to) %in% names(keywords)]
align_to
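To make the three steps above concrete, here is the same logic on a toy input. The labels in cl and the contents of keywords below are made up for illustration and have nothing to do with the real data.

```r
# Toy stand-ins (made up): seven rows in clusters "0" to "3",
# with keywords available only for clusters "1" and "2"
cl = c("1", "0", "2", "1", "2", "0", "3")
keywords = list("1" = c("cell", "cycle"), "2" = c("immune", "response"))

align_to = split(seq_along(cl), cl)                        # row indices per cluster
align_to = align_to[names(align_to) != "0"]                # drop cluster "0"
align_to = align_to[names(align_to) %in% names(keywords)]  # keep clusters with keywords
align_to
```

Each element of the resulting list holds the row indices of one cluster, which is the form passed to the align_to argument later in this post.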

Next I construct a list of word cloud grobs. Note that I use scale_fontsize() to map word frequency to font size.

fontsize_range = c(4, 16)
gbl = lapply(names(align_to), function(nm) {
    kw = keywords[[nm]][, 1]
    freq = keywords[[nm]][, 2]
    fontsize = scale_fontsize(freq, rg = c(1, max(10, freq)), fs = fontsize_range)

    word_cloud_grob(text = kw, fontsize = fontsize)
})
names(gbl) = names(align_to)
gbl
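The scale_fontsize() call above takes a frequency vector, a frequency range rg and a font size range fs. One plausible reading of such a mapping is a linear interpolation that is clamped to fs. The sketch below illustrates that idea only: scale_fontsize_sketch() is a hypothetical name, and the actual function in the Gist may differ, for example in how it rounds.

```r
# Hypothetical stand-in for scale_fontsize(): linear mapping from the
# frequency range `rg` to the font size range `fs`, clamped at both ends.
scale_fontsize_sketch = function(x, rg = c(1, 30), fs = c(4, 16)) {
    k = (fs[2] - fs[1]) / (rg[2] - rg[1])  # font size units per unit of frequency
    y = fs[1] + k * (x - rg[1])
    pmin(pmax(y, fs[1]), fs[2])            # keep the result inside fs
}

freq = c(1, 5, 10, 50)
fs_out = scale_fontsize_sketch(freq, rg = c(1, 10), fs = c(4, 16))
fs_out
```

With rg = c(1, max(10, freq)) as in the code above, the upper end of the range is never below 10, so clusters whose words all have low frequencies are not drawn with the largest fonts.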

I calculate the height of each word cloud and use the maximal width of all grobs as the width of the “linking annotation”. A margin of 8pt is added to the heights and to the width.

margin = unit(8, "pt")
gbl_h = lapply(gbl, function(x) convertHeight(grobHeight(x), "cm") + margin)
gbl_h = do.call(unit.c, gbl_h)

gbl_w = lapply(gbl, function(x) convertWidth(grobWidth(x), "cm"))
gbl_w = do.call(unit.c, gbl_w)
gbl_w = max(gbl_w) + margin

convertHeight() and convertWidth() are used mainly to simplify the unit objects.

panel_fun is defined as follows. grid.lines() is used to draw only the bottom, right and top borders of the viewport, leaving the left side open.
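As a small illustration of that simplification, the sketch below uses made-up units instead of real grob sizes. A sum of units and the maximum of several units are both stored as compound expressions until they are converted to a single absolute unit.

```r
library(grid)
pdf(NULL)  # off-screen device so units can be resolved without opening a window

# stand-ins for a grob height and several grob widths; the values are made up
h = unit(1, "cm") + unit(8, "pt")  # a compound unit
w = unit.c(unit(12, "mm"), unit(2, "cm"), unit(0.5, "inches"))

h_cm = convertHeight(h, "cm", valueOnly = TRUE)
w_max_cm = convertWidth(max(w), "cm", valueOnly = TRUE)
h_cm
w_max_cm
invisible(dev.off())
```

Here valueOnly = TRUE is used only to get plain numbers for printing; the code in this post keeps the converted values as unit objects so that the margin can still be added.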

panel_fun = function(index, nm) {
    # background
    grid.rect(gp = gpar(fill = "#DDDDDD", col = NA))
    # border
    grid.lines(c(0, 1, 1, 0), c(0, 0, 1, 1), gp = gpar(col = "#AAAAAA"), 
        default.units = "npc")
    gb = gbl[[nm]]
    # a viewport within the margins
    pushViewport(viewport(x = margin/2, y = margin/2, 
        width = grobWidth(gb), height = grobHeight(gb),
        just = c("left", "bottom")))
    grid.draw(gb)
    popViewport()
}

And the final heatmap with the word cloud annotations is as follows.

ht = ht + rowAnnotation(keywords = anno_link(align_to = align_to, 
    which = "row", panel_fun = panel_fun, 
    size = gbl_h, gap = unit(2, "mm"), 
    width = gbl_w + unit(5, "mm"), # 5mm for the link
    link_gp = gpar(fill = "#DDDDDD", col = "#AAAAAA"), 
    internal_line = FALSE)) # you can set it to TRUE to see what happens
draw(ht, ht_gap = unit(2, "pt"))