Chapter 16 A complex example of Chord diagram

In this chapter, we demonstrate how to make a complex Chord diagram and how to customize additional tracks by visualizing chromatin state transitions as well as methylation changes. A chromatin state transition matrix shows how much a chromatin state in the genome has been changed from e.g. one group of samples to the other. The genomic regions for which the chromatin states change also have variable methylation patterns which may show interesting correspondance to chromatin states change.

The data used in this post is processed from Roadmap dataset. The chromatin states are learned from five core chromatin marks. Roadmap samples are classified into two groups based on expression profile. In each group, a chromatin state is assigned to the corresponding genomic bin if it is recurrent in at least half of the samples in each group.

The processed data can be set from https://jokergoo.github.io/circlize_book/data/chromatin_transition.RData.

load("data/chromatin_transition.RData")

In the RData file, there are three matrix: mat, meth_mat_1 and meth_mat_2 which are:

  • mat: chromatin state transition matrix. Rows correspond to states in group 1 and columns correspond to group 2. The value in the matrix are total base pairs that transite from one state to the other. E.g. mat["TssA", "TxFlnk"] is the total base pairs that have “TssA” state in samples in group 1 and transites to “TxFlnk” state in samples in group 2. On the diagonal are the regions where the states have not been changed in the two groups.
  • meth_mat_1: mean methylation for each set of regions in group 1. E.g. meth_mat_1["TssA", "TxFlnk"] is the mean methylation for the regions in group 1 that have “TssA” state in group 1 and “TxFlnk” state in group 2.
  • meth_mat_2: mean methylation for each set of regions in group 2. E.g. meth_mat_2["TssA", "TxFlnk"] is the mean methylation for the regions in group 2 that have “TssA” state in group 1 and “TxFlnk” state in group 2.
mat[1:4, 1:4]
##            TssA TssAFlnk TxFlnk     Tx
## TssA     497200    79600  13400   1800
## TssAFlnk  56400   233200   5000    800
## TxFlnk        0      400  43000   1800
## Tx          800      200    200 166400
meth_mat_1[1:4, 1:4]
##               TssA  TssAFlnk    TxFlnk        Tx
## TssA     0.1647232 0.1580874 0.1917435 0.2690045
## TssAFlnk 0.2591677 0.2689880 0.3616242 0.3411387
## TxFlnk          NA 0.3697514 0.3360386 0.4752722
## Tx       0.8268626 0.7822987 0.5799682 0.6595322

Normally, in majority in the genome, chromatin states of regions are not changed in the two groups, thus, we should only look at the regions in which the states are changed.

# proportion of the unchanges states in the genome
sum(diag(mat))/sum(mat)
## [1] 0.6192262
# remove the unchanged states
diag(mat) = 0

When making the plot, actually rows and columns are different (because one is from group 1 and the other is from group 2), thus we give them different names and the original names are stored in all_states.

all_states = rownames(mat)
n_states = nrow(mat)

rownames(mat) = paste0("R_", seq_len(n_states))
colnames(mat) = paste0("C_", seq_len(n_states))

dimnames(meth_mat_1) = dimnames(mat)
dimnames(meth_mat_2) = dimnames(mat)

Next we set the colors. colmat is the color of the links and the colors are represent as hexadecimal code. Links have more transparent (A0) if they contain few transitions (< 70th percentile) because we don’t want it to disturb the visualization of the major transitions.

state_col = c("TssA" = "#E41A1C",    "TssAFlnk" = "#E41A1C",
              "TxFlnk" = "#E41A1C",  "Tx" = "#E41A1C",
              "TxWk" = "#E41A1C",    "EnhG" = "#E41A1C",
              "Enh" = "#E41A1C",     "ZNF/Rpts" = "#E41A1C",
              "Het" = "#377EB8",     "TssBiv" = "#377EB8",
              "BivFlnk" = "#377EB8", "EnhBiv" = "#377EB8",
              "ReprPC" = "#377EB8",  "ReprPCWk" = "#377EB8",
              "Quies" = "black")

# one for rows and one for columns
state_col2 = c(state_col, state_col)
names(state_col2) = c(rownames(mat), colnames(mat))

colmat = rep(state_col2[rownames(mat)], n_states)
colmat = rgb(t(col2rgb(colmat)), maxColorValue = 255)

qati = quantile(mat, 0.7)
colmat[mat > qati] = paste0(colmat[mat > qati], "A0")
colmat[mat <= qati] = paste0(colmat[mat <= qati], "20")
dim(colmat) = dim(mat)

Now we can use chordDiagram() function to make the plot. Here we set one pre-allocated track in which the methylation information will be added later. Also we only set annotationTrack to grid and the axes and sector labels will be customized in later code.

chordDiagram() returns a data frame which contains coordinates for all links.

circos.par(cell.padding = c(0, 0, 0, 0), points.overflow.warning = FALSE)

cdm_res = chordDiagram(mat, col = colmat, grid.col = state_col2,
    directional = TRUE, annotationTrack = "grid", 
    big.gap = 10, small.gap = 1,
    preAllocateTracks = list(track.height = 0.1),
    link.target.prop = FALSE)

head(cdm_res)
##    rn  cn value1 value2 o1 o2      x1     x2       col
## 1 R_1 C_1      0      0 15 13  431200 267200 #E41A1C20
## 2 R_2 C_1  56400  56400 15 12  159800 267200 #E41A1CA0
## 3 R_3 C_1      0      0 15 11    3600 210800 #E41A1C20
## 4 R_4 C_1    800    800 15 10   34600 210800 #E41A1C20
## 5 R_5 C_1  98200  98200 15  9 1411600 210000 #E41A1CA0
## 6 R_6 C_1      0      0 15  8  139800 111800 #E41A1C20

Now the axes are added in the second track, also, the index of states are added at the center of the grids in the second track, if the degree for a sector is larger than 3 degrees. Note since there is already one pre-allocated track, the circular rectangles are in the second track (track.index = 2).

circos.track(track.index = 2, panel.fun = function(x, y) {
    if(abs(CELL_META$cell.start.degree - CELL_META$cell.end.degree) > 3) {
        sn = CELL_META$sector.index
        i_state = as.numeric(gsub("(C|R)_", "", sn))
        circos.text(CELL_META$xcenter, CELL_META$ycenter, i_state, col = "white", 
            font = 2, cex = 0.7, adj = c(0.5, 0.5), niceFacing = TRUE)
        xlim = CELL_META$xlim
        breaks = seq(0, xlim[2], by = 4e5)
        circos.axis(major.at = breaks, labels = paste0(breaks/1000, "KB"), labels.cex = 0.5)
    }
}, bg.border = NA)

On the top half, it is easy to see the proportion of different transitions in group 1 that come to every state in group 2. However, it is not straightforward for the states in the bottom half to see the proportion of different states in group 2 they transite to. This can be solved by adding small circular rectangles to represent the proportions. In following example, the newly added circular rectangles in the bottom half show e.g. how much the state 15 in group 1 has been transited to different states in group 2.

Note from version 0.4.11, there is a new argument link.target.prop in chordDiagram(). It basically automatically applies following code. However, we still suggest users to read the following code because more tracks with similar code will be drawn later.

for(i in seq_len(nrow(cdm_res))) {
    if(cdm_res$value1[i] > 0) {
        circos.rect(cdm_res[i, "x1"], -mm_y(1), 
            cdm_res[i, "x1"] - abs(cdm_res[i, "value1"]), -mm_y(2), 
            col = state_col2[cdm_res$cn[i]], border = state_col2[cdm_res$cn[i]],
            sector.index = cdm_res$rn[i], track.index = 2)
    }
}

Methylation in each category is put on the most outside of the circle. On this track, we will put two paralle rectangles which are mean methylation and methylation difference between group 1 and group 2. Basically, on the bottom, we show meth_mat_2 - meth_mat_1 and on the top we show meth_mat_1 - meth_mat_2.

The logic of following code is simple that it just simply adds rectangles repeatedly.

abs_max = quantile(abs(c(meth_mat_1, meth_mat_2) - 0.5), 0.95, na.rm = TRUE)
col_fun = colorRamp2(c(0.5 - abs_max, 0.5, 0.5 + abs_max), c("blue", "white", "red"))
col_fun2 = colorRamp2(c(-abs_max, 0, abs_max), c("green", "white", "orange"))

ylim = get.cell.meta.data("ylim", sector.index = rownames(mat)[1], track.index = 1)
y1 = ylim[1] + (ylim[2] - ylim[1])*0.4
y2 = ylim[2]
for(i in seq_len(nrow(cdm_res))) {
    if(cdm_res$value1[i] > 0) {
        circos.rect(cdm_res[i, "x1"], y1, cdm_res[i, "x1"] - abs(cdm_res[i, "value1"]), y1 + (y2-y1)*0.45, 
            col = col_fun(meth_mat_1[cdm_res$rn[i], cdm_res$cn[i]]), 
            border = col_fun(meth_mat_1[cdm_res$rn[i], cdm_res$cn[i]]),
            sector.index = cdm_res$rn[i], track.index = 1)

        circos.rect(cdm_res[i, "x1"], y1 + (y2-y1)*0.55, cdm_res[i, "x1"] - abs(cdm_res[i, "value1"]), y2, 
            col = col_fun2(meth_mat_2[cdm_res$rn[i], cdm_res$cn[i]] - meth_mat_1[cdm_res$rn[i], cdm_res$cn[i]]), 
            border = col_fun2(meth_mat_2[cdm_res$rn[i], cdm_res$cn[i]] - meth_mat_1[cdm_res$rn[i], cdm_res$cn[i]]),
            sector.index = cdm_res$rn[i], track.index = 1)

        circos.rect(cdm_res[i, "x2"], y1, cdm_res[i, "x2"] - abs(cdm_res[i, "value1"]), y1 + (y2-y1)*0.45, 
            col = col_fun(meth_mat_2[cdm_res$rn[i], cdm_res$cn[i]]), 
            border = col_fun(meth_mat_2[cdm_res$rn[i], cdm_res$cn[i]]),
            sector.index = cdm_res$cn[i], track.index = 1)

        circos.rect(cdm_res[i, "x2"], y1 + (y2-y1)*0.55, cdm_res[i, "x2"] - abs(cdm_res[i, "value1"]), y2, 
            col = col_fun2(meth_mat_1[cdm_res$rn[i], cdm_res$cn[i]] - meth_mat_2[cdm_res$rn[i], cdm_res$cn[i]]), 
            border = col_fun2(meth_mat_1[cdm_res$rn[i], cdm_res$cn[i]] - meth_mat_2[cdm_res$rn[i], cdm_res$cn[i]]),
            sector.index = cdm_res$cn[i], track.index = 1)
    }
}
circos.clear()

Legends can be added according to instructions discussed in Section 4.