Supplementary S5. Visualize different histone modifications

Author: Zuguang Gu ( z.gu@dkfz.de )

Date: 2016-02-27


Different modifications (e.g. methylation, acetylation) of the histones affect the chromatin-DNA interaction, and thus play a role in transcription regulation. Different types of modifications relate to different types of gene regulation. For example, the H3H4me3 modification is found in actively transcribed promoters. ChIP-sequencing (ChIP-Seq) utilizes highthroughput DNA sequencing technology for genome-wide assessment of chromatin states and thus provides a way to study gene regulation in a global aspect.

In this supplementary, we visualize four types of histone modifications (H3K27ac, H3K36me3, H3K4me3, H3K9me3). The ChIP-Seq data are downloaded from Roadmap (http://genboree.org/EdaccData/Release-9/sample-experiment/Lung/ The bed files are alignments of reads, you need to convert to the basepair or window-based coverage before using following R commands). The following files for the four histone modifications are used for visualization:

UCSD.Lung.H3K27ac.STL002.bed
UCSD.Lung.H3K36me3.STL002.bed
UCSD.Lung.H3K4me3.STL002.bed
UCSD.Lung.H3K9me3.STL002.bed

First we wrap the code which makes the plot as a function so that we can use it repeatedly.

library(HilbertCurve)
library(circlize)
library(GetoptLong)
library(ComplexHeatmap)

# RefSeq genes
load(system.file("extdata", "refseq_chr1.RData", package = "HilbertCurve"))

plot_histone_mark = function(mark) {
    df = read.table(pipe(qq("awk '$5>0 && $1==\"chr1\"' ~/HilbertCurveTest/UCSD.Lung.@{mark}.STL002.bed")), sep = "\t")
    col_fun = colorRamp2(c(0, quantile(df[[5]], 0.99)), c("white", "red"))
    col_fun_new = colorRamp2(c(0, quantile(df[[5]], 0.99)), c("white", "purple"))
    cm = ColorMapping(col_fun = col_fun)
    lgd = color_mapping_legend(cm, title = "Intensity", plot = FALSE)
    cm2 = ColorMapping(level = c(mark, "overlap", "gene"), color = c("#FF0000", "purple", "#CCCCCC"))
    lgd2 = color_mapping_legend(cm2, title = "Layer", plot = FALSE)

    hc = GenomicHilbertCurve(chr = "chr1", level = 9, mode = "pixel",
        title = qq("Intensity of @{mark} mark on chr1"), legend = list(lgd, lgd2))
    hc_layer(hc, df, col = col_fun(df[[5]]))
    hc_layer(hc, reduce(g), col = "#00000010",
        overlay = function(r0, g0, b0, r, g, b, alpha) {
            l = !(r0 == 1 & g0 == 1 & b0 == 1)
            v = col2value(r0[l], g0[l], b0[l], col_fun = col_fun)

            col_new = col_fun_new(v, return_rgb = TRUE)
            r0[l] = col_new[, 1]
            g0[l] = col_new[, 2]
            b0[l] = col_new[, 3]

            default_overlay(r0, g0, b0, r, g, b, alpha)
        })

    # Alternatively, the Hilbert curve can be saved as PNG file instead of a raster image
    # hc_png(hc, file = qq("hc_@{mark}_chr1.png"))
}

In the above code, intensities of histone modifications are mapped to a level 9 Hilbert curve so that each pixel approximately represents a window of 950bp. In each plot, there is an additional grey layer which represents the regions of genes (RefSeq genes). For the regions which are overlapped with gene bodies, the color theme is changed to white-purple, which helps to visualize the correspondance between histone modifications and gene bodies.

H3K27ac is a mark for active enhancers which is narrow and sharp. The Hilbert curve shows this attributes and also shows that enhancers may exist in form of clusters.

plot_histone_mark("H3K27ac")

plot of chunk unnamed-chunk-3

H3K36me3 is a mark for actively transcribed gene bodies. The Hilbert curve shows that basically most of H3K36me3 modification with high intensity exists in gene bodies (the purple regions).

plot_histone_mark("H3K36me3")

plot of chunk unnamed-chunk-4

H3K4me3 is a mark for TSS of actively transcribed genes which is also narrow and sharp. In addition, the Hilbert curve shows that the size of these marks are quite consistent.

plot_histone_mark("H3K4me3")

plot of chunk unnamed-chunk-5

H3K9me3 is a mark for repressed genes. The Hilbert curve shows that H3K9me3 modications spread over huge regions in the chromosome.

plot_histone_mark("H3K9me3")

plot of chunk unnamed-chunk-6

Session info

sessionInfo()
## R version 3.2.2 (2015-08-14)
## Platform: x86_64-pc-linux-gnu (64-bit)
## 
## locale:
## [1] C
## 
## attached base packages:
##  [1] stats4    parallel  methods   grid      stats     graphics  grDevices
##  [8] utils     datasets  base     
## 
## other attached packages:
## [1] circlize_0.3.4       ComplexHeatmap_1.6.0 GetoptLong_0.1.1    
## [4] HilbertCurve_1.1.3   GenomicRanges_1.22.3 GenomeInfoDb_1.6.3  
## [7] IRanges_2.4.6        S4Vectors_0.9.19     BiocGenerics_0.16.1 
## 
## loaded via a namespace (and not attached):
##  [1] whisker_0.3-2       knitr_1.12.3        XVector_0.10.0     
##  [4] magrittr_1.5        zlibbioc_1.16.0     colorspace_1.2-6   
##  [7] lattice_0.20-33     rjson_0.2.15        stringr_1.0.0      
## [10] tools_3.2.2         png_0.1-7           RColorBrewer_1.1-2 
## [13] formatR_1.2.1       HilbertVis_1.28.0   GlobalOptions_0.0.8
## [16] dendextend_1.1.2    shape_1.4.2         evaluate_0.8       
## [19] stringi_1.0-1