Supplementary S4. Compare sequence conservation between zebrafish and mouse to human

Author: Zuguang Gu ( z.gu@dkfz.de )

Date: 2016-02-27


The conserved regions on human chromosome 1 between mouse and human or between zebrafish and human were downloaded from UCSC Table Browser. The parameters for downloading were:

clade: Mammal
genome: Human
assembly: Feb. 2009(GRCh37/hg19)
group: Comparative Genomics
# for mouse
track: Placental Chain/Net
table: Mouse Net (netMm10)
# for zebrafish
track: Vertebrate Chain/Net
table: Zebrafish Net(netDanRer7)

The “net” alignment which is used here shows the best alignment (the alignment algorithm allows longer gaps than traditional affine gap scoring systems) between each part in the human genome and other genomes. For a detailed description of the alignment strategy, please go to UCSC Table Browser, and click the describe table schema button.

In the following example, the conservation data is mapped to a level 6 Hilbert curve under “normal” mode. Points are used as the graphic and each segment on the Hilbert curve is split by 3 points. In this case, each point on the curve represents approximately 30kb. Only chromosome 1 is visualized.

The point is fully red if the window which is represented by this point is completely covered by conserved regions, and it is yellow if it is not covered by any conserved region. Colors are linearly interpolated between red and yellow if the window is partially covered by conserved regions.

library(HilbertCurve)
library(ComplexHeatmap)
library(circlize)

# for generating the legend
cm = ColorMapping(col_fun = colorRamp2(c(0, 1), c("yellow", "red")))
lgd = color_mapping_legend(cm, title = "Conservation", plot = FALSE,
    at = c(0, 0.2, 0.4, 0.6, 0.8, 1), 
    labels = c("0%", "20%", "40%", "60%", "80%", "100%"))
chr1_len = 249250621

Conservation between human and mouse.

load(system.file("extdata", "mouse_net.RData", package = "HilbertCurve"))
seqlengths(mouse) = chr1_len # it is only used to extract the complement
nonmouse = gaps(mouse); nonmouse = nonmouse[strand(nonmouse) == "*"]
gr = c(mouse, nonmouse)
col = c(rep("red", length(mouse)), rep("yellow", length(nonmouse)))
hc = GenomicHilbertCurve(chr = "chr1", level = 6, 
    title = "Sequence conservation between mouse and human on chr1",
    legend = lgd)
hc_points(hc, gr, np = 3, gp = gpar(col = NA, fill = col))

plot of chunk unnamed-chunk-3

Conservation between human and zebrafish.

load(system.file("extdata", "zebrafish_net.RData", package = "HilbertCurve"))
seqlengths(zebrafish) = chr1_len
nonzebrafish = gaps(zebrafish); nonzebrafish = nonzebrafish[strand(nonzebrafish) == "*"]
gr = c(zebrafish, nonzebrafish)
col = c(rep("red", length(zebrafish)), rep("yellow", length(nonzebrafish)))
hc = GenomicHilbertCurve(chr = "chr1", level = 6, 
    title = "Sequence conservation between zebrafish and human on chr1",
    legend = lgd)
hc_points(hc, gr, np = 3, gp = gpar(col = NA, fill = col))

plot of chunk unnamed-chunk-4

Comparing these two plots allows the following conclusions:

  1. With the exception of few regions, almost all parts of the human chromosome 1 have a conserved counterpart in the mouse genome.
  2. The start and end of the human chromosome 1 is less conserved to mouse compared to other parts in the chromosome. Also there are several large less conserved regions inside the human chromosome.
  3. Between human chromosome 1 and zebrafish, there are several large conserved regions, while other large regions are not conserved. Only relatively few smaller conserved regions exist.

The idea of visualizing sequencing conservation through the Hilbert curve is from http://mkweb.bcgsc.ca/hilbert/scientificamerican.mhtml.

Session info

sessionInfo()
## R version 3.2.2 (2015-08-14)
## Platform: x86_64-pc-linux-gnu (64-bit)
## 
## locale:
## [1] C
## 
## attached base packages:
##  [1] stats4    parallel  methods   grid      stats     graphics  grDevices
##  [8] utils     datasets  base     
## 
## other attached packages:
## [1] circlize_0.3.4       ComplexHeatmap_1.6.0 HilbertCurve_1.1.3  
## [4] GenomicRanges_1.22.3 GenomeInfoDb_1.6.3   IRanges_2.4.6       
## [7] S4Vectors_0.9.19     BiocGenerics_0.16.1 
## 
## loaded via a namespace (and not attached):
##  [1] whisker_0.3-2       knitr_1.12.3        XVector_0.10.0     
##  [4] magrittr_1.5        zlibbioc_1.16.0     colorspace_1.2-6   
##  [7] lattice_0.20-33     rjson_0.2.15        stringr_1.0.0      
## [10] tools_3.2.2         png_0.1-7           RColorBrewer_1.1-2 
## [13] formatR_1.2.1       HilbertVis_1.28.0   GlobalOptions_0.0.8
## [16] dendextend_1.1.2    shape_1.4.2         evaluate_0.8       
## [19] stringi_1.0-1       GetoptLong_0.1.1