In this document, we discuss the use of background regions in GREAT analysis. We first demonstrate it with a ChIP-seq TFBS dataset downloaded from the UCSC Table Browser. The parameters are:
In the “Select dataset” section:
clade = Mammal
genome = Human
assembly = GRCh37/hg19
group = Regulation
track = ENCODE 3 TFBS
table = GM12878 MYB
And in the “Retrieve and display data” section:
output format = BED - browser extensible data
Then click the button “get output”.
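We read the downloaded BED file into a GRanges object, keeping only the standard chromosomes (chr1 to chr22, chrX and chrY). BED start positions are 0-based, so 1 is added to obtain 1-based coordinates.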
library(rGREAT)
df = read.table("data/tb_encTfChipPkENCFF215YWS_GM12878_MYB_hg19.bed")
df = df[df[, 1] %in% paste0("chr", c(1:22, "X", "Y")), ]
gr = GRanges(seqnames = df[, 1], ranges = IRanges(df[, 2] + 1, df[, 3]))
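The filtering and coordinate shift above can be checked on a small example. The following is a minimal sketch in base R; the data frame `bed` and its values are hypothetical and only stand in for the downloaded file. It shows that adding 1 to the 0-based, half-open BED start gives a 1-based, closed interval of the same width.

```r
# Toy BED records (hypothetical values); BED starts are 0-based, ends are exclusive
bed = data.frame(
    chr   = c("chr1", "chr2", "chrUn_gl000220", "chrX"),
    start = c(0, 99, 10, 499),
    end   = c(10, 150, 20, 600)
)

# Keep only the standard chromosomes, as in the filtering step above
std_chr = paste0("chr", c(1:22, "X", "Y"))
bed = bed[bed$chr %in% std_chr, ]

# Convert to 1-based, closed coordinates: shift the start, keep the end
start1 = bed$start + 1
end1   = bed$end

# The interval width is unchanged by the conversion
width_bed = bed$end - bed$start
width_1   = end1 - start1 + 1
stopifnot(all(width_bed == width_1))
```

Without the shift, every region would be one base too long at its left end, which is a small but systematic error.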
The next two GREAT analyses both use the genome as background. The first uses the whole genome (exclude = NULL) and the second excludes gap regions from it (exclude = "gap").
res1 = great(gr, "GO:BP", "hg19", exclude = NULL)
res2 = great(gr, "GO:BP", "hg19", exclude = "gap")
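To see why the choice of background can change the results, recall that GREAT's binomial test compares the observed number of input regions hitting a term's regulatory domains with the fraction of the background those domains cover. The sketch below uses made-up numbers only (none come from the MYB dataset) and assumes that neither the term's domains nor the input regions overlap gaps. Under these assumptions, removing gaps shrinks the background, raises the expected hit fraction and makes the p-value less extreme.

```r
# All numbers below are hypothetical, chosen only for illustration
genome_size = 3.0e9    # total background length (bp)
gap_size    = 0.2e9    # total length of gap regions (bp)
term_size   = 3.0e7    # regulatory domains of one term (bp), assumed gap-free
n_total     = 1000     # number of input regions
n_hit       = 25       # input regions falling into the term's domains

# Fraction of the background covered by the term, with and without gaps
p_genome      = term_size / genome_size
p_exclude_gap = term_size / (genome_size - gap_size)

# One-sided binomial p-values: P(X >= n_hit)
pval_genome      = pbinom(n_hit - 1, n_total, p_genome, lower.tail = FALSE)
pval_exclude_gap = pbinom(n_hit - 1, n_total, p_exclude_gap, lower.tail = FALSE)

c(genome = pval_genome, exclude_gap = pval_exclude_gap)
```

The same reasoning applies to every term, so using the whole genome as background would be expected to give somewhat more significant terms than excluding gaps.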
We then compare the GO terms that are significant (adjusted p-value < 0.001) in the two analyses:
tb1 = getEnrichmentTable(res1)
tb2 = getEnrichmentTable(res2)
library(eulerr)
lt = list(
    genome = tb1$id[tb1$p_adjust < 0.001],
    exclude_gap = tb2$id[tb2$p_adjust < 0.001]
)
plot(euler(lt), quantities = TRUE)
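The Euler diagram shows only the counts. To list which terms are shared or specific to one analysis, base R set operations can be applied to the same kind of list. The sketch below uses a hypothetical list `lt_toy` with placeholder GO IDs instead of the real results.

```r
# Hypothetical GO IDs standing in for the two sets of significant terms
lt_toy = list(
    genome      = c("GO:0000001", "GO:0000002", "GO:0000003"),
    exclude_gap = c("GO:0000002", "GO:0000003", "GO:0000004")
)

# Terms significant in both analyses, and in only one of them
shared           = intersect(lt_toy$genome, lt_toy$exclude_gap)
only_genome      = setdiff(lt_toy$genome, lt_toy$exclude_gap)
only_exclude_gap = setdiff(lt_toy$exclude_gap, lt_toy$genome)

# Counts corresponding to the three areas of a two-set Euler diagram
lengths(list(shared = shared, only_genome = only_genome,
    only_exclude_gap = only_exclude_gap))
```

Replacing `lt_toy` with `lt` gives the actual term IDs behind each area of the plot.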