Local GREAT analysis

library(rGREAT)

Repo: https://github.com/jokergoo/jokergoo.github.io/tree/master/talks

We first read the regions (peaks of the transcription factor JUN) as a data frame.

bed = read.table("Encode3_JUN_hg19.bed")

We use standard Bioconductor data structure in local GREAT analysis, so we change bed to a GRanges object.

gr = GRanges(seqnames = bed[, 1], ranges = IRanges(bed[, 2], bed[, 3]))
gr

## GRanges object with 1726 ranges and 0 metadata columns:
##          seqnames            ranges strand
##             <Rle>         <IRanges>  <Rle>
##      [1]     chr1     936221-936347      *
##      [2]     chr1   1280252-1280492      *
##      [3]     chr1   2143913-2144153      *
##      [4]     chr1   4052186-4052426      *
##      [5]     chr1   4440698-4440938      *
##      ...      ...               ...    ...
##   [1722]    chr22 39096963-39097203      *
##   [1723]    chr22 39704855-39705095      *
##   [1724]    chr22 40894508-40894748      *
##   [1725]    chr22 43011250-43011490      *
##   [1726]    chr22 50364455-50364695      *
##   -------
##   seqinfo: 23 sequences from an unspecified genome; no seqlengths

The function great() is the core function to perform local GREAT analysis. You can either use built-in annotations or use self-provided annotations.

Using integrated gene sets

The GO gene sets

great() supports GO gene sets for many organisms. The usage is:

res_bp = great(gr, "GO:BP", "hg19")
res_bp

## 1726 regions are associated to 2268 genes' extended TSSs.
##   TSS source: TxDb.Hsapiens.UCSC.hg19.knownGene
##   Genome: hg19
##   OrgDb: org.Hs.eg.db
##   Gene sets: GO:BP
##   Background: whole genome excluding gaps
## Mode: Basal plus extension
##   Proximal: 5000 bp upstream, 1000 bp downstream,
##   plus Distal: up to 1000000 bp

The GO gene sets are from the organism org.*.db packages. TSS definitions are mainly from TxDb packages. All supported organisms are:

rGREAT:::BIOC_ANNO_PKGS

##    species_name            species_latin taxon_id
## 1         human             Homo sapiens     9606
## 2         human             Homo sapiens     9606
## 3         human             Homo sapiens     9606
## 4         human             Homo sapiens     9606
## 5         mouse             Mus musculus    10090
## 6         mouse             Mus musculus    10090
## 7         mouse             Mus musculus    10090
## 8         mouse             Mus musculus    10090
## 9           rat        Rattus norvegicus    10116
## 10          rat        Rattus norvegicus    10116
## 11          rat        Rattus norvegicus    10116
## 12          rat        Rattus norvegicus    10116
## 13      chicken            Gallus gallus     9031
## 14      chicken            Gallus gallus     9031
## 15      chicken            Gallus gallus     9031
## 16       rhesus           Macaca mulatta     9544
## 17       rhesus           Macaca mulatta     9544
## 18       rhesus           Macaca mulatta     9544
## 19         worm   Caenorhabditis elegans     6239
## 20         worm   Caenorhabditis elegans     6239
## 21          dog          Canis familiari     9615
## 22          dog          Canis familiari     9615
## 23          dog          Canis familiari     9615
## 24          pig               Sus scrofa     9823
## 25          pig               Sus scrofa     9823
## 26        yeast Saccharomyces cerevisiae     4932
## 27        yeast Saccharomyces cerevisiae     4932
## 28        chimp          Pan troglodytes     9598
## 29        chimp          Pan troglodytes     9598
## 30        chimp          Pan troglodytes     9598
## 31     fruitfly  Drosophila melanogaster     7227
## 32     fruitfly  Drosophila melanogaster     7227
## 33    zebrafish              Danio rerio     7955
## 34    zebrafish              Danio rerio     7955
## 35       cattle               Bos taurus     9913
## 36       cattle               Bos taurus     9913
## 37  thale cress     Arabidopsis thaliana     3702
## 38  thale cress     Arabidopsis thaliana     3702
## 39  thale cress     Arabidopsis thaliana     3702
## 40  thale cress     Arabidopsis thaliana     3702
##                                      txdb gene_id_in_txdb
## 1       TxDb.Hsapiens.UCSC.hg18.knownGene  Entrez Gene ID
## 2       TxDb.Hsapiens.UCSC.hg19.knownGene  Entrez Gene ID
## 3       TxDb.Hsapiens.UCSC.hg38.knownGene  Entrez Gene ID
## 4         TxDb.Hsapiens.UCSC.hg38.refGene  Entrez Gene ID
## 5      TxDb.Mmusculus.UCSC.mm10.knownGene  Entrez Gene ID
## 6        TxDb.Mmusculus.UCSC.mm10.ensGene Ensembl gene ID
## 7        TxDb.Mmusculus.UCSC.mm39.refGene  Entrez Gene ID
## 8       TxDb.Mmusculus.UCSC.mm9.knownGene  Entrez Gene ID
## 9       TxDb.Rnorvegicus.UCSC.rn4.ensGene Ensembl gene ID
## 10      TxDb.Rnorvegicus.UCSC.rn5.refGene  Entrez Gene ID
## 11      TxDb.Rnorvegicus.UCSC.rn6.refGene  Entrez Gene ID
## 12      TxDb.Rnorvegicus.UCSC.rn7.refGene  Entrez Gene ID
## 13      TxDb.Ggallus.UCSC.galGal4.refGene  Entrez Gene ID
## 14      TxDb.Ggallus.UCSC.galGal5.refGene  Entrez Gene ID
## 15      TxDb.Ggallus.UCSC.galGal6.refGene  Entrez Gene ID
## 16    TxDb.Mmulatta.UCSC.rheMac10.refGene  Entrez Gene ID
## 17     TxDb.Mmulatta.UCSC.rheMac3.refGene  Entrez Gene ID
## 18     TxDb.Mmulatta.UCSC.rheMac8.refGene  Entrez Gene ID
## 19        TxDb.Celegans.UCSC.ce11.refGene  Entrez Gene ID
## 20        TxDb.Celegans.UCSC.ce11.ensGene Ensembl gene ID
## 21  TxDb.Cfamiliaris.UCSC.canFam3.refGene  Entrez Gene ID
## 22  TxDb.Cfamiliaris.UCSC.canFam4.refGene  Entrez Gene ID
## 23  TxDb.Cfamiliaris.UCSC.canFam5.refGene  Entrez Gene ID
## 24     TxDb.Sscrofa.UCSC.susScr11.refGene  Entrez Gene ID
## 25      TxDb.Sscrofa.UCSC.susScr3.refGene  Entrez Gene ID
## 26  TxDb.Scerevisiae.UCSC.sacCer2.sgdGene     SGD Gene ID
## 27  TxDb.Scerevisiae.UCSC.sacCer3.sgdGene     SGD Gene ID
## 28 TxDb.Ptroglodytes.UCSC.panTro4.refGene  Entrez Gene ID
## 29 TxDb.Ptroglodytes.UCSC.panTro5.refGene  Entrez Gene ID
## 30 TxDb.Ptroglodytes.UCSC.panTro6.refGene  Entrez Gene ID
## 31    TxDb.Dmelanogaster.UCSC.dm3.ensGene Ensembl gene ID
## 32    TxDb.Dmelanogaster.UCSC.dm6.ensGene Ensembl gene ID
## 33      TxDb.Drerio.UCSC.danRer10.refGene  Entrez Gene ID
## 34      TxDb.Drerio.UCSC.danRer11.refGene  Entrez Gene ID
## 35      TxDb.Btaurus.UCSC.bosTau8.refGene  Entrez Gene ID
## 36      TxDb.Btaurus.UCSC.bosTau9.refGene  Entrez Gene ID
## 37    TxDb.Athaliana.BioMart.plantsmart51         TAIR ID
## 38    TxDb.Athaliana.BioMart.plantsmart22         TAIR ID
## 39    TxDb.Athaliana.BioMart.plantsmart25         TAIR ID
## 40    TxDb.Athaliana.BioMart.plantsmart28         TAIR ID
##    genome_version_in_txdb          orgdb primary_gene_id_in_orgdb
## 1                    hg18   org.Hs.eg.db           Entrez Gene ID
## 2                    hg19   org.Hs.eg.db           Entrez Gene ID
## 3                    hg38   org.Hs.eg.db           Entrez Gene ID
## 4                    hg38   org.Hs.eg.db           Entrez Gene ID
## 5                    mm10   org.Mm.eg.db           Entrez Gene ID
## 6                    mm10   org.Mm.eg.db           Entrez Gene ID
## 7                    mm39   org.Mm.eg.db           Entrez Gene ID
## 8                     mm9   org.Mm.eg.db           Entrez Gene ID
## 9                     rn4   org.Rn.eg.db           Entrez Gene ID
## 10                    rn5   org.Rn.eg.db           Entrez Gene ID
## 11                    rn6   org.Rn.eg.db           Entrez Gene ID
## 12                    rn7   org.Rn.eg.db           Entrez Gene ID
## 13                galGal4   org.Gg.eg.db           Entrez Gene ID
## 14                galGal5   org.Gg.eg.db           Entrez Gene ID
## 15                galGal6   org.Gg.eg.db           Entrez Gene ID
## 16               rheMac10  org.Mmu.eg.db           Entrez Gene ID
## 17                rheMac3  org.Mmu.eg.db           Entrez Gene ID
## 18                rheMac8  org.Mmu.eg.db           Entrez Gene ID
## 19                   ce11   org.Ce.eg.db           Entrez Gene ID
## 20                   ce11   org.Ce.eg.db           Entrez Gene ID
## 21                canFam3   org.Cf.eg.db           Entrez Gene ID
## 22                canFam4   org.Cf.eg.db           Entrez Gene ID
## 23                canFam5   org.Cf.eg.db           Entrez Gene ID
## 24               susScr11   org.Ss.eg.db           Entrez Gene ID
## 25                susScr3   org.Ss.eg.db           Entrez Gene ID
## 26                sacCer2  org.Sc.sgd.db              SGD Gene ID
## 27                sacCer3  org.Sc.sgd.db              SGD Gene ID
## 28                panTro4   org.Pt.eg.db           Entrez Gene ID
## 29                panTro5   org.Pt.eg.db           Entrez Gene ID
## 30                panTro6   org.Pt.eg.db           Entrez Gene ID
## 31                    dm3   org.Dm.eg.db           Entrez Gene ID
## 32                    dm6   org.Dm.eg.db           Entrez Gene ID
## 33               danRer10   org.Dr.eg.db           Entrez Gene ID
## 34               danRer11   org.Dr.eg.db           Entrez Gene ID
## 35                bosTau8   org.Bt.eg.db           Entrez Gene ID
## 36                bosTau9   org.Bt.eg.db           Entrez Gene ID
## 37                 araTha org.At.tair.db                  TAIR ID
## 38                 araTha org.At.tair.db                  TAIR ID
## 39                 araTha org.At.tair.db                  TAIR ID
## 40                 araTha org.At.tair.db                  TAIR ID
##    chromosome_prefix
## 1                chr
## 2                chr
## 3                chr
## 4                chr
## 5                chr
## 6                chr
## 7                chr
## 8                chr
## 9                chr
## 10               chr
## 11               chr
## 12               chr
## 13               chr
## 14               chr
## 15               chr
## 16               chr
## 17               chr
## 18               chr
## 19               chr
## 20               chr
## 21               chr
## 22               chr
## 23               chr
## 24               chr
## 25               chr
## 26               chr
## 27               chr
## 28               chr
## 29               chr
## 30               chr
## 31               chr
## 32               chr
## 33               chr
## 34               chr
## 35               chr
## 36               chr
## 37                  
## 38                  
## 39                  
## 40

MSigDB gene sets

The second category of gene sets are from MSigDB. Note this is only for human.

res_h = great(gr, "msigdb:h", "hg19")
res_h

## 1726 regions are associated to 2268 genes' extended TSSs.
##   TSS source: TxDb.Hsapiens.UCSC.hg19.knownGene
##   Genome: hg19
##   OrgDb: org.Hs.eg.db
##   Gene sets: msigdb:h
##   Background: whole genome excluding gaps
## Mode: Basal plus extension
##   Proximal: 5000 bp upstream, 1000 bp downstream,
##   plus Distal: up to 1000000 bp

All supported MSigDB gene sets are:

"msigdb:H" Hallmark gene sets.
"msigdb:C1" Positional gene sets.
"msigdb:C2" Curated gene sets.
"msigdb:C2:CGP" C2 subcategory: chemical and genetic perturbations gene sets.
"msigdb:C2:CP" C2 subcategory: canonical pathways gene sets.
"msigdb:C2:CP:KEGG" C2 subcategory: KEGG subset of CP.
"msigdb:C2:CP:PID" C2 subcategory: PID subset of CP.
"msigdb:C2:CP:REACTOME" C2 subcategory: REACTOME subset of CP.
"msigdb:C2:CP:WIKIPATHWAYS" C2 subcategory: WIKIPATHWAYS subset of CP.
"msigdb:C3" Regulatory target gene sets.
"msigdb:C3:MIR:MIRDB" miRDB of microRNA targets gene sets.
"msigdb:C3:MIR:MIR_LEGACY" MIR_Legacy of MIRDB.
"msigdb:C3:TFT:GTRD" GTRD transcription factor targets gene sets.
"msigdb:C3:TFT:TFT_LEGACY" TFT_Legacy.
"msigdb:C4" Computational gene sets.
"msigdb:C4:CGN" C4 subcategory: cancer gene neighborhoods gene sets.
"msigdb:C4:CM" C4 subcategory: cancer modules gene sets.
"msigdb:C5" Ontology gene sets.
"msigdb:C5:GO:BP" C5 subcategory: BP subset.
"msigdb:C5:GO:CC" C5 subcategory: CC subset.
"msigdb:C5:GO:MF" C5 subcategory: MF subset.
"msigdb:C5:HPO" C5 subcategory: human phenotype ontology gene sets.
"msigdb:C6" Oncogenic signature gene sets.
"msigdb:C7" Immunologic signature gene sets.
"msigdb:C7:IMMUNESIGDB" ImmuneSigDB subset of C7.
"msigdb:C7:VAX" C7 subcategory: vaccine response gene sets.
"msigdb:C8" Cell type signature gene sets.

The prefix msigdb: can be omitted when specified in great() and the name of a MSigDB can be used as case insensitive.

great(gr, "h", "hg19")
great(gr, "c2:cp", "hg19")

Manually set gene sets

More gene sets can be set as a list of gene vectors. Note genes in the gene sets must be mostly in Entrez ID type. In the following example, we use a gene set collection from DSigDB. This collection (FDA approved kinase inhibitors) only contains 28 gene sets.

Here read_gmt() is a simple function that reads a gmt file as a list of vectors, also performs gene ID conversion.

gs = read_gmt(url("http://dsigdb.tanlab.org/Downloads/D2_LINCS.gmt"), 
    from = "SYMBOL", to = "ENTREZ", orgdb = "org.Hs.eg.db")
gs[1:2]

## $GSK429286A
##  [1] "5566"  "9475"  "81788" "6093"  "5562"  "26524" "5592"  "5979"  "640"  
## [10] "5567"  "3717"  "5613"  "23012" "695"   "3718" 
## 
## $`BS-181`
##  [1] "122011" "1452"   "1022"   "7272"   "1454"   "1453"   "65975"  "147746"
##  [9] "1859"   "1195"   "1196"   "9149"   "57396"  "5261"

great(gr, gs, "hg19")

## 1726 regions are associated to 2268 genes' extended TSSs.
##   TSS source: TxDb.Hsapiens.UCSC.hg19.knownGene
##   Genome: hg19
##   OrgDb: org.Hs.eg.db
##   Gene sets: self-provided
##   Background: whole genome excluding gaps
## Mode: Basal plus extension
##   Proximal: 5000 bp upstream, 1000 bp downstream,
##   plus Distal: up to 1000000 bp

Or using KEGG gene sets:

kegg = KEGGREST::keggLink("hsa", "pathway")
kegg_gs = split(kegg, names(kegg))
kegg_gs = lapply(kegg_gs, function(x) unname(gsub("hsa:", "", x)))
names(kegg_gs) = gsub("path:", "", names(kegg_gs))
kegg_gs[1:2]

## $hsa00010
##  [1] "10327"  "124"    "125"    "126"    "127"    "128"    "130"    "130589"
##  [9] "131"    "160287" "1737"   "1738"   "2023"   "2026"   "2027"   "217"   
## [17] "218"    "219"    "2203"   "221"    "222"    "223"    "224"    "226"   
## [25] "229"    "230"    "2538"   "2597"   "26330"  "2645"   "2821"   "3098"  
## [33] "3099"   "3101"   "387712" "3939"   "3945"   "3948"   "441531" "501"   
## [41] "5105"   "5106"   "5160"   "5161"   "5162"   "5211"   "5213"   "5214"  
## [49] "5223"   "5224"   "5230"   "5232"   "5236"   "5313"   "5315"   "55276" 
## [57] "55902"  "57818"  "669"    "7167"   "80201"  "83440"  "84532"  "8789"  
## [65] "92483"  "92579"  "9562"  
## 
## $hsa00020
##  [1] "1431"  "1737"  "1738"  "1743"  "2271"  "3417"  "3418"  "3419"  "3420" 
## [10] "3421"  "4190"  "4191"  "47"    "48"    "4967"  "50"    "5091"  "5105" 
## [19] "5106"  "5160"  "5161"  "5162"  "55753" "6389"  "6390"  "6391"  "6392" 
## [28] "8801"  "8802"  "8803"

great(gr, kegg_gs, "hg19")

## 1726 regions are associated to 2268 genes' extended TSSs.
##   TSS source: TxDb.Hsapiens.UCSC.hg19.knownGene
##   Genome: hg19
##   OrgDb: org.Hs.eg.db
##   Gene sets: self-provided
##   Background: whole genome excluding gaps
## Mode: Basal plus extension
##   Proximal: 5000 bp upstream, 1000 bp downstream,
##   plus Distal: up to 1000000 bp

Manually set gene sets and TSS

Mainly for non-model/less well studied organisms.

Two inputs:

Gene sets: A list of gene vectors.
TSS/gene: A GRanges object with a gene_id column. Gene IDs in gene_id column must match to genes in the gene sets.

Extending TSS needs the chromosome lengths.

with gene as input without additional chromosome length information

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
gene = genes(TxDb.Hsapiens.UCSC.hg19.knownGene)

##   403 genes were dropped because they have exons located on both strands
##   of the same reference sequence or on more than one reference sequence,
##   so cannot be represented by a single genomic range.
##   Use 'single.strand.genes.only=FALSE' to get all the genes in a
##   GRangesList object, or use suppressMessages() to suppress this message.

gene = gene[seqnames(gene) %in% paste0("chr", c(1:22, "X", "Y"))]
gene = sort(gene)
# remove seqlength informatio
gene = GRanges(seqnames = as.vector(seqnames(gene)), ranges = ranges(gene),
    strand = strand(gene), gene_id = gene$gene_id)

great(gr, gs, gene)

## ! You havn't specified neither `seqlengths` nor `genome`, take the
## maximal ends of genes on chromosomes as the chromosome lengths.

## * TSS extension mode is 'basalPlusExt'.

## * construct the basal domains by extending 5000bp to upstream and 1000bp to downsteram of TSS.

## * calculate distances to neighbour regions.

## * extend to both sides until reaching the neighbour genes or to the maximal extension.

## * check gene ID type in `gene_sets` and in `extended_tss`.

## * use whole genome as background.

## * remove excluded regions from background.

## * overlap `gr` to background regions (based on midpoint).

## * in total 1726 `gr`.

## * overlap extended TSS to background regions.

## * check which genes are in the gene sets.

## * only take gene sets with size >= 5.

## * in total 84 gene sets.

## * overlap `gr` to every extended TSS.

## * perform binomial test for each biological term.

## 1726 regions are associated to 2395 genes' extended TSSs.
##   TSS source: self-provided
##   Genome: unknown
##   Gene sets: self-provided
##   Background: whole genome
## Mode: Basal plus extension
##   Proximal: 5000 bp upstream, 1000 bp downstream,
##   plus Distal: up to 1000000 bp

In this case, the end position of the last gene on each chromosome is treated as the end of the chromosome.

with gene where seqlengths() is already set:

seqlengths(gene) = c(chr1 = 249250621, chr2 = 243199373, chr3 = 198022430, chr4 = 191154276,
    chr5 = 180915260, chr6 = 171115067, chr7 = 159138663, chr8 = 146364022,
    chr9 = 141213431, chr10 = 135534747, chr11 = 135006516, chr12 = 133851895,
    chr13 = 115169878, chr14 = 107349540, chr15 = 102531392, chr16 = 90354753,
    chr17 = 81195210, chr18 = 78077248, chr19 = 59128983, chr20 = 63025520,
    chr21 = 48129895, chr22 = 51304566, chrX = 155270560, chrY = 59373566)
great(gr, gs, gene)

## 1726 regions are associated to 2395 genes' extended TSSs.
##   TSS source: self-provided
##   Genome: unknown
##   Gene sets: self-provided
##   Background: whole genome
## Mode: Basal plus extension
##   Proximal: 5000 bp upstream, 1000 bp downstream,
##   plus Distal: up to 1000000 bp

Set background regions

There are two ways to set “background/mask” regions: exclude and background.

It is very common that gap regions in the genome are removed from the analysis. In the following example, getGapFromUCSC() can be used to retrieve gap regions from UCSC table browser.

gap = getGapFromUCSC("hg19", paste0("chr", c(1:22, "X", "Y")))
great(gr, "MSigDB:H", "hg19", exclude = gap)

Or only in a set of pre-defined background regions, e.g. regionts with similar CpG densities as input regions.

background_regions = ... # from upstream analysis, a GRanges object
great(gr, "MSigDB:H", "hg19", background = background_regions)

Note as the same as online GREAT, rGREAT by default excludes gap regions from the analysis.

Alternatively, background and exclude can also be set to a vector of chromosome names, then the whole selected chromosomes will be included/excluded from the analysis.

great(gr, "GO:BP", "hg19", background = paste0("chr", 1:22))
great(gr, "GO:BP", "hg19", exclude = c("chrX", "chrY"))

Get enrichment table

Simply use getEnrichmentTable() function.

tb = getEnrichmentTable(res_bp)
head(tb)

##           id                         description genome_fraction
## 1 GO:0097190         apoptotic signaling pathway      0.05113446
## 2 GO:0042981     regulation of apoptotic process      0.13585729
## 3 GO:0043067 regulation of programmed cell death      0.13812704
## 4 GO:0010941            regulation of cell death      0.15338546
## 5 GO:0006915                   apoptotic process      0.17420944
## 6 GO:0012501               programmed cell death      0.17749351
##   observed_region_hits fold_enrichment p_value p_adjust mean_tss_dist
## 1                  176        1.994152       0        0        110109
## 2                  384        1.637599       0        0        125275
## 3                  385        1.614884       0        0        126229
## 4                  413        1.560002       0        0        123382
## 5                  462        1.536489       0        0        123653
## 6                  468        1.527646       0        0        124195
##   observed_gene_hits gene_set_size fold_enrichment_hyper p_value_hyper
## 1                110           579              1.554960  1.289313e-06
## 2                271          1393              1.592294  5.551115e-16
## 3                273          1421              1.572439  2.220446e-15
## 4                295          1571              1.536919  3.108624e-15
## 5                334          1824              1.498742  1.332268e-15
## 6                341          1876              1.487739  1.998401e-15
##   p_adjust_hyper
## 1   2.898304e-05
## 2   1.366407e-13
## 3   4.204329e-13
## 4   5.668059e-13
## 5   2.981252e-13
## 6   3.935252e-13

There are also columns for hypergeometric test on the numbers of genes, the same as in the original GREAT method.

There is a new column “mean_tss_dist” in the result table which is the mean absolute distance of input regions to TSS of genes in a gene set. Please note with larger distance to TSS, the more we need to be careful with the reliability of the associations between input regions to genes.

Make volcano plot

In differential gene expression analysis, volcano plot is used to visualize relations between log2 fold change and (adjusted) p-values. Similarly, we can also use volcano plot to visualize relations between fold enrichment and (adjusted) p-values for the enrichment analysis. The plot is made by the function plotVolcano():

plotVolcano(res_bp)

As the enrichment analysis basically only looks for over-representations, it is actually half volcano.

Get region-gene associations

plotRegionGeneAssociations() generates three plots similar as those by online GREAT. getRegionGeneAssociations() returns a GRanges object containing associations between regions and genes.

plotRegionGeneAssociations(res_bp)

getRegionGeneAssociations(res_bp)

## GRanges object with 1717 ranges and 2 metadata columns:
##          seqnames            ranges strand | annotated_genes     dist_to_TSS
##             <Rle>         <IRanges>  <Rle> | <CharacterList>   <IntegerList>
##      [1]     chr1     936221-936347      * |            HES4            -669
##      [2]     chr1   1280252-1280492      * |     TAS1R3,DVL1      13526,4000
##      [3]     chr1   2143913-2144153      * |          FAAP20           -4741
##      [4]     chr1   4052186-4052426      * |  C1orf174,AJAP1 -235329,-662679
##      [5]     chr1   4440698-4440938      * |  C1orf174,AJAP1 -623841,-274167
##      ...      ...               ...    ... .             ...             ...
##   [1713]    chr22 39096963-39097203      * |    JOSD1,GTPBP1      -504,-4604
##   [1714]    chr22 39704855-39705095      * |      PDGFB,RPL3    -63898,11296
##   [1715]    chr22 40894508-40894748      * |     SGSM3,MRTFA   127913,137942
##   [1716]    chr22 43011250-43011490      * |         POLDIP3            -288
##   [1717]    chr22 50364455-50364695      * |    PIM3,IL17REL     10312,86360
##   -------
##   seqinfo: 23 sequences from an unspecified genome; no seqlengths

Being different from the online GREAT, getRegionGeneAssociations() on the local GREAT object calculates the distance to TSS based on the borders of the input regions. The argument by_middle_points can be set to TRUE to let the distance be based on the middle points of input regions.

Please note the two meta columns are in formats of CharacterList and IntegerList because a region may associate to multiple genes.

Of course you can set a specific geneset term by argument term_id.

plotRegionGeneAssociations(res_bp, term_id = "GO:0006915")

getRegionGeneAssociations(res_bp, term_id = "GO:0006915")

## GRanges object with 463 ranges and 2 metadata columns:
##         seqnames            ranges strand | annotated_genes    dist_to_TSS
##            <Rle>         <IRanges>  <Rle> | <CharacterList>  <IntegerList>
##     [1]     chr1 10440739-10440979      * |           KIF1B         169975
##     [2]     chr1 16472855-16473095      * |           EPHA2           9487
##     [3]     chr1 16490874-16491114      * |           EPHA2          -8292
##     [4]     chr1 23073949-23074189      * |           KDM1A        -271752
##     [5]     chr1 23912118-23912358      * |       ID3,RPL11 -25833,-105911
##     ...      ...               ...    ... .             ...            ...
##   [459]    chr20 62330537-62330777      * |        TNFRSF6B           3406
##   [460]    chr21 44897828-44898068      * |      SIK1,RRP1B -50826,-181364
##   [461]    chr22 21258958-21259198      * |            CRKL         -12516
##   [462]    chr22 37921455-37921695      * |          CARD10          -6077
##   [463]    chr22 50364455-50364695      * |            PIM3          10312
##   -------
##   seqinfo: 23 sequences from an unspecified genome; no seqlengths

With local GREAT, it is possible to obtain associations between genes and all regions.

getRegionGeneAssociations(res_bp)

## GRanges object with 1717 ranges and 2 metadata columns:
##          seqnames            ranges strand | annotated_genes     dist_to_TSS
##             <Rle>         <IRanges>  <Rle> | <CharacterList>   <IntegerList>
##      [1]     chr1     936221-936347      * |            HES4            -669
##      [2]     chr1   1280252-1280492      * |     TAS1R3,DVL1      13526,4000
##      [3]     chr1   2143913-2144153      * |          FAAP20           -4741
##      [4]     chr1   4052186-4052426      * |  C1orf174,AJAP1 -235329,-662679
##      [5]     chr1   4440698-4440938      * |  C1orf174,AJAP1 -623841,-274167
##      ...      ...               ...    ... .             ...             ...
##   [1713]    chr22 39096963-39097203      * |    JOSD1,GTPBP1      -504,-4604
##   [1714]    chr22 39704855-39705095      * |      PDGFB,RPL3    -63898,11296
##   [1715]    chr22 40894508-40894748      * |     SGSM3,MRTFA   127913,137942
##   [1716]    chr22 43011250-43011490      * |         POLDIP3            -288
##   [1717]    chr22 50364455-50364695      * |    PIM3,IL17REL     10312,86360
##   -------
##   seqinfo: 23 sequences from an unspecified genome; no seqlengths

The Shiny application

shinyReport() creates a Shiny application to view the complete results:

shinyReport(res_bp)

Work with other organisms

Use BioMart genes and gene sets

rGREAT also supports GO gene sets for a huge number of organisms retrieved from Ensembl BioMart. A specific organism can be set with the biomart_dataset argument:

# Giant panda
gr_panda = randomRegionsFromBioMartGenome("amelanoleuca_gene_ensembl")
gr_panda

## GRanges object with 993 ranges and 0 metadata columns:
##               seqnames            ranges strand
##                  <Rle>         <IRanges>  <Rle>
##     [1]              9     316703-326582      *
##     [2]              9   4025898-4029990      *
##     [3]              9   7512107-7513962      *
##     [4]              9   8093520-8102042      *
##     [5]              9   8840932-8850802      *
##     ...            ...               ...    ...
##   [989]             19 30864977-30865979      *
##   [990]             19 32737775-32747467      *
##   [991]             19 33398878-33402107      *
##   [992] LNAT02007500.1     392610-399562      *
##   [993] LNAT02006054.1   1549328-1558943      *
##   -------
##   seqinfo: 2040 sequences from an unspecified genome; no seqlengths

great(gr_panda, "GO:BP", biomart_dataset = "amelanoleuca_gene_ensembl")

## 960 regions are associated to 1491 genes' extended TSSs.
##   TSS source: amelanoleuca_gene_ensembl
##   Genome: Giant panda genes (ASM200744v2)
##   Gene sets: GO:BP
##   Background: whole genome
## Mode: Basal plus extension
##   Proximal: 5000 bp upstream, 1000 bp downstream,
##   plus Distal: up to 1000000 bp

Note both TSS and gene sets are from BioMart. The value for gene sets (the second argument) can only be one of "GO:BP", "GO:CC" and "GO:MF".

rGREAT now supports more than 600 organisms. The complete list can be found with BioMartGOGeneSets::supportedOrganisms(). Be careful with the genome version and format of chromsome names.

For non-model organisms, the default chromosomes may include many small contigs. You can set the background argument to a vector of chromosomes that you want to include.

Use MSigDB gene sets

MSigDB contains gene sets only for human, but it can be extended to other organisms by mapping to the homologues genes. The package msigdbr has already mapped genes to many other organisms. A full list of supported organisms in msigdbr can be obtained by：

library(msigdbr)
msigdbr_species()

## # A tibble: 20 × 2
##    species_name                    species_common_name                          
##    <chr>                           <chr>                                        
##  1 Anolis carolinensis             Carolina anole, green anole                  
##  2 Bos taurus                      bovine, cattle, cow, dairy cow, domestic cat…
##  3 Caenorhabditis elegans          <NA>                                         
##  4 Canis lupus familiaris          dog, dogs                                    
##  5 Danio rerio                     leopard danio, zebra danio, zebra fish, zebr…
##  6 Drosophila melanogaster         fruit fly                                    
##  7 Equus caballus                  domestic horse, equine, horse                
##  8 Felis catus                     cat, cats, domestic cat                      
##  9 Gallus gallus                   bantam, chicken, chickens, Gallus domesticus 
## 10 Homo sapiens                    human                                        
## 11 Macaca mulatta                  rhesus macaque, rhesus macaques, Rhesus monk…
## 12 Monodelphis domestica           gray short-tailed opossum                    
## 13 Mus musculus                    house mouse, mouse                           
## 14 Ornithorhynchus anatinus        duck-billed platypus, duckbill platypus, pla…
## 15 Pan troglodytes                 chimpanzee                                   
## 16 Rattus norvegicus               brown rat, Norway rat, rat, rats             
## 17 Saccharomyces cerevisiae        baker's yeast, brewer's yeast, S. cerevisiae 
## 18 Schizosaccharomyces pombe 972h- <NA>                                         
## 19 Sus scrofa                      pig, pigs, swine, wild boar                  
## 20 Xenopus tropicalis              tropical clawed frog, western clawed frog

To obtain gene sets for non-human organisms, e.g.:

h_gene_sets = msigdbr(species = "chimpanzee", category = "H")
head(h_gene_sets)

## # A tibble: 6 × 18
##   gs_cat gs_subcat gs_name               gene_symbol entrez_gene ensembl_gene   
##   <chr>  <chr>     <chr>                 <chr>             <int> <chr>          
## 1 H      ""        HALLMARK_ADIPOGENESIS ABCA1            464630 ENSPTRG0000002…
## 2 H      ""        HALLMARK_ADIPOGENESIS ABCB8            463892 ENSPTRG0000001…
## 3 H      ""        HALLMARK_ADIPOGENESIS ACAA2            455414 ENSPTRG0000001…
## 4 H      ""        HALLMARK_ADIPOGENESIS ACADL            459914 ENSPTRG0000001…
## 5 H      ""        HALLMARK_ADIPOGENESIS ACADM            469356 ENSPTRG0000000…
## 6 H      ""        HALLMARK_ADIPOGENESIS ACADS            742921 ENSPTRG0000000…
## # ℹ 12 more variables: human_gene_symbol <chr>, human_entrez_gene <int>,
## #   human_ensembl_gene <chr>, gs_id <chr>, gs_pmid <chr>, gs_geoid <chr>,
## #   gs_exact_source <chr>, gs_url <chr>, gs_description <chr>, taxon_id <int>,
## #   ortholog_sources <chr>, num_ortholog_sources <dbl>

If the organism you selected has a corresponding TxDb package available which provides TSS information, you need to make sure the gene sets use Entrez gene ID as the identifier (Most TxDb packages use Entrez ID as primary ID, you can check the variable rGREAT:::BIOC_ANNO_PKGS).

# convert to a list of gene sets
h_gene_sets = split(h_gene_sets$entrez_gene, h_gene_sets$gs_name)
h_gene_sets = lapply(h_gene_sets, as.character)  # just to make sure gene IDs are all in character.
h_gene_sets[1:2]

## $HALLMARK_ADIPOGENESIS
##   [1] "464630"    "463892"    "455414"    "459914"    "469356"    "742921"   
##   [7] "454672"    "104003784" "454895"    "451866"    "737339"    "471032"   
##  [13] "451742"    "737305"    "100615914" "456723"    "107967644" "454362"   
##  [19] "464334"    "743667"    "741867"    "449586"    "100614256" "741708"   
##  [25] "459164"    "746692"    "473976"    "452433"    "468889"    "745443"   
##  [31] "460926"    "455644"    "451116"    "454684"    "744890"    "461229"   
##  [37] "740513"    "104005232" "463949"    "469319"    "748673"    "450673"   
##  [43] "468605"    "471455"    "456837"    "464611"    "452659"    "472079"   
##  [49] "452307"    "454118"    "100616508" "465727"    "742828"    "737945"   
##  [55] "107976794" "107976794" "746229"    "472893"    "456557"    "457056"   
##  [61] "747265"    "736777"    "464460"    "451393"    "745691"    "454512"   
##  [67] "466780"    "463861"    "744984"    "452566"    "457117"    NA         
##  [73] "747936"    "459360"    "461436"    "464353"    "464074"    "466651"   
##  [79] "451984"    "456243"    "464255"    "467738"    "466732"    "461244"   
##  [85] "456929"    "460520"    "450562"    "450738"    "464140"    "459670"   
##  [91] "452976"    "471703"    "741876"    "471135"    "461424"    "459828"   
##  [97] "452295"    "460113"    "453565"    "741179"    "747276"    "470423"   
## [103] "451967"    "450290"    "473975"    "473975"    "460157"    "462946"   
## [109] "449638"    "738797"    "456076"    "451807"    "464031"    "739986"   
## [115] "459173"    "460872"    "463484"    "462853"    "739167"    "457477"   
## [121] "742027"    "746245"    "472764"    "747387"    "744096"    "101057233"
## [127] "744811"    "463686"    "744435"    "468748"    "451175"    "460227"   
## [133] "454744"    "739996"    NA          "450735"    "454478"    "457929"   
## [139] "738397"    "458602"    "456908"    "451591"    "450310"    "107970333"
## [145] "465012"    "463481"    "463481"    "460178"    "470365"    "742092"   
## [151] "741184"    "459094"    "459374"    "456940"    "745779"    "454531"   
## [157] "737918"    "107973114" "742100"    "470420"    "468499"    "467657"   
## [163] "100608935" "462416"    "451281"    "470281"    "470281"    "452359"   
## [169] "456862"    "456526"    "747462"    "474051"    "456155"    "458647"   
## [175] "744390"    "455841"    "459096"    "459031"    "450574"    "449637"   
## [181] "450628"    "470477"    "471247"    "453405"    "739128"    "454681"   
## [187] "464707"    "470417"    "450933"    "459685"    "460443"    "468406"   
## [193] "458803"    "467151"    "464550"    "745004"    "451416"    "735808"   
## [199] "743144"    "460348"    "107974864" "471631"    "741897"    "463489"   
## 
## $HALLMARK_ALLOGRAFT_REJECTION
##   [1] "454210"    "461523"    "450363"    "100609296" "459646"    "740898"   
##   [7] "466415"    "450170"    "465345"    "456984"    "744209"    "449497"   
##  [13] "100608992" "459361"    "741390"    "468208"    "748142"    "473220"   
##  [19] "748205"    "736543"    "454593"    "747004"    "454579"    "747123"   
##  [25] "462689"    "460323"    "740071"    "450128"    "469524"    "449512"   
##  [31] "748272"    "470617"    "451584"    "742330"    "451585"    "450124"   
##  [37] "100615583" "473802"    "460569"    "745293"    "462191"    "740560"   
##  [43] "470892"    "470900"    "735755"    "470426"    "460577"    "465021"   
##  [49] "465607"    "736196"    "457127"    "453745"    "457277"    "738275"   
##  [55] "471200"    NA          "459634"    "457770"    "469142"    "463415"   
##  [61] "466216"    "458797"    "453714"    "469204"    "750603"    NA         
##  [67] "100615835" "740028"    "451695"    "451158"    "471510"    "738331"   
##  [73] "469584"    "739516"    "465940"    "461906"    "468521"    "457003"   
##  [79] "472959"    "457020"    "467610"    "461873"    "452825"    "460623"   
##  [85] "463280"    "746195"    "750725"    "100608816" "449592"    "471979"   
##  [91] "471977"    "471974"    "462591"    "462540"    "494187"    "450196"   
##  [97] "474132"    "473965"    "449517"    "747276"    "470077"    "743102"   
## [103] "472749"    "469657"    "736204"    "460816"    "471723"    "455851"   
## [109] "449564"    "737808"    "449644"    "739011"    "744277"    "450200"   
## [115] "461472"    "456370"    "450884"    "470203"    "101059843" "449565"   
## [121] "454005"    "463288"    "464245"    "745517"    "463371"    "470524"   
## [127] "462386"    "450927"    "454294"    "454045"    "458607"    "735556"   
## [133] "464979"    "450156"    "738375"    "456715"    "471734"    "736309"   
## [139] "744486"    "459682"    "745667"    "472771"    "462888"    "748652"   
## [145] "449582"    "458294"    "460699"    "459239"    "741196"    "460720"   
## [151] "100322885" "469743"    "455026"    "740704"    "740477"    "450512"   
## [157] "453993"    "456276"    "743176"    "748032"    "457607"    "462249"   
## [163] "464277"    "737451"    "746600"    "737526"    "456065"    "461536"   
## [169] "107966305" "746721"    "737070"    "459209"    "451169"    "450503"   
## [175] "461971"    "461023"    "459834"    "100610925" "471978"    "746399"   
## [181] "746814"    "456060"    "457742"    "451611"    "107971092" "461167"   
## [187] "471325"    "471374"    "471167"    "494186"    "748737"    "464876"   
## [193] "741922"    "745141"    "452125"    "453161"    "743187"    "459427"

Now we can perform the local GREAT analysis.

gr = randomRegions(genome = "panTro6")
great(gr, h_gene_sets, "panTro6")

## 990 regions are associated to 659 genes' extended TSSs.
##   TSS source: TxDb.Ptroglodytes.UCSC.panTro6.refGene
##   Genome: panTro6
##   OrgDb: org.Pt.eg.db
##   Gene sets: self-provided
##   Background: whole genome excluding gaps
## Mode: Basal plus extension
##   Proximal: 5000 bp upstream, 1000 bp downstream,
##   plus Distal: up to 1000000 bp

Use NCBI genomes

NCBI provides genome data for a huge number of organisms. It is quite easy to obtain the GO gene sets for a specific organism with the AnnotationHub package where NCBI is also the data source.

Taking dolphin (Tursiops truncatus) as an example.

library(AnnotationHub)
ah = AnnotationHub()

display(ah)

# id for the AnnotationHub dataset
org_db = ah[["AH108106"]]
org_db

## OrgDb object:
## | DBSCHEMAVERSION: 2.1
## | DBSCHEMA: NOSCHEMA_DB
## | ORGANISM: Tursiops truncatus
## | SPECIES: Tursiops truncatus
## | CENTRALID: GID
## | Taxonomy ID: 9739
## | Db type: OrgDb
## | Supporting package: AnnotationDbi

all_bp_ids = mapIds(org_db, keytype = "ONTOLOGY", keys = "BP", 
    column = "GO", multiVals = "list")[[1]]
head(all_bp_ids)

## [1] "GO:0006119" "GO:0015986" "GO:0042776" "GO:0019646" "GO:0008535"
## [6] "GO:0006120"

We select "GOALL" so that gene set for each GO term includes genes from all its offspring terms.

bp_genesets = mapIds(org_db, keytype = "GOALL", keys = all_bp_ids, 
    column = "ENTREZID", multiVals = "list")
bp_genesets[1:2]

## $`GO:0006119`
##  [1] "7412047"   "7412050"   "7412051"   "7412052"   "7412053"   "7412054"  
##  [7] "7412055"   "7412056"   "7412058"   "7412059"   "101315513" "101316029"
## [13] "101316487" "101316542" "101316570" "101317856" "101318351" "101318829"
## [19] "101318867" "101319016" "101319704" "101320192" "101320707" "101322629"
## [25] "101323037" "101323468" "101325919" "101326733" "101326830" "101327630"
## [31] "101329052" "101331837" "101332916" "101333806" "101334086" "101334550"
## [37] "101335214" "101335403" "101336771" "101337012" "101337690" "101337707"
## [43] "101339027" "101339993" "109547368" "109549348" "109549675"
## 
## $`GO:0015986`
##  [1] "7412049"   "7412050"   "101315513" "101316487" "101318351" "101319550"
##  [7] "101320149" "101321264" "101326149" "101331575" "101332310" "101334550"
## [13] "101336294"

Unfortunately, there is no processed data of genome and genes for many other organisms, but these data is available from NCBI. The function getGenomeDataFromNCBI() accepts a RefSeq accession number for a given genome assembly and returns information of the genome as well as the genes.

The RefSeq accession number can be obtained from https://www.ncbi.nlm.nih.gov/data-hub/genome/. The accession number for dolphin is “GCF_011762595.1” and its genome page is at https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_011762595.1/ (always be careful with the genome/assembly version). We can get its genes with the command:

genes = getGenomeDataFromNCBI("GCF_011762595.1", return_granges = TRUE)
genes

## GRanges object with 19081 ranges and 1 metadata column:
##           seqnames        ranges strand |     gene_id
##              <Rle>     <IRanges>  <Rle> | <character>
##       [1]        1 234018-239154      + |   101335869
##       [2]        1 241768-255021      + |   101336161
##       [3]        1 259970-287776      + |   101336448
##       [4]        1 288745-298452      + |   101336738
##       [5]        1 325391-327082      - |   101321052
##       ...      ...           ...    ... .         ...
##   [19077]       MT    9921-10217      + |     7412053
##   [19078]       MT   10211-11588      + |     7412054
##   [19079]       MT   11790-13610      + |     7412055
##   [19080]       MT   13594-14121      - |     7412056
##   [19081]       MT   14195-15334      + |     7412057
##   -------
##   seqinfo: 23 sequences from an unspecified genome

The good thing is dolphin genome is already assembled on the chromosome level, so the chromosome length information is already included in genes:

seqlengths(genes)

##         1         2         3         4         5         6         7         8 
## 183742880 176472748 171421769 144183200 137487549 115599616 114015307 108430135 
##         9        10        11        12        13        14        15        16 
## 105335178 102812509 102167733  89330967  88485069  88483154  86293990  83508544 
##        17        18        19        20        21         X        MT 
##  78442772  78151963  58636127  58467291  35682248 128693445     16388

And we can directly use genes in great().

gr = randomRegions(seqlengths = seqlengths(genes))
gr

## GRanges object with 1000 ranges and 0 metadata columns:
##          seqnames              ranges strand
##             <Rle>           <IRanges>  <Rle>
##      [1]        1     5026741-5030722      *
##      [2]        1     5244108-5245358      *
##      [3]        1     8202579-8204950      *
##      [4]        1     9039213-9047046      *
##      [5]        1   10218811-10220154      *
##      ...      ...                 ...    ...
##    [996]        X 120273682-120276320      *
##    [997]        X 124498419-124508365      *
##    [998]        X 125121496-125125278      *
##    [999]        X 125555571-125562044      *
##   [1000]        X 127437523-127441364      *
##   -------
##   seqinfo: 23 sequences from an unspecified genome; no seqlengths

great(gr, bp_genesets, genes)

## 1000 regions are associated to 1574 genes' extended TSSs.
##   TSS source: self-provided
##   Genome: unknown
##   Gene sets: self-provided
##   Background: whole genome
## Mode: Basal plus extension
##   Proximal: 5000 bp upstream, 1000 bp downstream,
##   plus Distal: up to 1000000 bp

Session info

sessionInfo()

## R version 4.2.0 (2022-04-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur/Monterey 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] C/UTF-8/C/C/C/C
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] AnnotationHub_3.6.0                    
##  [2] BiocFileCache_2.6.1                    
##  [3] dbplyr_2.3.3                           
##  [4] msigdbr_7.5.1                          
##  [5] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
##  [6] GenomicFeatures_1.50.4                 
##  [7] AnnotationDbi_1.60.2                   
##  [8] Biobase_2.58.0                         
##  [9] rGREAT_2.3.1                           
## [10] GenomicRanges_1.50.2                   
## [11] GenomeInfoDb_1.34.9                    
## [12] IRanges_2.32.0                         
## [13] S4Vectors_0.36.2                       
## [14] BiocGenerics_0.44.0                    
## [15] knitr_1.43                             
## [16] rmarkdown_2.23                         
## 
## loaded via a namespace (and not attached):
##   [1] bitops_1.0-7                                 
##   [2] matrixStats_1.0.0                            
##   [3] bit64_4.0.5                                  
##   [4] filelock_1.0.2                               
##   [5] doParallel_1.0.17                            
##   [6] RColorBrewer_1.1-3                           
##   [7] progress_1.2.2                               
##   [8] httr_1.4.6                                   
##   [9] tools_4.2.0                                  
##  [10] bslib_0.5.0                                  
##  [11] utf8_1.2.3                                   
##  [12] R6_2.5.1                                     
##  [13] DT_0.28                                      
##  [14] DBI_1.1.3                                    
##  [15] colorspace_2.1-0                             
##  [16] GetoptLong_1.1.0                             
##  [17] withr_2.5.0                                  
##  [18] tidyselect_1.2.0                             
##  [19] prettyunits_1.1.1                            
##  [20] bit_4.0.5                                    
##  [21] curl_5.0.1                                   
##  [22] compiler_4.2.0                               
##  [23] cli_3.6.1                                    
##  [24] xml2_1.3.5                                   
##  [25] DelayedArray_0.24.0                          
##  [26] rtracklayer_1.58.0                           
##  [27] sass_0.4.7                                   
##  [28] rappdirs_0.3.3                               
##  [29] stringr_1.5.0                                
##  [30] digest_0.6.33                                
##  [31] Rsamtools_2.14.0                             
##  [32] XVector_0.38.0                               
##  [33] pkgconfig_2.0.3                              
##  [34] htmltools_0.5.5                              
##  [35] MatrixGenerics_1.10.0                        
##  [36] highr_0.10                                   
##  [37] fastmap_1.1.1                                
##  [38] htmlwidgets_1.6.2                            
##  [39] rlang_1.1.1                                  
##  [40] GlobalOptions_0.1.2                          
##  [41] RSQLite_2.3.1                                
##  [42] shiny_1.7.4.1                                
##  [43] prettydoc_0.4.1                              
##  [44] shape_1.4.6                                  
##  [45] jquerylib_0.1.4                              
##  [46] BiocIO_1.8.0                                 
##  [47] generics_0.1.3                               
##  [48] jsonlite_1.8.7                               
##  [49] BiocParallel_1.32.6                          
##  [50] BioMartGOGeneSets_0.99.9                     
##  [51] dplyr_1.1.2                                  
##  [52] RCurl_1.98-1.12                              
##  [53] magrittr_2.0.3                               
##  [54] GO.db_3.16.0                                 
##  [55] GenomeInfoDbData_1.2.9                       
##  [56] Matrix_1.6-0                                 
##  [57] Rcpp_1.0.11                                  
##  [58] fansi_1.0.4                                  
##  [59] babelgene_22.9                               
##  [60] lifecycle_1.0.3                              
##  [61] org.Pt.eg.db_3.16.0                          
##  [62] TxDb.Ptroglodytes.UCSC.panTro6.refGene_3.10.0
##  [63] stringi_1.7.12                               
##  [64] yaml_2.3.7                                   
##  [65] SummarizedExperiment_1.28.0                  
##  [66] zlibbioc_1.44.0                              
##  [67] org.Hs.eg.db_3.16.0                          
##  [68] grid_4.2.0                                   
##  [69] blob_1.2.4                                   
##  [70] promises_1.2.0.1                             
##  [71] parallel_4.2.0                               
##  [72] crayon_1.5.2                                 
##  [73] lattice_0.21-8                               
##  [74] Biostrings_2.66.0                            
##  [75] circlize_0.4.16                              
##  [76] hms_1.1.3                                    
##  [77] KEGGREST_1.38.0                              
##  [78] pillar_1.9.0                                 
##  [79] rjson_0.2.21                                 
##  [80] codetools_0.2-19                             
##  [81] biomaRt_2.54.1                               
##  [82] BiocVersion_3.16.0                           
##  [83] XML_3.99-0.14                                
##  [84] glue_1.6.2                                   
##  [85] evaluate_0.21                                
##  [86] BiocManager_1.30.21                          
##  [87] httpuv_1.6.11                                
##  [88] png_0.1-8                                    
##  [89] vctrs_0.6.3                                  
##  [90] foreach_1.5.2                                
##  [91] purrr_1.0.1                                  
##  [92] cachem_1.0.8                                 
##  [93] xfun_0.39                                    
##  [94] mime_0.12                                    
##  [95] xtable_1.8-4                                 
##  [96] restfulr_0.0.15                              
##  [97] later_1.3.1                                  
##  [98] tibble_3.2.1                                 
##  [99] TxDb.Hsapiens.UCSC.hg38.knownGene_3.16.0     
## [100] iterators_1.0.14                             
## [101] GenomicAlignments_1.34.1                     
## [102] memoise_2.0.1                                
## [103] interactiveDisplayBase_1.36.0                
## [104] ellipsis_0.3.2