Online GREAT analysis

Repo: https://github.com/jokergoo/jokergoo.github.io/tree/master/talks

We first read the regions (peaks of the transcription factor JUN) as a data frame.

bed = read.table("Encode3_JUN_hg19.bed")
head(bed)

##     V1      V2      V3 V4   V5 V6
## 1 chr1  936221  936347  . 1000  .
## 2 chr1 1280252 1280492  .  480  .
## 3 chr1 2143913 2144153  .  982  .
## 4 chr1 4052186 4052426  .  544  .
## 5 chr1 4440698 4440938  .  837  .
## 6 chr1 7359438 7359678  .  493  .

dim(bed)

## [1] 1726    6

Use submitGreatJob() to submit the regions to the GREAT web server. By default the species is human (hg19). Current version of GREAT support "hg38", "hg19", "mm10", "mm9".

library(rGREAT)
job = submitGreatJob(bed)

The summary of the job.

job

## Submit time: 2023-08-03 10:06:20 
##   Note the results may only be avaiable on GREAT server for 24 hours.
## Version: 4.0.4 
## Species: hg19 
## Inputs: 1726 regions
## Mode: Basal plus extension 
##   Proximal: 5 kb upstream, 1 kb downstream,
##   plus Distal: up to 1000 kb
## Include curated regulatory domains
## 
## Enrichment tables for following ontologies have been downloaded:
##   None

Next step is to select one or more ontology categories and to retrieve the result tables. There are many ontologies integrated in GREAT. The total ontologies can be get by:

availableOntologies(job)

## [1] "GO Molecular Function"     "GO Biological Process"    
## [3] "GO Cellular Component"     "Mouse Phenotype"          
## [5] "Mouse Phenotype Single KO" "Human Phenotype"          
## [7] "Ensembl Genes"

By default three GO ontologies are used.

tbl = getEnrichmentTables(job)

## The default enrichment table does not contain informatin of associated
## genes for each input region. You can set `download_by = 'tsv'` to
## download the complete table, but note only the top 500 regions can be
## retreived. See the following link:
## 
## https://great-help.atlassian.net/wiki/spaces/GREAT/pages/655401/Export#Export-GlobalExport
## 
## Except the additional gene-region association column if taking 'tsv' as
## the source of result, all other columns are the same if you choose
## 'json' (the default) as the source. Or you can try the local GREAT
## analysis with the function `great()`.

Once the result tables are retrieved, they are cached.

job

## Submit time: 2023-08-03 10:06:20 
##   Note the results may only be avaiable on GREAT server for 24 hours.
## Version: 4.0.4 
## Species: hg19 
## Inputs: 1726 regions
## Mode: Basal plus extension 
##   Proximal: 5 kb upstream, 1 kb downstream,
##   plus Distal: up to 1000 kb
## Include curated regulatory domains
## 
## Enrichment tables for following ontologies have been downloaded:
##   GO Biological Process
##   GO Cellular Component
##   GO Molecular Function

More ontologies can be set via the ontology argument.

getEnrichmentTable(job, ontology = c("GO Biological Process", "Mouse Phenotype"))

tbl is a list of three tables and each table contains enrichment results for a GO ontology:

names(tbl)

## [1] "GO Molecular Function" "GO Biological Process" "GO Cellular Component"

head(tbl[["GO Biological Process"]])

##           ID                                    name Binom_Genome_Fraction
## 1 GO:0009987                        cellular process             0.8819650
## 2 GO:0006950                      response to stress             0.2518874
## 3 GO:0042981         regulation of apoptotic process             0.1425860
## 4 GO:0010941                regulation of cell death             0.1563783
## 5 GO:0043067     regulation of programmed cell death             0.1441386
## 6 GO:0048523 negative regulation of cellular process             0.4096118
##   Binom_Expected Binom_Observed_Region_Hits Binom_Fold_Enrichment
## 1      1522.2720                       1647              1.081936
## 2       434.7576                        625              1.437583
## 3       246.1035                        403              1.637522
## 4       269.9089                        430              1.593130
## 5       248.7833                        403              1.619884
## 6       706.9899                        909              1.285733
##   Binom_Region_Set_Coverage Binom_Raw_PValue Binom_Adjp_BH Hyper_Total_Genes
## 1                 0.9542294     1.060174e-25  1.393599e-21             14666
## 2                 0.3621089     2.393433e-24  1.573084e-20              3265
## 3                 0.2334878     5.894798e-24  2.582904e-20              1472
## 4                 0.2491309     1.933936e-23  6.355397e-20              1601
## 5                 0.2334878     4.276777e-23  1.124365e-19              1486
## 6                 0.5266512     7.643382e-23  1.674538e-19              4412
##   Hyper_Expected Hyper_Observed_Gene_Hits Hyper_Fold_Enrichment
## 1      1793.2230                     1879              1.047834
## 2       399.2140                      479              1.199858
## 3       179.9825                      283              1.572375
## 4       195.7555                      305              1.558066
## 5       181.6943                      283              1.557561
## 6       539.4585                      723              1.340233
##   Hyper_Gene_Set_Coverage Hyper_Term_Gene_Coverage Hyper_Raw_PValue
## 1               0.8284832                0.1281195     8.269245e-07
## 2               0.2111993                0.1467075     2.378972e-06
## 3               0.1247795                0.1922554     6.292510e-16
## 4               0.1344797                0.1905059     1.404290e-16
## 5               0.1247795                0.1904441     2.211597e-15
## 6               0.3187831                0.1638713     3.775104e-21
##   Hyper_Adjp_BH
## 1  5.176154e-05
## 2  1.319476e-04
## 3  3.759775e-13
## 4  1.186378e-13
## 5  1.118132e-12
## 6  1.240594e-17

plotVolcano(job, ontology = "GO Biological Process")

The global region-gene association plots:

plotRegionGeneAssociationGraphs(job)

Or the association specifically for a specific term.

plotRegionGeneAssociations(job, ontology = "GO Molecular Function",
    term_id = "GO:0004984")

## The webpage for 'GO Molecular Function:GO:0004984' is available at:
##   http://great.stanford.edu/public-4.0.4/cgi-bin/showTermDetails.php?termId=GO:0004984&ontoName=GOMolecularFunction&ontoUiName=GO Molecular Function&sessionName=20230803-public-4.0.4-ETOSE4&species=hg19&foreName=filed6f372f354b0.gz&backName=&table=region
##  Note the web page might be deleted from GREAT web server because it is only for temporary use.

The table of the associations between genes and regions.

getRegionGeneAssociations(job, ontology = "GO Molecular Function",
    term_id = "GO:0004984")

## The webpage for 'GO Molecular Function:GO:0004984' is available at:
##   http://great.stanford.edu/public-4.0.4/cgi-bin/showTermDetails.php?termId=GO:0004984&ontoName=GOMolecularFunction&ontoUiName=GO Molecular Function&sessionName=20230803-public-4.0.4-ETOSE4&species=hg19&foreName=filed6f372f354b0.gz&backName=&table=region
##  Note the web page might be deleted from GREAT web server because it is only for temporary use.

## GRanges object with 6 ranges and 2 metadata columns:
##       seqnames              ranges strand | annotated_genes    dist_to_TSS
##          <Rle>           <IRanges>  <Rle> | <CharacterList>  <IntegerList>
##   [1]     chr9 125164745-125164985      * |           OR1J1          75370
##   [2]    chr11     5301878-5302118      * |          OR51B4          21228
##   [3]    chr11   56700437-56700677      * |    OR9G4,OR5AK2 -189270,-55790
##   [4]    chr14   22802493-22802733      * |           OR4E2         669316
##   [5]    chr15   22410075-22410315      * |           OR4N4          27813
##   [6]    chr17     2795933-2796173      * |           OR1D5         170848
##   -------
##   seqinfo: 5 sequences from an unspecified genome; no seqlengths

Note, you cannot get the association bewteen genes and all regions. It is not supported by the web GREAT.

There is a Shiny app that contains all the results.

shinyReport(job)

Bonus: If you use the GO category, you can use the simplifyEnrichment to summarize the results.

library(simplifyEnrichment)
tb = tbl[["GO Biological Process"]]
sig_go = tb$ID[tb$Binom_Adjp_BH < 0.01]
length(sig_go)

## [1] 853

simplifyGO(sig_go)

sessionInfo()

## R version 4.2.0 (2022-04-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur/Monterey 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] C/UTF-8/C/C/C/C
## 
## attached base packages:
## [1] grid      stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
## [1] simplifyEnrichment_1.9.4 rGREAT_2.3.1             GenomicRanges_1.50.2    
## [4] GenomeInfoDb_1.34.9      IRanges_2.32.0           S4Vectors_0.36.2        
## [7] BiocGenerics_0.44.0      rmarkdown_2.23          
## 
## loaded via a namespace (and not attached):
##   [1] bitops_1.0-7                            
##   [2] matrixStats_1.0.0                       
##   [3] bit64_4.0.5                             
##   [4] filelock_1.0.2                          
##   [5] doParallel_1.0.17                       
##   [6] RColorBrewer_1.1-3                      
##   [7] progress_1.2.2                          
##   [8] httr_1.4.6                              
##   [9] tools_4.2.0                             
##  [10] bslib_0.5.0                             
##  [11] utf8_1.2.3                              
##  [12] R6_2.5.1                                
##  [13] DT_0.28                                 
##  [14] DBI_1.1.3                               
##  [15] colorspace_2.1-0                        
##  [16] GetoptLong_1.1.0                        
##  [17] tidyselect_1.2.0                        
##  [18] prettyunits_1.1.1                       
##  [19] proxyC_0.3.3                            
##  [20] bit_4.0.5                               
##  [21] curl_5.0.1                              
##  [22] compiler_4.2.0                          
##  [23] cli_3.6.1                               
##  [24] Biobase_2.58.0                          
##  [25] Cairo_1.6-0                             
##  [26] NLP_0.2-1                               
##  [27] xml2_1.3.5                              
##  [28] DelayedArray_0.24.0                     
##  [29] slam_0.1-50                             
##  [30] rtracklayer_1.58.0                      
##  [31] sass_0.4.7                              
##  [32] tm_0.7-11                               
##  [33] rappdirs_0.3.3                          
##  [34] stringr_1.5.0                           
##  [35] digest_0.6.33                           
##  [36] Rsamtools_2.14.0                        
##  [37] XVector_0.38.0                          
##  [38] pkgconfig_2.0.3                         
##  [39] htmltools_0.5.5                         
##  [40] MatrixGenerics_1.10.0                   
##  [41] highr_0.10                              
##  [42] dbplyr_2.3.3                            
##  [43] fastmap_1.1.1                           
##  [44] htmlwidgets_1.6.2                       
##  [45] rlang_1.1.1                             
##  [46] GlobalOptions_0.1.2                     
##  [47] RSQLite_2.3.1                           
##  [48] shiny_1.7.4.1                           
##  [49] prettydoc_0.4.1                         
##  [50] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 
##  [51] shape_1.4.6                             
##  [52] jquerylib_0.1.4                         
##  [53] BiocIO_1.8.0                            
##  [54] generics_0.1.3                          
##  [55] jsonlite_1.8.7                          
##  [56] BiocParallel_1.32.6                     
##  [57] GOSemSim_2.24.0                         
##  [58] dplyr_1.1.2                             
##  [59] RCurl_1.98-1.12                         
##  [60] magrittr_2.0.3                          
##  [61] GO.db_3.16.0                            
##  [62] GenomeInfoDbData_1.2.9                  
##  [63] Matrix_1.6-0                            
##  [64] Rcpp_1.0.11                             
##  [65] fansi_1.0.4                             
##  [66] lifecycle_1.0.3                         
##  [67] stringi_1.7.12                          
##  [68] yaml_2.3.7                              
##  [69] SummarizedExperiment_1.28.0             
##  [70] zlibbioc_1.44.0                         
##  [71] org.Hs.eg.db_3.16.0                     
##  [72] BiocFileCache_2.6.1                     
##  [73] blob_1.2.4                              
##  [74] promises_1.2.0.1                        
##  [75] parallel_4.2.0                          
##  [76] crayon_1.5.2                            
##  [77] lattice_0.21-8                          
##  [78] Biostrings_2.66.0                       
##  [79] GenomicFeatures_1.50.4                  
##  [80] circlize_0.4.16                         
##  [81] hms_1.1.3                               
##  [82] KEGGREST_1.38.0                         
##  [83] magick_2.7.4                            
##  [84] ComplexHeatmap_2.15.4                   
##  [85] knitr_1.43                              
##  [86] pillar_1.9.0                            
##  [87] rjson_0.2.21                            
##  [88] codetools_0.2-19                        
##  [89] biomaRt_2.54.1                          
##  [90] XML_3.99-0.14                           
##  [91] glue_1.6.2                              
##  [92] evaluate_0.21                           
##  [93] RcppParallel_5.1.7                      
##  [94] httpuv_1.6.11                           
##  [95] png_0.1-8                               
##  [96] vctrs_0.6.3                             
##  [97] foreach_1.5.2                           
##  [98] clue_0.3-64                             
##  [99] cachem_1.0.8                            
## [100] xfun_0.39                               
## [101] mime_0.12                               
## [102] xtable_1.8-4                            
## [103] restfulr_0.0.15                         
## [104] later_1.3.1                             
## [105] tibble_3.2.1                            
## [106] TxDb.Hsapiens.UCSC.hg38.knownGene_3.16.0
## [107] iterators_1.0.14                        
## [108] GenomicAlignments_1.34.1                
## [109] AnnotationDbi_1.60.2                    
## [110] memoise_2.0.1                           
## [111] cluster_2.1.4                           
## [112] ellipsis_0.3.2