Repo: https://github.com/jokergoo/jokergoo.github.io/tree/master/talks
We first read the regions (peaks of the transcription factor JUN) as a data frame.
= read.table("Encode3_JUN_hg19.bed")
bed head(bed)
## V1 V2 V3 V4 V5 V6
## 1 chr1 936221 936347 . 1000 .
## 2 chr1 1280252 1280492 . 480 .
## 3 chr1 2143913 2144153 . 982 .
## 4 chr1 4052186 4052426 . 544 .
## 5 chr1 4440698 4440938 . 837 .
## 6 chr1 7359438 7359678 . 493 .
dim(bed)
## [1] 1726 6
Use submitGreatJob()
to submit the regions to the GREAT web server. By default the species is human (hg19). Current version of GREAT support "hg38"
, "hg19"
, "mm10"
, "mm9"
.
library(rGREAT)
= submitGreatJob(bed) job
The summary of the job.
job
## Submit time: 2023-08-03 10:06:20
## Note the results may only be avaiable on GREAT server for 24 hours.
## Version: 4.0.4
## Species: hg19
## Inputs: 1726 regions
## Mode: Basal plus extension
## Proximal: 5 kb upstream, 1 kb downstream,
## plus Distal: up to 1000 kb
## Include curated regulatory domains
##
## Enrichment tables for following ontologies have been downloaded:
## None
Next step is to select one or more ontology categories and to retrieve the result tables. There are many ontologies integrated in GREAT. The total ontologies can be get by:
availableOntologies(job)
## [1] "GO Molecular Function" "GO Biological Process"
## [3] "GO Cellular Component" "Mouse Phenotype"
## [5] "Mouse Phenotype Single KO" "Human Phenotype"
## [7] "Ensembl Genes"
By default three GO ontologies are used.
= getEnrichmentTables(job) tbl
## The default enrichment table does not contain informatin of associated
## genes for each input region. You can set `download_by = 'tsv'` to
## download the complete table, but note only the top 500 regions can be
## retreived. See the following link:
##
## https://great-help.atlassian.net/wiki/spaces/GREAT/pages/655401/Export#Export-GlobalExport
##
## Except the additional gene-region association column if taking 'tsv' as
## the source of result, all other columns are the same if you choose
## 'json' (the default) as the source. Or you can try the local GREAT
## analysis with the function `great()`.
Once the result tables are retrieved, they are cached.
job
## Submit time: 2023-08-03 10:06:20
## Note the results may only be avaiable on GREAT server for 24 hours.
## Version: 4.0.4
## Species: hg19
## Inputs: 1726 regions
## Mode: Basal plus extension
## Proximal: 5 kb upstream, 1 kb downstream,
## plus Distal: up to 1000 kb
## Include curated regulatory domains
##
## Enrichment tables for following ontologies have been downloaded:
## GO Biological Process
## GO Cellular Component
## GO Molecular Function
More ontologies can be set via the ontology
argument.
getEnrichmentTable(job, ontology = c("GO Biological Process", "Mouse Phenotype"))
tbl
is a list of three tables and each table contains enrichment results for a GO ontology:
names(tbl)
## [1] "GO Molecular Function" "GO Biological Process" "GO Cellular Component"
head(tbl[["GO Biological Process"]])
## ID name Binom_Genome_Fraction
## 1 GO:0009987 cellular process 0.8819650
## 2 GO:0006950 response to stress 0.2518874
## 3 GO:0042981 regulation of apoptotic process 0.1425860
## 4 GO:0010941 regulation of cell death 0.1563783
## 5 GO:0043067 regulation of programmed cell death 0.1441386
## 6 GO:0048523 negative regulation of cellular process 0.4096118
## Binom_Expected Binom_Observed_Region_Hits Binom_Fold_Enrichment
## 1 1522.2720 1647 1.081936
## 2 434.7576 625 1.437583
## 3 246.1035 403 1.637522
## 4 269.9089 430 1.593130
## 5 248.7833 403 1.619884
## 6 706.9899 909 1.285733
## Binom_Region_Set_Coverage Binom_Raw_PValue Binom_Adjp_BH Hyper_Total_Genes
## 1 0.9542294 1.060174e-25 1.393599e-21 14666
## 2 0.3621089 2.393433e-24 1.573084e-20 3265
## 3 0.2334878 5.894798e-24 2.582904e-20 1472
## 4 0.2491309 1.933936e-23 6.355397e-20 1601
## 5 0.2334878 4.276777e-23 1.124365e-19 1486
## 6 0.5266512 7.643382e-23 1.674538e-19 4412
## Hyper_Expected Hyper_Observed_Gene_Hits Hyper_Fold_Enrichment
## 1 1793.2230 1879 1.047834
## 2 399.2140 479 1.199858
## 3 179.9825 283 1.572375
## 4 195.7555 305 1.558066
## 5 181.6943 283 1.557561
## 6 539.4585 723 1.340233
## Hyper_Gene_Set_Coverage Hyper_Term_Gene_Coverage Hyper_Raw_PValue
## 1 0.8284832 0.1281195 8.269245e-07
## 2 0.2111993 0.1467075 2.378972e-06
## 3 0.1247795 0.1922554 6.292510e-16
## 4 0.1344797 0.1905059 1.404290e-16
## 5 0.1247795 0.1904441 2.211597e-15
## 6 0.3187831 0.1638713 3.775104e-21
## Hyper_Adjp_BH
## 1 5.176154e-05
## 2 1.319476e-04
## 3 3.759775e-13
## 4 1.186378e-13
## 5 1.118132e-12
## 6 1.240594e-17
plotVolcano(job, ontology = "GO Biological Process")
The global region-gene association plots:
plotRegionGeneAssociationGraphs(job)
Or the association specifically for a specific term.
plotRegionGeneAssociations(job, ontology = "GO Molecular Function",
term_id = "GO:0004984")
## The webpage for 'GO Molecular Function:GO:0004984' is available at:
## http://great.stanford.edu/public-4.0.4/cgi-bin/showTermDetails.php?termId=GO:0004984&ontoName=GOMolecularFunction&ontoUiName=GO Molecular Function&sessionName=20230803-public-4.0.4-ETOSE4&species=hg19&foreName=filed6f372f354b0.gz&backName=&table=region
## Note the web page might be deleted from GREAT web server because it is only for temporary use.
The table of the associations between genes and regions.
getRegionGeneAssociations(job, ontology = "GO Molecular Function",
term_id = "GO:0004984")
## The webpage for 'GO Molecular Function:GO:0004984' is available at:
## http://great.stanford.edu/public-4.0.4/cgi-bin/showTermDetails.php?termId=GO:0004984&ontoName=GOMolecularFunction&ontoUiName=GO Molecular Function&sessionName=20230803-public-4.0.4-ETOSE4&species=hg19&foreName=filed6f372f354b0.gz&backName=&table=region
## Note the web page might be deleted from GREAT web server because it is only for temporary use.
## GRanges object with 6 ranges and 2 metadata columns:
## seqnames ranges strand | annotated_genes dist_to_TSS
## <Rle> <IRanges> <Rle> | <CharacterList> <IntegerList>
## [1] chr9 125164745-125164985 * | OR1J1 75370
## [2] chr11 5301878-5302118 * | OR51B4 21228
## [3] chr11 56700437-56700677 * | OR9G4,OR5AK2 -189270,-55790
## [4] chr14 22802493-22802733 * | OR4E2 669316
## [5] chr15 22410075-22410315 * | OR4N4 27813
## [6] chr17 2795933-2796173 * | OR1D5 170848
## -------
## seqinfo: 5 sequences from an unspecified genome; no seqlengths
Note, you cannot get the association bewteen genes and all regions. It is not supported by the web GREAT.
There is a Shiny app that contains all the results.
shinyReport(job)
Bonus: If you use the GO category, you can use the simplifyEnrichment to summarize the results.
library(simplifyEnrichment)
= tbl[["GO Biological Process"]]
tb = tb$ID[tb$Binom_Adjp_BH < 0.01]
sig_go length(sig_go)
## [1] 853
simplifyGO(sig_go)
sessionInfo()
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur/Monterey 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
##
## locale:
## [1] C/UTF-8/C/C/C/C
##
## attached base packages:
## [1] grid stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] simplifyEnrichment_1.9.4 rGREAT_2.3.1 GenomicRanges_1.50.2
## [4] GenomeInfoDb_1.34.9 IRanges_2.32.0 S4Vectors_0.36.2
## [7] BiocGenerics_0.44.0 rmarkdown_2.23
##
## loaded via a namespace (and not attached):
## [1] bitops_1.0-7
## [2] matrixStats_1.0.0
## [3] bit64_4.0.5
## [4] filelock_1.0.2
## [5] doParallel_1.0.17
## [6] RColorBrewer_1.1-3
## [7] progress_1.2.2
## [8] httr_1.4.6
## [9] tools_4.2.0
## [10] bslib_0.5.0
## [11] utf8_1.2.3
## [12] R6_2.5.1
## [13] DT_0.28
## [14] DBI_1.1.3
## [15] colorspace_2.1-0
## [16] GetoptLong_1.1.0
## [17] tidyselect_1.2.0
## [18] prettyunits_1.1.1
## [19] proxyC_0.3.3
## [20] bit_4.0.5
## [21] curl_5.0.1
## [22] compiler_4.2.0
## [23] cli_3.6.1
## [24] Biobase_2.58.0
## [25] Cairo_1.6-0
## [26] NLP_0.2-1
## [27] xml2_1.3.5
## [28] DelayedArray_0.24.0
## [29] slam_0.1-50
## [30] rtracklayer_1.58.0
## [31] sass_0.4.7
## [32] tm_0.7-11
## [33] rappdirs_0.3.3
## [34] stringr_1.5.0
## [35] digest_0.6.33
## [36] Rsamtools_2.14.0
## [37] XVector_0.38.0
## [38] pkgconfig_2.0.3
## [39] htmltools_0.5.5
## [40] MatrixGenerics_1.10.0
## [41] highr_0.10
## [42] dbplyr_2.3.3
## [43] fastmap_1.1.1
## [44] htmlwidgets_1.6.2
## [45] rlang_1.1.1
## [46] GlobalOptions_0.1.2
## [47] RSQLite_2.3.1
## [48] shiny_1.7.4.1
## [49] prettydoc_0.4.1
## [50] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
## [51] shape_1.4.6
## [52] jquerylib_0.1.4
## [53] BiocIO_1.8.0
## [54] generics_0.1.3
## [55] jsonlite_1.8.7
## [56] BiocParallel_1.32.6
## [57] GOSemSim_2.24.0
## [58] dplyr_1.1.2
## [59] RCurl_1.98-1.12
## [60] magrittr_2.0.3
## [61] GO.db_3.16.0
## [62] GenomeInfoDbData_1.2.9
## [63] Matrix_1.6-0
## [64] Rcpp_1.0.11
## [65] fansi_1.0.4
## [66] lifecycle_1.0.3
## [67] stringi_1.7.12
## [68] yaml_2.3.7
## [69] SummarizedExperiment_1.28.0
## [70] zlibbioc_1.44.0
## [71] org.Hs.eg.db_3.16.0
## [72] BiocFileCache_2.6.1
## [73] blob_1.2.4
## [74] promises_1.2.0.1
## [75] parallel_4.2.0
## [76] crayon_1.5.2
## [77] lattice_0.21-8
## [78] Biostrings_2.66.0
## [79] GenomicFeatures_1.50.4
## [80] circlize_0.4.16
## [81] hms_1.1.3
## [82] KEGGREST_1.38.0
## [83] magick_2.7.4
## [84] ComplexHeatmap_2.15.4
## [85] knitr_1.43
## [86] pillar_1.9.0
## [87] rjson_0.2.21
## [88] codetools_0.2-19
## [89] biomaRt_2.54.1
## [90] XML_3.99-0.14
## [91] glue_1.6.2
## [92] evaluate_0.21
## [93] RcppParallel_5.1.7
## [94] httpuv_1.6.11
## [95] png_0.1-8
## [96] vctrs_0.6.3
## [97] foreach_1.5.2
## [98] clue_0.3-64
## [99] cachem_1.0.8
## [100] xfun_0.39
## [101] mime_0.12
## [102] xtable_1.8-4
## [103] restfulr_0.0.15
## [104] later_1.3.1
## [105] tibble_3.2.1
## [106] TxDb.Hsapiens.UCSC.hg38.knownGene_3.16.0
## [107] iterators_1.0.14
## [108] GenomicAlignments_1.34.1
## [109] AnnotationDbi_1.60.2
## [110] memoise_2.0.1
## [111] cluster_2.1.4
## [112] ellipsis_0.3.2