Gene Ontology (GO) is the most widely used bio-ontology. Bioconductor provides a standard package for GO (GO.db) and organism-specific GO annotation packages (org.*.db). simona has a helper function, create_ontology_DAG_from_GO_db(), which uses these standard Bioconductor GO packages to construct a DAG object automatically.

Create the GO DAG object

GO has three namespaces (or ontologies): biological process (BP), molecular function (MF) and cellular component (CC). The three GO namespaces are mutually exclusive, so the first argument of create_ontology_DAG_from_GO_db() is the GO namespace.

library(simona)
create_ontology_DAG_from_GO_db("BP")
## An ontology_DAG object:
##   Source: GO BP / GO.db package 3.17.0 
##   27942 terms / 55956 relations
##   Root: GO:0008150 
##   Terms: GO:0000001, GO:0000002, GO:0000003, GO:0000011, ...
##   Max depth: 18 
##   Avg number of parents: 2.00
##   Avg number of children: 1.88
##   Aspect ratio: 363.92:1 (based on the longest distance from root)
##                 782.78:1 (based on the shortest distance from root)
##   Relations: is_a, part_of
## 
## With the following columns in the metadata data frame:
##   id, name, definition
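
Because each call covers a single namespace, working with all three means building three DAGs. The following is a minimal sketch, assuming GO.db is installed; the variable names namespaces and dag_list are illustrative only and not part of simona.

```r
library(simona)

# One DAG per GO namespace; each call reads from the GO.db package.
namespaces = c("BP", "MF", "CC")
dag_list = lapply(namespaces, create_ontology_DAG_from_GO_db)
names(dag_list) = namespaces

# Each namespace has its own root term and its own set of terms.
sapply(dag_list, dag_root)
sapply(dag_list, dag_n_terms)
```

Keeping the DAGs in a named list makes it easy to pick one by namespace later, e.g. dag_list[["MF"]].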

There are three main GO relations: “is_a”, “part_of” and “regulates”. “regulates” has two child relation types in GO: “negatively_regulates” and “positively_regulates”. So when “regulates” is selected, the two child relation types are automatically selected. By default only “is_a” and “part_of” are selected.

You can set a subset of relation types with the argument relations.

create_ontology_DAG_from_GO_db("BP", relations = c("part of", "regulates"))  # "part_of" is also OK
## An ontology_DAG object:
##   Source: GO BP / GO.db package 3.17.0 
##   27942 terms / 64560 relations
##   Root: GO:0008150 
##   Terms: GO:0000001, GO:0000002, GO:0000003, GO:0000011, ...
##   Max depth: 18 
##   Avg number of parents: 2.31
##   Avg number of children: 2.20
##   Aspect ratio: 285:1 (based on the longest distance from root)
##                 1015:1 (based on the shortest distance from root)
##   Relations: is_a, negatively_regulates, part_of, positively_regulates,
##              regulates
##   Relation types may have hierarchical relations.
## 
## With the following columns in the metadata data frame:
##   id, name, definition

“is_a” is always selected because it is the primary semantic relation type. So if you only want to include the “is_a” relation, you can assign an empty vector to relations:

create_ontology_DAG_from_GO_db("BP", relations = character(0)) # or NULL, NA

Or you can apply dag_filter() after the DAG is generated.

dag = create_ontology_DAG_from_GO_db("BP")
dag_filter(dag, relations = "is_a")
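
The effect of restricting relation types is easier to see on a small hand-made DAG than on GO. The sketch below builds a toy DAG with create_ontology_DAG() and then applies the same kind of dag_filter() call; the terms a to d and the variable names are made up for illustration and are not GO terms.

```r
library(simona)

# Toy DAG (hypothetical terms): c has an is_a parent (a) and a part_of parent (b).
parents   = c("a",    "a",    "b",       "b")
children  = c("b",    "c",    "c",       "d")
relations = c("is_a", "is_a", "part_of", "is_a")
toy = create_ontology_DAG(parents, children, relations = relations)

# Keep only the is_a edges, as in dag_filter(dag, relations = "is_a") above.
toy_isa = dag_filter(toy, relations = "is_a")

dag_parents(toy, "c")      # parents over both relation types
dag_parents(toy_isa, "c")  # parents over is_a edges only
```

In this toy case every term stays reachable from the root through is_a edges, so the filtered DAG keeps all four terms and only loses the part_of link.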

Add gene annotation

Gene annotation can be set with the argument org_db. The value is an OrgDb object of the corresponding organism. The primary gene ID type in the org.*.db package is used internally (normally the Entrez ID type).

library(org.Hs.eg.db)
dag = create_ontology_DAG_from_GO_db("BP", org_db = org.Hs.eg.db)
dag
## An ontology_DAG object:
##   Source: GO BP / GO.db package 3.17.0 
##   27942 terms / 55956 relations
##   Root: GO:0008150 
##   Terms: GO:0000001, GO:0000002, GO:0000003, GO:0000011, ...
##   Max depth: 18 
##   Avg number of parents: 2.00
##   Avg number of children: 1.88
##   Aspect ratio: 363.92:1 (based on the longest distance from root)
##                 782.78:1 (based on the shortest distance from root)
##   Relations: is_a, part_of
##   Annotations: 18614 items
##                291, 1890, 4205, 4358, ...
## 
## With the following columns in the metadata data frame:
##   id, name, definition

For standard organism packages on Bioconductor, the OrgDb object always has the same name as the package, so the name of the organism package can also be passed to org_db:

create_ontology_DAG_from_GO_db("BP", org_db = "org.Hs.eg.db")

Similarly, for an analysis on mouse, the mouse organism package can be passed to org_db. If the mouse organism package is not installed yet, it will be installed automatically.

create_ontology_DAG_from_GO_db("BP", org_db = "org.Mm.eg.db")

Genes that are annotated to GO terms can be obtained by term_annotations(). Note that the genes of each term are automatically merged with those of its offspring terms.

term_annotations(dag, c("GO:0000002", "GO:0000012"))
## $`GO:0000002`
##  [1] "291"    "1890"   "4205"   "4358"   "4976"   "9361"   "10000"  "55186" 
##  [9] "80119"  "84275"  "92667"  "1763"   "142"    "7157"   "9093"   "7156"  
## [17] "6240"   "50484"  "2021"   "11232"  "83667"  "5428"   "6742"   "56652" 
## [25] "201973"
## 
## $`GO:0000012`
##  [1] "1161"      "2074"      "3981"      "7141"      "7515"      "23411"    
##  [7] "54840"     "55775"     "200558"    "100133315"
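
The merging from offspring terms can be illustrated on a toy DAG where the direct annotations are known. This is only a sketch: the terms a to d and the gene IDs g1 to g4 are hypothetical, and the DAG is built with create_ontology_DAG() rather than from GO.

```r
library(simona)

# Toy DAG (hypothetical terms and gene IDs): a -> b -> d, a -> c.
parents  = c("a", "a", "b")
children = c("b", "c", "d")
annotation = list(
    b = "g1",
    c = "g4",
    d = c("g2", "g3")
)
toy = create_ontology_DAG(parents, children, annotation = annotation)

# b is directly annotated with g1 only, but d is its offspring,
# so g2 and g3 are reported for b as well.
anno = term_annotations(toy, c("b", "d"))
lapply(anno, sort)
```

The same logic explains why a high-level GO term can list far more genes than were annotated to it directly.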

Meta data frame

There are additional meta columns attached to the DAG object. They can be accessed by mcols().

head(mcols(dag))
## DataFrame with 6 rows and 3 columns
##                     id                   name             definition
##            <character>            <character>            <character>
## GO:0000001  GO:0000001 mitochondrion inheri.. The distribution of ..
## GO:0000002  GO:0000002 mitochondrial genome.. The maintenance of t..
## GO:0000003  GO:0000003           reproduction The production of ne..
## GO:0000011  GO:0000011    vacuole inheritance The distribution of ..
## GO:0000012  GO:0000012 single strand break .. The repair of single..
## GO:0000017  GO:0000017 alpha-glucoside tran.. The directed movemen..

The additional information of GO terms is from the GO.db package. The row order of the meta data frame is the same as in dag_all_terms(dag).
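
Since the row names of the meta data frame are GO IDs, it can serve as a lookup table for term names. A minimal sketch, assuming GO.db is installed; S4Vectors is loaded explicitly here only to make sure the mcols() generic is on the search path, and the variable names meta and ids are illustrative.

```r
library(simona)
library(S4Vectors)  # defines the mcols() generic

dag = create_ontology_DAG_from_GO_db("BP")
meta = mcols(dag)

# Row names are GO IDs, so term names can be looked up by ID.
ids = c("GO:0000002", "GO:0000012")
meta[ids, c("id", "name")]

# Rows follow the order of dag_all_terms().
all(rownames(meta) == dag_all_terms(dag))
```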

Session info

## R version 4.3.3 (2024-02-29)
## Platform: x86_64-apple-darwin20 (64-bit)
## Running under: macOS Sonoma 14.6.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## time zone: Europe/Berlin
## tzcode source: internal
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] org.Hs.eg.db_3.17.0  AnnotationDbi_1.62.2 IRanges_2.36.0      
## [4] S4Vectors_0.40.2     Biobase_2.60.0       BiocGenerics_0.48.1 
## [7] simona_1.3.12        knitr_1.45          
## 
## loaded via a namespace (and not attached):
##  [1] KEGGREST_1.40.1         circlize_0.4.16         shape_1.4.6.1          
##  [4] rjson_0.2.21            xfun_0.43               bslib_0.7.0            
##  [7] htmlwidgets_1.6.4       GlobalOptions_0.1.2     bitops_1.0-7           
## [10] vctrs_0.6.5             tools_4.3.3             parallel_4.3.3         
## [13] Polychrome_1.5.1        RSQLite_2.3.6           cluster_2.1.6          
## [16] blob_1.2.4              pkgconfig_2.0.3         RColorBrewer_1.1-3     
## [19] desc_1.4.3              scatterplot3d_0.3-44    GenomeInfoDbData_1.2.10
## [22] lifecycle_1.0.4         compiler_4.3.3          Biostrings_2.68.1      
## [25] textshaping_0.3.7       codetools_0.2-19        ComplexHeatmap_2.18.0  
## [28] clue_0.3-65             GenomeInfoDb_1.36.4     httpuv_1.6.15          
## [31] htmltools_0.5.8.1       sass_0.4.9              RCurl_1.98-1.14        
## [34] yaml_2.3.8              pkgdown_2.0.9           later_1.3.2            
## [37] crayon_1.5.2            jquerylib_0.1.4         GO.db_3.17.0           
## [40] cachem_1.0.8            iterators_1.0.14        foreach_1.5.2          
## [43] mime_0.12               digest_0.6.35           purrr_1.0.2            
## [46] fastmap_1.1.1           grid_4.3.3              colorspace_2.1-0       
## [49] cli_3.6.2               magrittr_2.0.3          promises_1.3.0         
## [52] bit64_4.0.5             XVector_0.40.0          rmarkdown_2.26         
## [55] httr_1.4.7              matrixStats_1.3.0       igraph_2.0.3           
## [58] bit_4.0.5               ragg_1.3.1              png_0.1-8              
## [61] GetoptLong_1.0.5        memoise_2.0.1           shiny_1.8.1.1          
## [64] evaluate_0.23           doParallel_1.0.17       rlang_1.1.3            
## [67] Rcpp_1.0.12             xtable_1.8-4            DBI_1.2.2              
## [70] xml2_1.3.6              jsonlite_1.8.8          R6_2.5.1               
## [73] zlibbioc_1.46.0         systemfonts_1.0.6       fs_1.6.4