Gene Ontology (GO) is the most widely used bio-ontology. On Bioconductor, there are standard packages for GO (GO.db) and organism-specific GO annotation packages (org.*.db). In simona, the helper function create_ontology_DAG_from_GO_db() uses these standard Bioconductor GO packages to construct a DAG object automatically.
GO has three namespaces (or ontologies): biological process (BP), molecular function (MF) and cellular component (CC). The three GO namespaces are mutually exclusive, so the first argument of create_ontology_DAG_from_GO_db() is the GO namespace.
library(simona)
dag = create_ontology_DAG_from_GO_db("BP")
dag
## An ontology_DAG object:
## Source: GO BP / GO.db package 3.19.1
## 27186 terms / 54178 relations
## Root: GO:0008150
## Terms: GO:0000001, GO:0000002, GO:0000003, GO:0000011, ...
## Max depth: 18
## Avg number of parents: 1.99
## Avg number of children: 1.87
## Aspect ratio: 356.46:1 (based on the longest distance from root)
## 756.89:1 (based on the shortest distance from root)
## Relations: is_a, part_of
##
## With the following columns in the metadata data frame:
## id, name, definition
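To compare the three namespaces, the same call can be repeated for each of them. The following is a minimal sketch, not part of the original workflow. It assumes GO.db is installed and that dag_n_terms() and dag_root() are available in the installed simona version.

```r
library(simona)

# Build one DAG per GO namespace; each call reads from GO.db.
namespaces = c("BP", "MF", "CC")
dags = lapply(namespaces, create_ontology_DAG_from_GO_db)
names(dags) = namespaces

# Number of terms and root term of each namespace.
sapply(dags, dag_n_terms)
sapply(dags, dag_root)
```

Because the namespaces are mutually exclusive, each DAG has its own root term and the three objects share no terms.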
There are three main GO relations: “is_a”, “part_of” and “regulates”. “regulates” has two child relation types in GO: “negatively_regulates” and “positively_regulates”. So when “regulates” is selected, the two child relation types are automatically selected. By default only “is_a” and “part_of” are selected.
You can select a subset of relation types with the relations argument.
create_ontology_DAG_from_GO_db("BP", relations = c("part of", "regulates")) # "part_of" is also OK
## An ontology_DAG object:
## Source: GO BP / GO.db package 3.19.1
## 27186 terms / 62554 relations
## Root: GO:0008150
## Terms: GO:0000001, GO:0000002, GO:0000003, GO:0000011, ...
## Max depth: 18
## Avg number of parents: 2.30
## Avg number of children: 2.18
## Aspect ratio: 274.6:1 (based on the longest distance from root)
## 982.38:1 (based on the shortest distance from root)
## Relations: is_a, negatively_regulates, part_of, positively_regulates,
## regulates
## Relation types may have hierarchical relations.
##
## With the following columns in the metadata data frame:
## id, name, definition
“is_a” is always selected because it is the primary semantic relation type. So if you want to include only “is_a” relations, you can assign an empty vector to relations:
create_ontology_DAG_from_GO_db("BP", relations = character(0)) # or NULL, NA
Or you can apply dag_filter() after the DAG is generated.
dag = create_ontology_DAG_from_GO_db("BP")
dag_filter(dag, relations = "is_a")
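The effect of filtering by relation type is easier to see on a small ontology. The sketch below uses a hypothetical six-term DAG (the term names and links are invented for illustration) built with create_ontology_DAG(), and assumes that function accepts a relations vector parallel to the parent and child vectors.

```r
library(simona)

# Hypothetical toy ontology: one part_of link (b -> c), all others is_a.
parents   = c("a", "a", "b", "b", "c", "e")
children  = c("b", "c", "c", "d", "e", "f")
relations = c("is_a", "is_a", "part_of", "is_a", "is_a", "is_a")
dag = create_ontology_DAG(parents, children, relations = relations)

# Keep only the is_a links; every term is still reachable from the root "a".
dag_isa = dag_filter(dag, relations = "is_a")
dag_isa
```

The toy DAG is chosen so that removing the part_of link does not disconnect any term. On a real ontology, dropping a relation type may leave some terms without a parent, so it is worth inspecting the printed summary of the filtered object.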
Gene annotation can be set with the argument org_db. The value is an OrgDb object of the corresponding organism. The primary gene ID type of the org.*.db package is used internally (normally the Entrez ID type).
library(org.Hs.eg.db)
dag = create_ontology_DAG_from_GO_db("BP", org_db = org.Hs.eg.db)
dag
## An ontology_DAG object:
## Source: GO BP / GO.db package 3.19.1
## 27186 terms / 54178 relations
## Root: GO:0008150
## Terms: GO:0000001, GO:0000002, GO:0000003, GO:0000011, ...
## Max depth: 18
## Avg number of parents: 1.99
## Avg number of children: 1.87
## Aspect ratio: 356.46:1 (based on the longest distance from root)
## 756.89:1 (based on the shortest distance from root)
## Relations: is_a, part_of
## Annotations: 18888 items
## 291, 1890, 4205, 4358, ...
##
## With the following columns in the metadata data frame:
## id, name, definition
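Since the annotations are stored as Entrez IDs, it is often convenient to translate them into gene symbols. This is not a simona feature. The sketch below assumes the AnnotationDbi package (a dependency of org.Hs.eg.db) and uses its mapIds() function on the first two IDs listed in the annotation summary above.

```r
library(org.Hs.eg.db)

# The first two Entrez IDs shown in the "Annotations" line of the DAG summary.
entrez = c("291", "1890")

# mapIds() returns a named vector: names are the Entrez IDs, values the symbols.
symbols = AnnotationDbi::mapIds(org.Hs.eg.db, keys = entrez,
    column = "SYMBOL", keytype = "ENTREZID")
symbols
```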
For standard organism packages on Bioconductor, the OrgDb object always has the same name as the package, so the name of the organism package can also be passed to org_db:
create_ontology_DAG_from_GO_db("BP", org_db = "org.Hs.eg.db")
Similarly, if the analysis is applied to mouse, the mouse organism package can be set to org_db. If the mouse organism package is not installed yet, it will be installed automatically.
create_ontology_DAG_from_GO_db("BP", org_db = "org.Mm.eg.db")
Genes annotated to GO terms can be obtained with term_annotations(). Note that the genes are automatically merged from offspring terms.
term_annotations(dag, c("GO:0000002", "GO:0000012"))
## $`GO:0000002`
## [1] "291" "1890" "4205" "4358" "4976" "9361" "10000" "55186"
## [9] "80119" "84275" "92667" "1763" "142" "7157" "9093" "7156"
## [17] "6240" "50484" "2021" "11232" "83667" "5428" "6742" "56652"
## [25] "201973" "131474"
##
## $`GO:0000012`
## [1] "1161" "2074" "3981" "7141" "7515" "23411"
## [7] "54840" "55775" "200558" "100133315"
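The merging from offspring terms can be illustrated on a small example. The sketch below builds a hypothetical five-term DAG where each term carries exactly one invented gene, assuming create_ontology_DAG() accepts the annotation as a named list keyed by term.

```r
library(simona)

# Hypothetical toy ontology: a is the root, e is a child of c.
parents  = c("a", "a", "b", "b", "c")
children = c("b", "c", "c", "d", "e")

# One made-up gene directly annotated to each term.
annotation = list(a = "g1", b = "g2", c = "g3", d = "g4", e = "g5")
dag = create_ontology_DAG(parents, children, annotation = annotation)

# "c" should also report the gene of its offspring "e";
# the root "a" should report the genes of all terms.
anno = term_annotations(dag, c("a", "c"))
anno
```

This is why a general term near the root usually has far more annotated genes than the annotation source lists for it directly.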
There are additional meta columns attached to the DAG object. They can be accessed with mcols().
head(mcols(dag))
## DataFrame with 6 rows and 3 columns
## id name definition
## <character> <character> <character>
## GO:0000001 GO:0000001 mitochondrion inheri.. The distribution of ..
## GO:0000002 GO:0000002 mitochondrial genome.. The maintenance of t..
## GO:0000003 GO:0000003 reproduction The production of ne..
## GO:0000011 GO:0000011 vacuole inheritance The distribution of ..
## GO:0000012 GO:0000012 single strand break .. The repair of single..
## GO:0000017 GO:0000017 alpha-glucoside tran.. The directed movemen..
This additional information on GO terms comes from the GO.db package. The row order of the meta data frame is the same as in dag_all_terms(dag).
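Because the rows follow the order of dag_all_terms(dag), the meta data frame can be indexed by term. A minimal sketch, assuming the id column holds the same term IDs as the row names shown above:

```r
library(simona)

dag = create_ontology_DAG_from_GO_db("BP")
meta = mcols(dag)

# Check that the row order matches the term order of the DAG.
same_order = identical(meta$id, dag_all_terms(dag))
same_order

# Look up the name of a single term by its GO ID.
meta$name[meta$id == "GO:0000003"]
```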
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-apple-darwin20
## Running under: macOS Sonoma 14.5
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
##
## time zone: Europe/Berlin
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] org.Hs.eg.db_3.19.1 AnnotationDbi_1.66.0 IRanges_2.38.1
## [4] S4Vectors_0.42.1 Biobase_2.64.0 BiocGenerics_0.50.0
## [7] simona_1.3.12 knitr_1.48
##
## loaded via a namespace (and not attached):
## [1] KEGGREST_1.44.1 circlize_0.4.16 shape_1.4.6.1
## [4] rjson_0.2.22 xfun_0.47 bslib_0.8.0
## [7] htmlwidgets_1.6.4 GlobalOptions_0.1.2 vctrs_0.6.5
## [10] tools_4.4.1 parallel_4.4.1 Polychrome_1.5.1
## [13] RSQLite_2.3.7 cluster_2.1.6 blob_1.2.4
## [16] pkgconfig_2.0.3 RColorBrewer_1.1-3 desc_1.4.3
## [19] scatterplot3d_0.3-44 GenomeInfoDbData_1.2.12 lifecycle_1.0.4
## [22] compiler_4.4.1 textshaping_0.4.0 Biostrings_2.72.1
## [25] codetools_0.2-20 ComplexHeatmap_2.20.0 clue_0.3-65
## [28] GenomeInfoDb_1.40.1 httpuv_1.6.15 htmltools_0.5.8.1
## [31] sass_0.4.9 yaml_2.3.10 later_1.3.2
## [34] pkgdown_2.1.0 crayon_1.5.3 jquerylib_0.1.4
## [37] GO.db_3.19.1 cachem_1.1.0 iterators_1.0.14
## [40] foreach_1.5.2 mime_0.12 digest_0.6.37
## [43] fastmap_1.2.0 grid_4.4.1 colorspace_2.1-1
## [46] cli_3.6.3 magrittr_2.0.3 UCSC.utils_1.0.0
## [49] promises_1.3.0 bit64_4.0.5 XVector_0.44.0
## [52] rmarkdown_2.28 httr_1.4.7 matrixStats_1.3.0
## [55] igraph_2.0.3 bit_4.0.5 ragg_1.3.2
## [58] png_0.1-8 GetoptLong_1.0.5 memoise_2.0.1
## [61] shiny_1.9.1 evaluate_0.24.0 doParallel_1.0.17
## [64] rlang_1.1.4 Rcpp_1.0.13 xtable_1.8-4
## [67] DBI_1.2.3 xml2_1.3.6 jsonlite_1.8.8
## [70] R6_2.5.1 zlibbioc_1.50.0 systemfonts_1.1.0
## [73] fs_1.6.4