The .obo format

There are several formats for ontology data. The most compact and readable is the .obo format, which was initially developed by the GO consortium. Many ontologies in .obo format are available from the OBO Foundry or BioPortal. A description of the .obo format is available at https://owlcollab.github.io/oboformat/doc/GO.format.obo-1_4.html.

In the simona package, the function import_obo() imports an .obo file into an ontology_DAG object. The input is a path on the local computer or a URL. The following example uses the Plant Ontology.

The link to po.obo can be found on the Plant Ontology web page. You can download the file or provide the link directly as a URL.

library(simona)
dag1 = import_obo("https://raw.githubusercontent.com/Planteome/plant-ontology/master/po.obo")
dag1
## An ontology_DAG object:
##   Source: po, releases/2024-04-17 
##   1658 terms / 1778 relations
##   Root: ~~all~~ 
##   Terms: PO:0000001, PO:0000002, PO:0000003, PO:0000004, ...
##   Max depth: 11 
##   Avg number of parents: 1.07
##   Avg number of children: 1.06
##   Aspect ratio: 39:1 (based on the longest distance from root)
##                 38.2:1 (based on the shortest distance from root)
##   Relations: is_a
## 
## With the following columns in the metadata data frame:
##   id, short_id, name, namespace, definition

There are also several meta columns attached to the object, such as the name and the long definition of terms in the ontology.

head(mcols(dag1))
## DataFrame with 6 rows and 5 columns
##                     id    short_id                   name     namespace
##            <character> <character>            <character>   <character>
## PO:0000001  PO:0000001  PO:0000001    plant embryo proper plant_anatomy
## PO:0000002  PO:0000002  PO:0000002            anther wall plant_anatomy
## PO:0000003  PO:0000003  PO:0000003            whole plant plant_anatomy
## PO:0000004  PO:0000004  PO:0000004 in vitro plant struc.. plant_anatomy
## PO:0000005  PO:0000005  PO:0000005    cultured plant cell plant_anatomy
## PO:0000006  PO:0000006  PO:0000006       plant protoplast plant_anatomy
##                        definition
##                       <character>
## PO:0000001 An embryonic plant s..
## PO:0000002 A microsporangium wa..
## PO:0000003 A plant structure (P..
## PO:0000004 A plant structure (P..
## PO:0000005 A plant cell (PO:000..
## PO:0000006 A cultured plant cel..

Note that the rows in mcols(dag1) correspond to the terms in dag_all_terms(dag1).

The is_a relation between classes is always saved in the DAG object (it is specified in the is_a tag in the .obo file). Additional relation types, which are specified in the relationship tag, can also be selected. In the example above only is_a was imported (see the Relations line in the output); additional types such as part_of are selected with the relation_type argument. To choose suitable relation types, check the values associated with the relationship tag and the [Typedef] sections in the .obo file. Make sure that the selected relation types are transitive and not inverse relations (e.g. you cannot select has_part, which is the inverse of part_of).
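Checking which relation types an .obo file contains can be done with a few lines of base R before calling import_obo(). The sketch below is only an illustration: the .obo fragment is made up, and the code assumes that the id line directly follows each [Typedef] header, which is common but not guaranteed.

```r
# A tiny, made-up .obo fragment used only for illustration.
obo_lines <- c(
  "format-version: 1.2",
  "",
  "[Term]",
  "id: X:0000001",
  "name: root term",
  "",
  "[Term]",
  "id: X:0000002",
  "name: child term",
  "is_a: X:0000001 ! root term",
  "relationship: part_of X:0000001 ! root term",
  "",
  "[Term]",
  "id: X:0000003",
  "name: another term",
  "is_a: X:0000001 ! root term",
  "relationship: part_of X:0000002 ! child term",
  "relationship: has_part X:0000002 ! child term",
  "",
  "[Typedef]",
  "id: part_of",
  "name: part of",
  "is_transitive: true",
  "",
  "[Typedef]",
  "id: has_part",
  "name: has part",
  "inverse_of: part_of"
)

# Lines of the form "relationship: <type> <target> ...".
rel_lines <- grep("^relationship:", obo_lines, value = TRUE)

# The relation type is the second whitespace-separated field.
rel_types <- vapply(strsplit(rel_lines, "[[:space:]]+"),
                    function(fields) fields[2], character(1))
rel_counts <- table(rel_types)
print(rel_counts)

# Relation types declared in [Typedef] stanzas
# (assumes "id:" is the first line after the stanza header).
typedef_start <- which(obo_lines == "[Typedef]")
typedef_ids <- sub("^id:[[:space:]]*", "", obo_lines[typedef_start + 1])
print(typedef_ids)
```

For a real file, replace obo_lines with readLines("po.obo"). The is_transitive and inverse_of tags in the [Typedef] stanzas are the ones to inspect when deciding whether a relation type is safe to select.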

Relations can also have a DAG structure. In import_obo(), if a parent relation type is selected, all its offspring types are automatically selected. For example, in GO, besides relations of is_a and part_of, there are also regulates, positively_regulates and negatively_regulates, where the latter two are child relations of regulates. So if regulates is selected as an additional relation type, the other two are automatically selected.

The DAG of relation types is automatically recognized and saved from the ontology files.

import_obo("file_for_go.obo", relation_type = c("part_of", "regulates"))

Finally, all spaces in the values of relation_type are converted to underscores, so specifying "part of" is the same as specifying "part_of".
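The two behaviours described above, normalizing spaces and pulling in offspring relation types, can be sketched in base R as follows. This is not simona's implementation: relation_children, normalize_relation() and expand_relations() are hypothetical names, and the parent-to-children map is written by hand to mirror the GO example, whereas simona reads it from the ontology file.

```r
# Hypothetical parent -> children map of relation types (GO example).
relation_children <- list(
  regulates = c("positively_regulates", "negatively_regulates")
)

# Replace spaces with underscores, as described for relation_type.
normalize_relation <- function(x) gsub(" ", "_", x, fixed = TRUE)

# Add all offspring of each selected type by walking the map
# breadth-first; types already selected are not added twice.
expand_relations <- function(selected, children = relation_children) {
  selected <- normalize_relation(selected)
  queue <- selected
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    kids <- setdiff(children[[current]], selected)
    selected <- c(selected, kids)
    queue <- c(queue, kids)
  }
  selected
}

expanded <- expand_relations(c("part of", "regulates"))
print(expanded)
```

With the map above, "part of" is normalized to part_of, and regulates brings in its two child relation types.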

Other ontology formats

For ontologies in other formats, simona uses the external tool ROBOT to convert them to the .obo format and then internally calls import_obo() to import them. ROBOT is a well-established tool for converting between ontology formats. The file robot.jar is needed, and it can be downloaded from https://github.com/ontodev/robot/releases. Since ROBOT is a Java tool, Java must be available on your machine.
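Before importing, it can help to confirm that both prerequisites are in place. The following is a minimal sketch in base R; check_robot_setup() is a hypothetical helper, not part of simona, and the jar path is only an example.

```r
# Returns a named logical vector; nothing is downloaded or executed.
check_robot_setup <- function(robot_jar) {
  c(
    # TRUE if a "java" executable is found on the PATH.
    java = nzchar(unname(Sys.which("java"))),
    # TRUE if the jar file exists at the given location.
    robot_jar = file.exists(path.expand(robot_jar))
  )
}

status <- check_robot_setup("~/Downloads/robot.jar")
print(status)
```

If either element is FALSE, install Java or download robot.jar before importing files that need conversion.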

The file po.owl can also be found on the Plant Ontology web page.

dag2 = import_ontology("https://raw.githubusercontent.com/Planteome/plant-ontology/master/po.owl", 
    robot_jar = "~/Downloads/robot.jar")
dag2
## An ontology_DAG object:
##   Source: po, releases/2024-04-17 
##   1658 terms / 1778 relations
##   Root: ~~all~~ 
##   Terms: PO:0000001, PO:0000002, PO:0000003, PO:0000004, ...
##   Max depth: 11 
##   Avg number of parents: 1.07
##   Avg number of children: 1.06
##   Aspect ratio: 39:1 (based on the longest distance from root)
##                 38.2:1 (based on the shortest distance from root)
##   Relations: is_a
## 
## With the following columns in the metadata data frame:
##   id, short_id, name, namespace, definition

More conveniently, the path of robot.jar can be set as a global option:

simona_opt$robot_jar = "~/Downloads/robot.jar"
import_ontology("https://raw.githubusercontent.com/Planteome/plant-ontology/master/po.owl")

ROBOT supports the following ontology formats and they are automatically identified according to the file contents.

  • json: OBO Graphs JSON
  • obo: OBO Format
  • ofn: OWL Functional
  • omn: Manchester
  • owl: RDF/XML
  • owx: OWL/XML
  • ttl: Turtle
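For reference, converting between these formats by hand uses ROBOT's convert command. The sketch below only builds the command line as a string and does not run it. build_robot_convert() is a hypothetical helper, the file names are placeholders, and the exact call that simona issues internally may differ.

```r
# Assemble a "java -jar robot.jar convert ..." command line.
build_robot_convert <- function(input, output, robot_jar, format = "obo") {
  args <- c("-jar", robot_jar, "convert",
            "--input", input,
            "--output", output,
            "--format", format)
  # Quote every argument so that paths with spaces survive the shell.
  paste("java", paste(shQuote(args), collapse = " "))
}

cmd <- build_robot_convert("po.owl", "po.obo", "robot.jar")
cat(cmd, "\n")

# To run the conversion once Java and robot.jar are in place:
# system(cmd)
```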

The .owl format

For some very large ontologies, ROBOT requires a large amount of memory to convert them to the .obo format. If the ontology is in the .owl format (in the RDF/XML serialization format), the function import_owl() can be used instead. import_owl() parses the .owl file directly and returns an ontology_DAG object. Because import_owl() is a parser written from scratch, it is recommended only when import_ontology() does not work.

dag3 = import_owl("https://raw.githubusercontent.com/Planteome/plant-ontology/master/po.owl")
dag3
## An ontology_DAG object:
##   Source: Plant Ontology, http://purl.obolibrary.org/obo/po/releases/2024-04-17/po.owl 
##   1658 terms / 1778 relations
##   Root: ~~all~~ 
##   Terms: PO:0000001, PO:0000002, PO:0000003, PO:0000004, ...
##   Max depth: 11 
##   Avg number of parents: 1.07
##   Avg number of children: 1.06
##   Aspect ratio: 39:1 (based on the longest distance from root)
##                 38.2:1 (based on the shortest distance from root)
##   Relations: is_a
## 
## With the following columns in the metadata data frame:
##   id, short_id, name, namespace, definition
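The recommendation to use import_owl() only when import_ontology() fails can be expressed as a small fallback wrapper. This is a sketch under assumptions: import_with_fallback() is a hypothetical helper, and it relies on the primary importer raising an R error on failure, which may not hold for every way a ROBOT conversion can fail.

```r
# Try the primary importer first; on error, report it and use the fallback.
import_with_fallback <- function(file, primary, fallback) {
  tryCatch(
    primary(file),
    error = function(e) {
      message("Primary importer failed: ", conditionMessage(e))
      fallback(file)
    }
  )
}

# Stand-in importers so the sketch runs without simona or ROBOT.
failing_importer <- function(file) stop("not enough memory")
simple_importer <- function(file) paste("parsed", file)

result <- import_with_fallback("po.owl", failing_importer, simple_importer)
print(result)

# Intended use (requires simona and a configured robot.jar):
# dag <- import_with_fallback("po.owl", simona::import_ontology, simona::import_owl)
```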

The .ttl format

Similarly, some ontologies are only provided as large .ttl files (the Turtle format). simona provides the function import_ttl(), which recognizes .ttl files in which terms are declared as objects of type owl:Class. The internal parsing script is written in Perl, so make sure Perl is installed on your machine.

# https://bioportal.bioontology.org/ontologies/MSTDE
dag4 = import_ttl("https://jokergoo.github.io/simona/MSTDE.ttl")
dag4
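A quick way to judge whether a .ttl file is a candidate for import_ttl() is to look for owl:Class declarations and to confirm that Perl is on the PATH. The sketch below is a rough textual check on a made-up Turtle fragment. It is not a Turtle parser and will miss declarations written in other ways, for example with full IRIs instead of the owl: prefix.

```r
# A made-up Turtle fragment used only for illustration.
ttl_lines <- c(
  "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
  "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
  "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
  "",
  "<http://example.org/A> a owl:Class ;",
  "    rdfs:label \"term A\" .",
  "",
  "<http://example.org/B> rdf:type owl:Class ;",
  "    rdfs:subClassOf <http://example.org/A> ."
)

# Match "... a owl:Class" or "... rdf:type owl:Class" on a single line.
pattern <- "[[:space:]](a|rdf:type)[[:space:]]+owl:Class([[:space:]]|;|\\.|$)"
n_classes <- sum(grepl(pattern, ttl_lines))
print(n_classes)

# TRUE if a "perl" executable is found on the PATH.
perl_available <- nzchar(unname(Sys.which("perl")))
print(perl_available)
```

For a real file, replace ttl_lines with readLines() on a local copy of the .ttl file.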

Session info

## R version 4.4.1 (2024-06-14)
## Platform: x86_64-apple-darwin20
## Running under: macOS Sonoma 14.5
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## time zone: Europe/Berlin
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] simona_1.3.12 knitr_1.48   
## 
## loaded via a namespace (and not attached):
##  [1] sass_0.4.9            xml2_1.3.6            shape_1.4.6.1        
##  [4] digest_0.6.37         magrittr_2.0.3        evaluate_0.24.0      
##  [7] grid_4.4.1            RColorBrewer_1.1-3    iterators_1.0.14     
## [10] circlize_0.4.16       fastmap_1.2.0         foreach_1.5.2        
## [13] doParallel_1.0.17     jsonlite_1.8.8        GlobalOptions_0.1.2  
## [16] promises_1.3.0        ComplexHeatmap_2.20.0 codetools_0.2-20     
## [19] textshaping_0.4.0     jquerylib_0.1.4       cli_3.6.3            
## [22] shiny_1.9.1           rlang_1.1.4           crayon_1.5.3         
## [25] scatterplot3d_0.3-44  cachem_1.1.0          yaml_2.3.10          
## [28] tools_4.4.1           parallel_4.4.1        colorspace_2.1-1     
## [31] httpuv_1.6.15         GetoptLong_1.0.5      BiocGenerics_0.50.0  
## [34] curl_5.2.2            mime_0.12             R6_2.5.1             
## [37] png_0.1-8             matrixStats_1.3.0     stats4_4.4.1         
## [40] lifecycle_1.0.4       S4Vectors_0.42.1      fs_1.6.4             
## [43] htmlwidgets_1.6.4     IRanges_2.38.1        clue_0.3-65          
## [46] ragg_1.3.2            cluster_2.1.6         pkgconfig_2.0.3      
## [49] desc_1.4.3            pkgdown_2.1.0         bslib_0.8.0          
## [52] later_1.3.2           Rcpp_1.0.13           systemfonts_1.1.0    
## [55] xfun_0.47             xtable_1.8-4          rjson_0.2.22         
## [58] htmltools_0.5.8.1     igraph_2.0.3          rmarkdown_2.28       
## [61] Polychrome_1.5.1      compiler_4.4.1