simona provides functions for generating random DAGs. A random tree is generated first, and more links can then be randomly added to form a more general DAG.

Random trees

dag_random_tree() generates a random tree. By default it generates a binary tree where all leaf terms have depth = 9.

library(simona)
set.seed(123)
tree1 = dag_random_tree()
tree1
## An ontology_DAG object:
##   Source: dag_random_tree 
##   1023 terms / 1022 relations / a tree
##   Root: 1 
##   Terms: 1, 10, 100, 1000, ...
##   Max depth: 9 
##   Aspect ratio: 56.89:1
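
The counts in this output follow from the shape of a full binary tree: depth d holds 2^d terms, so depths 0 to 9 sum to 2^10 - 1. The following is a small base R check of that arithmetic only; it does not call simona, and the variable names are chosen for illustration.

```r
# Number of terms at each depth of a full binary tree, for depths 0..9
terms_per_depth = 2^(0:9)

n_terms = sum(terms_per_depth)       # all terms in the tree
n_leaves = terms_per_depth[10]       # terms at the deepest level (depth 9)
n_relations = n_terms - 1            # every term except the root has one parent

c(n_terms = n_terms, n_leaves = n_leaves, n_relations = n_relations)
```

The first and last values agree with the "1023 terms / 1022 relations" line printed for tree1.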

Strictly speaking, tree1 is not random: with the default arguments, every term gets exactly two child terms. The tree grows from the root, and dag_random_tree() has several arguments that control how random trees are generated.

  • n_children: Number of child terms. It can be a single value, in which case each term has the same number of child terms. It can also be a range, in which case the number of child terms is randomly picked from that range.
  • p_stop: The probability that a branch stops growing. At a given step of tree growth, denote the set of leaf terms as L. In the next round, floor(length(L)*p_stop) leaf terms stop growing, while the remaining leaf terms continue to grow. A leaf term that continues to grow is linked to n_children child terms if n_children is a single value, or to a number of child terms picked from the range [n_children[1], n_children[2]].

Tree growth stops when the total number of terms reaches the limit set by max.

So the default call of dag_random_tree() is identical to:

dag_random_tree(n_children = 2, p_stop = 0, max = 2^10 - 1)
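
The growth procedure described above can be sketched in base R. This is only an illustration under stated assumptions, not the code simona uses: the function name sketch_random_tree and the argument name max_terms are hypothetical, terms are numbered in the order they are created, and the last batch of children is truncated so that the total never goes past max_terms (the package may handle the limit differently).

```r
# Hypothetical sketch of round-by-round tree growth; returns an edge list.
sketch_random_tree = function(n_children = 2, p_stop = 0, max_terms = 2^10 - 1) {
    parent = integer(0)
    child = integer(0)
    leaves = 1L    # growth starts from the root, term 1
    n = 1L         # number of terms created so far

    while(length(leaves) > 0 && n < max_terms) {
        # floor(length(L) * p_stop) randomly chosen leaves stop growing
        n_stop = floor(length(leaves) * p_stop)
        if(n_stop > 0) {
            growing = leaves[-sample.int(length(leaves), n_stop)]
        } else {
            growing = leaves
        }

        new_leaves = integer(0)
        for(p in growing) {
            # Fixed number of children, or one drawn uniformly from the range
            if(length(n_children) == 1) {
                k = n_children
            } else {
                k = n_children[1] + sample.int(n_children[2] - n_children[1] + 1, 1) - 1
            }
            k = min(k, max_terms - n)    # assumption: never go past max_terms
            if(k <= 0) break
            ids = n + seq_len(k)
            parent = c(parent, rep(p, k))
            child = c(child, ids)
            new_leaves = c(new_leaves, ids)
            n = n + k
        }
        leaves = new_leaves
    }
    data.frame(parent = parent, child = child)
}

set.seed(1)
edges = sketch_random_tree()    # defaults mirror the default call above
nrow(edges)                     # number of parent-child relations
```

With the defaults, no leaf stops and every term gets two children, so the sketch grows one full level per round until the limit is hit.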

We can change these arguments to some other values, such as:

tree2 = dag_random_tree(n_children = c(2, 6), p_stop = 0.5, max = 2000)
tree2
## An ontology_DAG object:
##   Source: dag_random_tree 
##   1999 terms / 1998 relations / a tree
##   Root: 1 
##   Terms: 1, 10, 100, 1000, ...
##   Max depth: 7 
##   Aspect ratio: 105.71:1

Random DAGs

A more general random DAG is generated from a random tree. Starting from tree1, which was generated above, the function dag_add_random_children() adds more random children to its terms.

dag1 = dag_add_random_children(tree1)
dag1
## An ontology_DAG object:
##   Source: dag_add_random_children 
##   1023 terms / 1115 relations
##   Root: 1 
##   Terms: 1, 10, 100, 1000, ...
##   Max depth: 9 
##   Avg number of parents: 1.09
##   Avg number of children: 1.03
##   Aspect ratio: 56.89:1 (based on the longest distance from root)
##                 52.78:1 (based on the shortest distance from root)

Three arguments control how new child terms are added. We first introduce two of them.

  • p_add: For each term, the probability that it is selected to add new child terms.
  • new_children: Once a term is selected, the number of new children it is linked to.
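
To make the interplay of these two arguments concrete, here is a base R sketch of a two-step draw. It is an assumption about how the arguments could combine, not simona's implementation: each term is selected independently with probability p_add, and each selected term then requests a number of new children from the new_children range. All variable names are illustrative.

```r
set.seed(42)                 # arbitrary seed, for reproducibility only
n_terms = 1023
p_add = 0.6
new_children = c(2, 8)

# Step 1: each term is selected independently with probability p_add
selected = which(runif(n_terms) < p_add)

# Step 2: each selected term requests a number of new children from the range
range_size = new_children[2] - new_children[1] + 1
n_requested = new_children[1] +
    sample.int(range_size, length(selected), replace = TRUE) - 1

length(selected)        # close to n_terms * p_add on average
range(n_requested)      # stays within the new_children range
```

The requested numbers are best read as upper bounds: a term with few or no eligible candidates (a leaf, for example) presumably cannot receive that many new children.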

Let’s try to generate a denser DAG:

dag2 = dag_add_random_children(tree1, p_add = 0.6, new_children = c(2, 8))
dag2
## An ontology_DAG object:
##   Source: dag_add_random_children 
##   1023 terms / 2550 relations
##   Root: 1 
##   Terms: 1, 10, 100, 1000, ...
##   Max depth: 9 
##   Avg number of parents: 2.50
##   Avg number of children: 1.59
##   Aspect ratio: 56.89:1 (based on the longest distance from root)
##                 32.22:1 (based on the shortest distance from root)

By default, when a term t is selected to receive more child terms, the new child terms are chosen only from terms that are:

  1. lower than t, i.e. with depths greater than t’s depth in the DAG.
  2. not the child terms that t already has.

Then, from this subset of candidate child terms, new child terms are randomly picked according to the numbers set in new_children.
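
The two filtering rules can be illustrated on a toy DAG in base R. The depth vector, the children list and the function name default_candidates are made up for this sketch, and it assumes that "lower" means a larger depth value (farther from the root), which guarantees that a new link never points back to an ancestor.

```r
# Toy DAG with 7 terms, referred to by integer index (illustrative values)
depth = c(0, 1, 1, 2, 2, 2, 3)
children = list(c(2, 3), c(4, 5), 6, 7, integer(0), integer(0), integer(0))

# Candidate new children of term t under the two rules
default_candidates = function(t) {
    lower = which(depth > depth[t])    # rule 1: terms deeper than t
    setdiff(lower, children[[t]])      # rule 2: drop the existing children of t
}

cand = default_candidates(2)
cand    # terms 6 and 7: deeper than term 2 and not yet its children
```

Terms 4 and 5 are also deeper than term 2, but they are already its children, so they are removed from the candidate set.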

The way new child terms are picked can be customized with a user-defined function, passed via the add_random_children_fun argument. This function accepts two arguments: the dag object and the integer index of the “current term”. In the following example, we implement a function that only picks new child terms from term t’s offspring terms.

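add_new_children_from_offspring = function(dag, i, new_children = c(1, 8)) {

    # Logical mask over all terms: TRUE marks a candidate child of term i
    l = rep(FALSE, dag_n_terms(dag))
    offspring = dag_offspring(dag, i, in_labels = FALSE)
    if(length(offspring)) {
        l[offspring] = TRUE
        # Exclude terms that are already children of term i
        l[dag_children(dag, i, in_labels = FALSE)] = FALSE
    }

    candidates = which(l)
    n_candidates = length(candidates)
    if(n_candidates) {
        if(n_candidates < new_children[1]) {
            # Fewer candidates than the minimum requested: add nothing
            integer(0)
        } else {
            # Draw the number of new children from the range, capped by n_candidates
            n_new = min(n_candidates, sample(seq(new_children[1], new_children[2]), 1))
            # Sample positions, so a single candidate index k is not expanded to 1:k
            candidates[sample.int(n_candidates, n_new)]
        }
    } else {
        # No candidates at all
        integer(0)
    }
}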

dag3 = dag_add_random_children(tree1, p_add = 0.6,
    add_random_children_fun = add_new_children_from_offspring)
dag3
## An ontology_DAG object:
##   Source: dag_add_random_children 
##   1023 terms / 1583 relations
##   Root: 1 
##   Terms: 1, 10, 100, 1000, ...
##   Max depth: 9 
##   Avg number of parents: 1.55
##   Avg number of children: 1.25
##   Aspect ratio: 56.89:1 (based on the longest distance from root)
##                 32.22:1 (based on the shortest distance from root)
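
One detail is worth keeping in mind when writing such a selection function: base R's sample(x, size) treats a single positive number x as the sequence 1:x, so a candidate vector of length one would not be sampled as intended. A defensive way to pick from an index vector is to sample positions instead. The helper name pick_k below is hypothetical and the sketch uses base R only.

```r
# Pick up to k elements from an integer vector of candidate indices
pick_k = function(candidates, k) {
    n = length(candidates)
    if(n == 0 || k <= 0) {
        return(integer(0))
    }
    # Sample positions 1..n, then map back to the candidate indices
    candidates[sample.int(n, min(n, k))]
}

set.seed(1)
single = pick_k(5L, 1)              # one candidate: always that candidate
some = pick_k(c(2L, 9L, 4L), 2)     # two of the three candidates
none = pick_k(integer(0), 3)        # no candidates: empty result
```

Requesting more elements than there are candidates simply returns all of them in random order.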

Session info

## R version 4.3.3 (2024-02-29)
## Platform: x86_64-apple-darwin20 (64-bit)
## Running under: macOS Sonoma 14.6.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## time zone: Europe/Berlin
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] simona_1.3.12
## 
## loaded via a namespace (and not attached):
##  [1] sass_0.4.9            xml2_1.3.6            shape_1.4.6.1        
##  [4] digest_0.6.35         magrittr_2.0.3        evaluate_0.23        
##  [7] grid_4.3.3            RColorBrewer_1.1-3    iterators_1.0.14     
## [10] circlize_0.4.16       fastmap_1.1.1         foreach_1.5.2        
## [13] doParallel_1.0.17     jsonlite_1.8.8        GlobalOptions_0.1.2  
## [16] promises_1.3.0        ComplexHeatmap_2.18.0 purrr_1.0.2          
## [19] codetools_0.2-19      textshaping_0.3.7     jquerylib_0.1.4      
## [22] cli_3.6.2             shiny_1.8.1.1         rlang_1.1.3          
## [25] crayon_1.5.2          scatterplot3d_0.3-44  cachem_1.0.8         
## [28] yaml_2.3.8            tools_4.3.3           parallel_4.3.3       
## [31] memoise_2.0.1         colorspace_2.1-0      httpuv_1.6.15        
## [34] GetoptLong_1.0.5      BiocGenerics_0.48.1   mime_0.12            
## [37] vctrs_0.6.5           R6_2.5.1              png_0.1-8            
## [40] matrixStats_1.3.0     stats4_4.3.3          lifecycle_1.0.4      
## [43] S4Vectors_0.40.2      fs_1.6.4              htmlwidgets_1.6.4    
## [46] IRanges_2.36.0        clue_0.3-65           ragg_1.3.1           
## [49] cluster_2.1.6         pkgconfig_2.0.3       desc_1.4.3           
## [52] later_1.3.2           pkgdown_2.0.9         bslib_0.7.0          
## [55] Rcpp_1.0.12           systemfonts_1.0.6     highr_0.10           
## [58] xfun_0.43             knitr_1.45            xtable_1.8-4         
## [61] rjson_0.2.21          htmltools_0.5.8.1     igraph_2.0.3         
## [64] rmarkdown_2.26        Polychrome_1.5.1      compiler_4.3.3