The analysis task is to evaluate how significantly a term includes a given set of terms among its offspring.
dag_enrich_on_offsprings(dag, terms, min_hits = 3, min_offspring = 10)
dag: An ontology_DAG object.
terms: A vector of term names.
min_hits: Minimal number of terms in an offspring set.
min_offspring: Minimal size of the offspring set.
A data frame with the following columns:
term: Term names.
n_hits: Number of terms in terms that are among the offspring terms of t.
n_offspring: Number of offspring terms of t (including t itself).
n_terms: Number of terms in terms that are also terms in the DAG.
n_all: Number of all terms in the DAG.
log2_fold_enrichment: Defined as log2(observed/expected).
z_score: Defined as (observed-expected)/sd.
p_value: P-values from hypergeometric test.
p_adjust: Adjusted p-values from the BH method.
The number of rows in the data frame is the same as the number of terms in the DAG.
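As a rough illustration of how the count columns relate to log2_fold_enrichment and z_score, the sketch below computes both for a single term. The counts are made up, and the expected value and standard deviation are assumed to be those of the hypergeometric distribution; the package's internal computation may differ in detail.

```r
# Hypothetical counts for one term t (invented for illustration only).
n_hits      <- 8     # terms of interest found among t's offspring
n_offspring <- 50    # offspring terms of t, including t itself
n_terms     <- 100   # terms of interest that are present in the DAG
n_all       <- 1000  # all terms in the DAG

# Expected number of hits if the terms of interest were spread at random.
expected <- n_offspring * n_terms / n_all

# Standard deviation of the hypergeometric distribution
# (assumption: this is the "sd" used in the z-score definition).
sd_hits <- sqrt(expected * (1 - n_terms / n_all) *
                (n_all - n_offspring) / (n_all - 1))

log2_fold_enrichment <- log2(n_hits / expected)
z_score <- (n_hits - expected) / sd_hits
```

A positive log2_fold_enrichment together with a positive z_score means that more of the terms fall among the offspring of t than expected by chance.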
Given a list of terms in terms, the function tests whether they are enriched in a term's offspring terms.
The test is based on the hypergeometric distribution. In the following 2x2 contingency table, S is the set of terms in terms
and, for a term t in the DAG, T is the set of its offspring plus t itself. The aim is to test whether S is over-represented
in T.
If the p-value is significant, we can say that the term t preferentially includes terms in terms.
+----------+------+----------+-----+
|          | in S | not in S | all |
+----------+------+----------+-----+
| in T     | x11  | x12      | x10 |
| not in T | x21  | x22      | x20 |
+----------+------+----------+-----+
| all      | x01  | x02      | x   |
+----------+------+----------+-----+
# \dontrun{
dag = create_ontology_DAG_from_GO_db()
#> relations: is_a, part_of
terms = random_terms(dag, 100)
df = dag_enrich_on_offsprings(dag, terms)
# }
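The hypergeometric test on the 2x2 table described above can be sketched in base R without the package. The cell counts below are hypothetical, and the mapping x11 = n_hits, x10 = n_offspring, x01 = n_terms, x = n_all is inferred from the column descriptions. The one-sided Fisher's exact test is included only as a cross-check, and the extra p-values passed to p.adjust() are invented stand-ins for other terms.

```r
# Hypothetical margins and hit count for one term t.
x11 <- 8      # in T and in S
x10 <- 50     # all terms in T (offspring of t plus t itself)
x01 <- 100    # all terms in S
x   <- 1000   # all terms in the DAG

# Remaining cells follow from the margins.
x12 <- x10 - x11
x21 <- x01 - x11
x22 <- x - x10 - x21

tab <- matrix(c(x11, x21, x12, x22), nrow = 2,
              dimnames = list(c("in T", "not in T"), c("in S", "not in S")))

# Over-representation p-value: P(X >= x11) when drawing x10 terms
# from a population that contains x01 terms of S.
p_value <- phyper(x11 - 1, x01, x - x01, x10, lower.tail = FALSE)

# Cross-check with the one-sided Fisher's exact test on the same table.
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

# BH adjustment across terms; the two extra p-values are made up.
p_adjust <- p.adjust(c(p_value, 0.2, 0.8), method = "BH")
```

In this sketch p_value and p_fisher agree, because the one-sided Fisher's exact test on a 2x2 table is the same upper-tail hypergeometric probability.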