The analysis task is to evaluate how significantly a term's offspring set includes the terms supplied in terms.
dag_enrich_on_offsprings(dag, terms, min_hits = 3, min_offspring = 10)
dag
: An ontology_DAG object.
terms
: A vector of term names.
min_hits
: Minimal number of terms from terms in an offspring set.
min_offspring
: Minimal size of the offspring set.
A data frame with the following columns:
term
: Term names.
n_hits
: Number of terms in terms that are in t's offspring set.
n_offspring
: Number of offspring terms of t (including t itself).
n_terms
: Number of terms in terms that are also terms in the DAG.
n_all
: Number of all terms in the DAG.
log2_fold_enrichment
: Defined as log2(observed/expected).
z_score
: Defined as (observed-expected)/sd.
p_value
: P-values from hypergeometric test.
p_adjust
: Adjusted p-values from the BH method.
The number of rows in the data frame is the same as the number of terms in the DAG.
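As a small illustration of how the p_adjust column relates to p_value, the BH adjustment can be reproduced with base R's p.adjust(). The p-values below are made up, and the 0.05 cutoff is only an example threshold, not something the function applies.

```r
# Made-up p-values standing in for the p_value column (illustrative only)
p_value = c(0.001, 0.01, 0.02, 0.03, 0.5)

# Benjamini-Hochberg adjustment, as named in the p_adjust description
p_adjust = p.adjust(p_value, method = "BH")

# Example filter: positions of the terms with adjusted p-value below 0.05
significant = which(p_adjust < 0.05)
```

In practice the same filter would be applied to the rows of the returned data frame, for example df[df$p_adjust < 0.05, ].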
Given a list of terms in terms, the function tests whether they are enriched in a term's offspring terms.
The test is based on the hypergeometric distribution. In the following 2x2 contingency table, S is the set of terms in terms and, for a term t in the DAG, T is the set of its offspring terms plus t itself. The aim is to test whether S is over-represented in T. A significant p-value suggests that the term t preferentially includes the terms in terms.
+----------+------+----------+-----+
| | in S | not in S | all |
+----------+------+----------+-----+
| in T | x11 | x12 | x10 |
| not in T | x21 | x22 | x20 |
+----------+------+----------+-----+
| all | x01 | x02 | x |
+----------+------+----------+-----+
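The test on this table can be sketched with base R's phyper(). The counts below are made up for illustration. The mapping of table cells to output columns (x11 as n_hits, x10 as n_offspring, x01 as n_terms, x as n_all) and the choice of the hypergeometric standard deviation for the z-score are assumptions drawn from the column descriptions, not from the package source.

```r
# Hypothetical counts for one term t (illustrative values only)
x11 = 8      # terms in both S and T            (assumed to match n_hits)
x10 = 40     # size of T: offspring of t plus t (assumed to match n_offspring)
x01 = 100    # size of S                        (assumed to match n_terms)
x   = 5000   # all terms in the DAG             (assumed to match n_all)

# Expected overlap if S were spread at random over the DAG
expected = x10 * x01 / x

# Over-representation p-value: P(X >= x11), where X is the number of
# S terms among x10 terms drawn without replacement from all x terms
p_value = phyper(x11 - 1, x01, x - x01, x10, lower.tail = FALSE)

# Effect sizes, following the definitions of the output columns
log2_fold_enrichment = log2(x11 / expected)

# Standard deviation of the hypergeometric count (assumed form of "sd")
sd_hits = sqrt(x10 * (x01 / x) * (1 - x01 / x) * (x - x10) / (x - 1))
z_score = (x11 - expected) / sd_hits
```

With these counts the expected overlap is 0.8 terms, so observing 8 corresponds to a 10-fold enrichment.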
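# \dontrun{
library(simona)
# Build the Gene Ontology DAG from GO.db
dag = create_ontology_DAG_from_GO_db()
#> relations: is_a, part_of
# Sample 100 random terms and test their enrichment in every term's offspring set
terms = random_terms(dag, 100)
df = dag_enrich_on_offsprings(dag, terms)
# }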