The analysis task is to evaluate how significantly the offspring of a term include a given list of terms.

dag_enrich_on_offsprings(dag, terms, min_hits = 3, min_offspring = 10)

Arguments

dag

An ontology_DAG object.

terms

A vector of term names.

min_hits

Minimal number of input terms (from terms) in an offspring set.

min_offspring

Minimal size of the offspring set.

Value

A data frame with the following columns:

  • term: Term names.

  • n_hits: Number of terms in terms that are also among t's offspring terms.

  • n_offspring: Number of offspring terms of t (including t itself).

  • n_terms: Number of terms in terms that are also present in the DAG.

  • n_all: Number of all terms in the DAG.

  • log2_fold_enrichment: Defined as log2(observed/expected).

  • z_score: Defined as (observed-expected)/sd.

  • p_value: P-values from hypergeometric test.

  • p_adjust: Adjusted p-values from the BH method.

The number of rows in the data frame is the same as the number of terms in the DAG.
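As a rough illustration of how the p_value and p_adjust columns relate, the sketch below applies the BH adjustment with base R's p.adjust() to a mock data frame. The term names and p-values are made up, and the 0.05 cutoff is only an example choice, not something the function applies.

```r
# Mock result holding two of the documented columns (all values are made up)
df <- data.frame(
  term    = c("t1", "t2", "t3", "t4"),
  p_value = c(0.001, 0.01, 0.02, 0.5)
)

# Benjamini-Hochberg adjustment, the method named for the p_adjust column
df$p_adjust <- p.adjust(df$p_value, method = "BH")

# Keep terms below an example cutoff, most significant first
sig <- df[df$p_adjust < 0.05, ]
sig <- sig[order(sig$p_adjust), ]
```

The same two subsetting lines should work on a real result, since it carries a p_adjust column.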

Details

Given a list of terms in terms, the function tests whether they are enriched in a term's offspring terms. The test is based on the hypergeometric distribution. In the 2x2 contingency table below, S is the set of input terms and, for a term t in the DAG, T is the set of its offspring plus t itself. The aim is to test whether S is over-represented in T.

If the p-value is significant, we can say that the term t preferentially includes the terms in terms.

+----------+------+----------+-----+
|          | in S | not in S | all |
+----------+------+----------+-----+
| in T     |  x11 |    x12   | x10 |
| not in T |  x21 |    x22   | x20 |
+----------+------+----------+-----+
| all      |  x01 |    x02   |  x  |
+----------+------+----------+-----+
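The statistics for a single term can be sketched from the table margins with base R. This is a minimal sketch that assumes the usual one-sided (upper-tail) hypergeometric test; the function's internal computation may differ in detail. The counts are hypothetical, and the mapping of x11, x10, x01 and x to n_hits, n_offspring, n_terms and n_all is a reading of the column descriptions above.

```r
# Hypothetical counts for one term t (illustrative values, not from a real DAG)
x   <- 1000  # all terms in the DAG (n_all)
x10 <- 50    # t plus its offspring, i.e. the size of T (n_offspring)
x01 <- 100   # input terms present in the DAG, i.e. the size of S (n_terms)
x11 <- 12    # input terms that fall in T (n_hits)

# Expected overlap and its standard deviation under the hypergeometric null
expected <- x10 * x01 / x
sd_hyper <- sqrt(x10 * (x01 / x) * (1 - x01 / x) * (x - x10) / (x - 1))

log2_fold_enrichment <- log2(x11 / expected)
z_score <- (x11 - expected) / sd_hyper

# Upper-tail probability P(X >= x11) when drawing x10 terms from a pool
# containing x01 terms in S and x - x01 terms not in S
p_value <- phyper(x11 - 1, x01, x - x01, x10, lower.tail = FALSE)
```

Under the same assumption, a one-sided Fisher's exact test on the 2x2 table (alternative = "greater") gives the same p-value.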

Examples

# \dontrun{
dag = create_ontology_DAG_from_GO_db() 
#> relations: is_a, part_of
terms = random_terms(dag, 100)
df = dag_enrich_on_offsprings(dag, terms)
# }