IC_Wang_2007
IC_Wang_2007(
dag,
contribution_factor = c(is_a = 0.8, part_of = 0.6),
use_cache = simona_opt$use_cache,
verbose = simona_opt$verbose
)
Each relation is weighted by a value less than 1 according to its semantic relation type, by default 0.8 for "is_a" and 0.6 for "part_of".
For a term t and one of its ancestor terms a, it first calculates an "S-value", which corresponds to the path from a to t along which the accumulated product of weights is maximal:
S(a->t) = max_{path}(prod_{edge on the path}(w))
Here max goes over all possible paths from a to t, and prod() multiplies the edge weights along a given path.
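The S-value definition can be illustrated with a minimal base R sketch. The toy DAG, the edges data frame and the s_value() helper are hypothetical names made up for this illustration, not part of the package, and the exhaustive recursion is only meant to mirror the formula, not to show how the package computes it.

```r
# Hypothetical toy DAG: each row is a parent -> child edge with its relation type.
edges <- data.frame(
  parent   = c("root", "root", "A", "B"),
  child    = c("A", "B", "C", "C"),
  relation = c("is_a", "is_a", "part_of", "is_a"),
  stringsAsFactors = FALSE
)
contribution_factor <- c(is_a = 0.8, part_of = 0.6)
edges$w <- unname(contribution_factor[edges$relation])

# S(from -> to): maximal product of edge weights over all paths from `from` to `to`.
s_value <- function(from, to) {
  if (from == to) return(1)                 # empty path: the product is 1
  out <- edges[edges$parent == from, ]      # edges leaving `from`
  if (nrow(out) == 0) return(0)             # dead end: `to` is not reachable
  max(vapply(seq_len(nrow(out)), function(i) {
    out$w[i] * s_value(out$child[i], to)
  }, numeric(1)))
}

# Two paths lead from root to C: root->A->C (0.8 * 0.6) and root->B->C (0.8 * 0.8).
s_value("root", "C")   # approximately 0.64, the larger of the two products
```

In this sketch a term that cannot reach the target gets an S-value of 0, which is a convention chosen here for convenience.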
The formula can be transformed as follows (writing S(a->t) simply as S):
1/S = min(prod(1/w))
log(1/S) = log(min(prod(1/w)))
= min(sum(log(1/w)))
Since w < 1, log(1/w) is positive. According to the equation, the path (a->...->t) that maximizes S is the shortest path from a to t when log(1/w) is taken as the edge weight, and log(1/S) is the weighted shortest distance.
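The equivalence with a shortest-path problem can be checked numerically with a small base R sketch. The toy DAG and the shortest_dist() helper are hypothetical and exist only for this illustration. The helper enumerates paths recursively, which is fine for a toy graph, whereas a real implementation would more likely use a proper shortest-path algorithm.

```r
# Hypothetical toy DAG with the default contribution factors as edge weights.
edges <- data.frame(
  parent = c("root", "root", "A", "B"),
  child  = c("A", "B", "C", "C"),
  w      = c(0.8, 0.8, 0.6, 0.8),
  stringsAsFactors = FALSE
)
edges$d <- log(1 / edges$w)   # edge lengths, all positive because w < 1

# Weighted shortest distance from `from` to `to`, using log(1/w) as edge length.
shortest_dist <- function(from, to) {
  if (from == to) return(0)                 # empty path has length 0
  out <- edges[edges$parent == from, ]      # edges leaving `from`
  if (nrow(out) == 0) return(Inf)           # dead end: `to` is not reachable
  min(vapply(seq_len(nrow(out)), function(i) {
    out$d[i] + shortest_dist(out$child[i], to)
  }, numeric(1)))
}

d <- shortest_dist("root", "C")   # equals log(1/S) for S = S(root -> C)
exp(-d)                           # recovers S, approximately 0.64
```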
Treating S(a->t) as the maximal semantic contribution from a to t, the information content of t is calculated as the sum over all of t's ancestors (including t itself):
IC = sum_{a in t's ancestors + t}(S(a->t))
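As a worked illustration of the sum, the base R sketch below computes the information content on a small toy DAG. The edges data frame and the s_value() and ic_wang() helpers are hypothetical names introduced only for this example. It assumes that a term which cannot reach t has an S-value of 0, so summing over all terms gives the same result as summing over t and its ancestors.

```r
# Hypothetical toy DAG with the default contribution factors as edge weights.
edges <- data.frame(
  parent = c("root", "root", "A", "B"),
  child  = c("A", "B", "C", "C"),
  w      = c(0.8, 0.8, 0.6, 0.8),
  stringsAsFactors = FALSE
)
terms <- unique(c(edges$parent, edges$child))

# S(from -> to): maximal product of edge weights over all paths; 0 if unreachable.
s_value <- function(from, to) {
  if (from == to) return(1)
  out <- edges[edges$parent == from, ]
  if (nrow(out) == 0) return(0)
  max(vapply(seq_len(nrow(out)), function(i) {
    out$w[i] * s_value(out$child[i], to)
  }, numeric(1)))
}

# IC(t): sum of S(a -> t) over every term a; non-ancestors contribute 0.
ic_wang <- function(t) sum(vapply(terms, s_value, numeric(1), to = t))

ic_wang("C")      # 1 (C itself) + 0.6 (A) + 0.8 (B) + 0.64 (root), about 3.04
ic_wang("root")   # 1, because the root has no ancestors
```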
Paper link: doi:10.1093/bioinformatics/btm087 .
The contribution of different semantic relations can be set with the contribution_factor parameter. The value should be a named numeric vector whose names cover the relations defined in the relations argument of create_ontology_DAG(). For example, if two relations, "relation_a" and "relation_b", are set in the DAG, contribution_factor can be set as:
term_IC(dag, method = "IC_Wang_2007",
    control = list(contribution_factor = c("relation_a" = 0.8, "relation_b" = 0.6)))
Note the IC_Wang_2007 method is normally used within the Sim_Wang_2007 semantic similarity method.