IC_Wang_2007

IC_Wang_2007(
  dag,
  contribution_factor = c(is_a = 0.8, part_of = 0.6),
  use_cache = simona_opt$use_cache,
  verbose = simona_opt$verbose
)

Methods

IC_Wang_2007

Each relation is weighted by a value less than 1 according to its semantic type, e.g. 0.8 for "is_a" and 0.6 for "part_of". For a term t and one of its ancestor terms a, the method first calculates an "S-value", which corresponds to the path from a to t along which the accumulated product of weights is maximal:

S(a->t) = max_{path}(prod_{edge on the path}(w))

Here max goes over all possible paths from a to t, and prod() multiplies the edge weights along a given path.

The formula can be rewritten as follows (writing S for S(a->t)):

1/S = min(prod(1/w))
log(1/S) = log(min(prod(1/w)))
         = min(sum(log(1/w)))

Since w < 1, log(1/w) is positive. The maximizing path (a->...->t) is therefore the shortest path from a to t when each edge is given the length log(1/w), and log(1/S) is the corresponding weighted shortest distance.
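
The equivalence can be checked on a small example. The following is a minimal sketch in base R that does not use simona: the toy DAG, the edge weights and the helper shortest_log_dist() are all made up for illustration, and the simple relaxation loop is only one possible way to obtain the shortest distance.

```r
# Hypothetical toy DAG: edges point from parent to child, w is the
# contribution factor of the relation on that edge.
edges <- data.frame(
  from = c("a", "a", "a", "b", "c"),
  to   = c("b", "c", "t", "t", "t"),
  w    = c(0.8, 0.6, 0.6, 0.8, 0.8),
  stringsAsFactors = FALSE
)

# Weighted shortest distance from `source` to every node, using log(1/w)
# as the edge length (Bellman-Ford style relaxation, fine for a small DAG).
shortest_log_dist <- function(edges, source) {
  nodes <- unique(c(edges$from, edges$to))
  d <- setNames(rep(Inf, length(nodes)), nodes)
  d[source] <- 0
  len <- log(1 / edges$w)
  for (i in seq_len(length(nodes) - 1)) {
    for (k in seq_len(nrow(edges))) {
      cand <- d[[edges$from[k]]] + len[k]
      if (cand < d[[edges$to[k]]]) d[[edges$to[k]]] <- cand
    }
  }
  d
}

d <- shortest_log_dist(edges, "a")

# log(1/S) is the shortest distance, so S = exp(-distance).
S_a_t <- exp(-d[["t"]])

# Direct evaluation of the three paths a->b->t, a->c->t and a->t.
S_brute <- max(0.8 * 0.8, 0.6 * 0.8, 0.6)
```

Both S_a_t and S_brute come out as 0.64 (up to floating-point error): the two-step "is_a" path contributes more than the direct "part_of" edge.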

If S(a->t) is thought of as the maximal semantic contribution from a to t, the information content of t is calculated as the sum over all of t's ancestors (including t itself):

IC = sum_{a in t's ancestors + t}(S(a->t))
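
As a rough illustration of the sum, the sketch below computes the S-values of all ancestors of a term in a made-up toy DAG and adds them up. It is plain base R, not the simona implementation; the helper max_contribution() is hypothetical, and setting S(t->t) = 1 (the empty product) is an assumption that follows the convention in Wang et al. (2007).

```r
# Hypothetical toy DAG: edges point from parent to child, w is the
# contribution factor of the relation on that edge. "u" is a child of t.
edges <- data.frame(
  from = c("a", "a", "a", "b", "c", "t"),
  to   = c("b", "c", "t", "t", "t", "u"),
  w    = c(0.8, 0.6, 0.6, 0.8, 0.8, 0.8),
  stringsAsFactors = FALSE
)

# Maximal product of edge weights from every node down to `target`.
# Nodes that cannot reach `target` keep the value 0.
max_contribution <- function(edges, target) {
  nodes <- unique(c(edges$from, edges$to))
  s <- setNames(rep(0, length(nodes)), nodes)
  s[target] <- 1  # assumption: S(t->t) = 1
  for (i in seq_len(length(nodes) - 1)) {
    for (k in seq_len(nrow(edges))) {
      cand <- edges$w[k] * s[[edges$to[k]]]
      if (cand > s[[edges$from[k]]]) s[[edges$from[k]]] <- cand
    }
  }
  s
}

s <- max_contribution(edges, "t")

# Sum over t and its ancestors; the non-ancestor "u" adds 0.
ic_t <- sum(s)
```

Here the S-values are 0.64 for a, 0.8 for b and c, and 1 for t, so ic_t is 3.24.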

Paper link: doi:10.1093/bioinformatics/btm087.

The contribution of each semantic relation can be set with the contribution_factor parameter. The value should be a named numeric vector whose names cover all the relations set in create_ontology_DAG(). For example, if two relations "relation_a" and "relation_b" are set in the DAG, contribution_factor can be set as:

term_IC(dag, method = "IC_Wang_2007",
    control = list(contribution_factor = c("relation_a" = 0.8, "relation_b" = 0.6)))

Note that the IC_Wang_2007 method is normally used within the Sim_Wang_2007 semantic similarity method.