Sim_HRSS_2013

Sim_HRSS_2013(dag, terms, verbose = simona_opt$verbose)

Methods

It is similar to the Sim_RSS_2013 method, but it uses information content (IC) instead of distance to adjust the similarity.

It first defines the semantic distance between terms a and b as the sum of their distances to their MICA (most informative common ancestor) term c:

D(a, b) = D(c, a) + D(c, b)

The distance from an ancestor c to a term a is the difference between their IC values:

D(c, a) = IC(a) - IC(c)  # if c is an ancestor of a
D(a, b) = D(c, a) + D(c, b) = IC(a) + IC(b) - 2*IC(c) # if c is the MICA of a and b
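
As a minimal sketch of the distance above, the following base R code finds the MICA of two terms and computes D(a, b) on a toy example. The IC values, the ancestor sets, and the helper names find_mica and ic_distance are hypothetical and are not part of the package.

```r
# Hypothetical toy data: IC values and ancestor sets
# (each ancestor set includes the term itself).
ic <- c(root = 0, x = 1.2, y = 2.0, a = 3.5, b = 3.0)
ancestors <- list(
  a = c("root", "x", "y", "a"),
  b = c("root", "x", "y", "b")
)

# MICA: the common ancestor of a and b with the highest IC.
find_mica <- function(a, b) {
  common <- intersect(ancestors[[a]], ancestors[[b]])
  common[which.max(ic[common])]
}

# D(a, b) = IC(a) + IC(b) - 2*IC(c), where c is the MICA of a and b.
ic_distance <- function(a, b) {
  c_term <- find_mica(a, b)
  unname(ic[a] + ic[b] - 2 * ic[c_term])
}

mica_ab <- find_mica("a", "b")
d_ab <- ic_distance("a", "b")
```

In this toy DAG the common ancestors of a and b are root, x and y, so y (the one with the highest IC) is taken as the MICA.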

As in Sim_RSS_2013, the similarity is also corrected by the positions of the MICA term and of a and b in the DAG:

1/(1 + D(a, b)) * alpha/(alpha + beta)

Here alpha is the IC of the MICA term:

alpha = IC(c)

And beta is the average of the maximal semantic distances from a and b to their leaves:

beta = 0.5*(IC(l_a) - IC(a) + IC(l_b) - IC(b))

where l_a is the leaf reachable from a with the highest IC (i.e. the most informative leaf), and l_b is defined in the same way for b.
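
The formulas above can be combined into a single function. This is only a sketch under stated assumptions, not the package's implementation: the function name hrss_similarity and its arguments are hypothetical, all IC values are supplied by the caller, and returning 0 when alpha + beta is zero is an assumption made here to avoid dividing by zero.

```r
# Sketch of the HRSS similarity from precomputed IC values.
#   ic_a, ic_b           : IC of terms a and b
#   ic_mica              : IC of their MICA term c
#   ic_leaf_a, ic_leaf_b : IC of the most informative leaves l_a and l_b
hrss_similarity <- function(ic_a, ic_b, ic_mica, ic_leaf_a, ic_leaf_b) {
  # D(a, b) = IC(a) + IC(b) - 2*IC(c)
  d <- ic_a + ic_b - 2 * ic_mica
  # alpha = IC(c)
  alpha <- ic_mica
  # beta = 0.5*(IC(l_a) - IC(a) + IC(l_b) - IC(b))
  beta <- 0.5 * ((ic_leaf_a - ic_a) + (ic_leaf_b - ic_b))
  # Assumption: return 0 when alpha + beta is zero (e.g. two leaves whose
  # MICA has IC 0); the package may treat this case differently.
  if (alpha + beta == 0) {
    return(0)
  }
  1 / (1 + d) * alpha / (alpha + beta)
}

# Toy values: D = 2.5, alpha = 2, beta = 1.
sim_ab <- hrss_similarity(ic_a = 3.5, ic_b = 3.0, ic_mica = 2.0,
                          ic_leaf_a = 4.5, ic_leaf_b = 4.0)

# A leaf compared with itself: D = 0 and beta = 0.
sim_self <- hrss_similarity(3, 3, 3, 3, 3)
```

With the toy values, the similarity is 1/(1 + 2.5) * 2/(2 + 1) = 4/21. For a leaf compared with itself, both D and beta are zero, so the similarity is 1.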