Sim_HRSS_2013
Sim_HRSS_2013(dag, terms, verbose = simona_opt$verbose)
It is similar to the Sim_RSS_2013 method, but it uses information content (IC) instead of distance in the DAG to adjust the similarity.
It first defines the semantic distance between terms a and b as the sum of their distances to their MICA term c:
D(a, b) = D(c, a) + D(c, b)
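The distance from an ancestor c to a term a is the difference between their IC values, so the distance between a and b can be written in terms of IC alone: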
D(c, a) = IC(a) - IC(c) # if c is an ancestor of a
D(a, b) = D(c, a) + D(c, b) = IC(a) + IC(b) - 2*IC(c) # if c is the MICA of a and b
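As a small numeric illustration of this distance, the following R sketch uses made-up IC values (assumptions for illustration only, not taken from any ontology) and a hypothetical helper `ic_distance()` that is not part of the package:

```r
# Hypothetical IC values for two terms a, b and their MICA c (illustrative only).
ic <- c(a = 3.2, b = 2.7, c = 1.5)

# Hypothetical helper: IC-based distance from an ancestor to one of its descendants.
ic_distance <- function(ic_ancestor, ic_term) ic_term - ic_ancestor

d_ca <- ic_distance(ic[["c"]], ic[["a"]])  # D(c, a) = IC(a) - IC(c)
d_cb <- ic_distance(ic[["c"]], ic[["b"]])  # D(c, b) = IC(b) - IC(c)
d_ab <- d_ca + d_cb                        # D(a, b) = D(c, a) + D(c, b)

# The sum agrees with the closed form IC(a) + IC(b) - 2*IC(c).
closed_form <- ic[["a"]] + ic[["b"]] - 2 * ic[["c"]]
stopifnot(isTRUE(all.equal(d_ab, closed_form)))
```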
As in Sim_RSS_2013, the similarity is also corrected by the positions of the MICA term c and of the terms a and b in the DAG:
1/(1 + D(a, b)) * alpha/(alpha + beta)
Now alpha is the IC of the MICA term:
alpha = IC(c)
And beta is the average of the maximal semantic distances from a and b to leaves:
beta = 0.5*(IC(l_a) - IC(a) + IC(l_b) - IC(b))
where l_a is the leaf with the highest IC that a can reach (i.e. its most informative leaf), and l_b is defined likewise for b.
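The formulas above can be combined into a short R sketch. It works on plain IC values rather than on a DAG object; the function `hrss_sketch()` and its argument names are hypothetical and only illustrate the arithmetic, they are not the package's implementation:

```r
# Hypothetical helper that evaluates the formulas above from plain IC values.
#   ic_a, ic_b           : IC of the two terms a and b
#   ic_mica              : IC of their MICA term c
#   ic_leaf_a, ic_leaf_b : IC of the most informative leaf reachable from a and b
hrss_sketch <- function(ic_a, ic_b, ic_mica, ic_leaf_a, ic_leaf_b) {
  d_ab  <- ic_a + ic_b - 2 * ic_mica                         # D(a, b)
  alpha <- ic_mica                                           # IC of the MICA
  beta  <- 0.5 * ((ic_leaf_a - ic_a) + (ic_leaf_b - ic_b))   # mean distance to leaves
  1 / (1 + d_ab) * alpha / (alpha + beta)
}

# Made-up IC values: D(a, b) = 3, alpha = 1, beta = 1.5.
sim_ab <- hrss_sketch(ic_a = 3, ic_b = 2, ic_mica = 1, ic_leaf_a = 4, ic_leaf_b = 4)

# A leaf term compared with itself: D = 0 and beta = 0, so the similarity is 1.
sim_leaf <- hrss_sketch(ic_a = 4, ic_b = 4, ic_mica = 4, ic_leaf_a = 4, ic_leaf_b = 4)
```

The sketch does not handle the degenerate case alpha + beta = 0 (for example, a MICA with zero IC when both terms are leaves), where the ratio is undefined.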
Paper link: doi:10.1371/journal.pone.0066745.