Sim_Zhong_2002
Sim_Zhong_2002(dag, terms, depth_via_LCA = TRUE, verbose = simona_opt$verbose)
For a term x, it first calculates a "milestone" value as
m(x) = 0.5/2^depth(x)
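A minimal R sketch of the milestone value, assuming depth(x) is already known as a non-negative integer. The helper milestone() is hypothetical and is not part of simona's API. It shows that the milestone halves with each additional level of depth.

```r
# Hypothetical helper (not simona's API): milestone value m(x) = 0.5 / 2^depth(x)
milestone <- function(depth) 0.5 / 2^depth

# Milestones for terms at depths 0 (root) to 4; each level halves the value
m <- milestone(0:4)
print(m)
```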
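The distance between terms a and b via their LCA (lowest common ancestor) term c, where the distance from a term to one of its ancestors is the difference of their milestone values (D(c, a) = m(c) - m(a)), is: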
D(c, a) + D(c, b) = m(c) - m(a) + m(c) - m(b)
= 2*m(c) - m(a) - m(b)
= 1/2^depth(c) - 0.5/2^depth(a) - 0.5/2^depth(b)
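The derivation above can be checked numerically with a small R sketch. The helpers milestone() and zhong_distance() are hypothetical, and the depths used are made-up values; the sketch only confirms that the milestone form and the closed form of the distance agree.

```r
# Hypothetical helper: milestone value m(x) = 0.5 / 2^depth(x)
milestone <- function(depth) 0.5 / 2^depth

# Hypothetical helper: distance between a and b via their LCA c, given the depths
zhong_distance <- function(depth_c, depth_a, depth_b) {
  # D(c, a) + D(c, b) = 2*m(c) - m(a) - m(b)
  2 * milestone(depth_c) - milestone(depth_a) - milestone(depth_b)
}

# Made-up depths: c at depth 1, a at depth 3, b at depth 2
d <- zhong_distance(1, 3, 2)

# Closed form from the last line of the derivation
d_closed <- 1 / 2^1 - 0.5 / 2^3 - 0.5 / 2^2
stopifnot(isTRUE(all.equal(d, d_closed)))
```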
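We change the original depth(a) so that it is measured along a path that goes through the LCA term c, i.e. depth(a) = depth(c) + len(c, a), where len(c, a) is the length of the path from c to a (and likewise for b):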
1/2^depth(c) - 0.5/2^depth(a) - 0.5/2^depth(b)
= 1/2^depth(c) - 0.5/2^(depth(c) + len(c, a)) - 0.5/2^(depth(c) + len(c, b))
= 1/2^depth(c) * (1 - 1/2^(len(c, a) + 1) - 1/2^(len(c, b) + 1))
= 2^-depth(c) * (1 - 2^-(len(c, a) + 1) - 2^-(len(c, b) + 1))
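As a sanity check on the factorization, this R sketch compares the first and last lines of the derivation over a small grid of made-up values for depth(c), len(c, a) and len(c, b). The function names are hypothetical; explicit parentheses are used around negative exponents to avoid any ambiguity.

```r
# Distance with depth(a), depth(b) replaced by depth(c) + len(c, .)
dist_unfactored <- function(depth_c, len_ca, len_cb) {
  1 / 2^depth_c - 0.5 / 2^(depth_c + len_ca) - 0.5 / 2^(depth_c + len_cb)
}

# Factored form from the last line of the derivation
dist_factored <- function(depth_c, len_ca, len_cb) {
  2^(-depth_c) * (1 - 2^(-(len_ca + 1)) - 2^(-(len_cb + 1)))
}

# Evaluate both forms over a grid of made-up values and check they agree
grid <- expand.grid(depth_c = 0:5, len_ca = 0:4, len_cb = 0:4)
u <- with(grid, dist_unfactored(depth_c, len_ca, len_cb))
f <- with(grid, dist_factored(depth_c, len_ca, len_cb))
stopifnot(isTRUE(all.equal(u, f)))
```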
The final similarity is 1 - distance:
sim = 1 - 2^-depth(c) * (1 - 2^-(len(c, a) + 1) - 2^-(len(c, b) + 1))
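A minimal R sketch of the final formula, assuming depth(c), len(c, a) and len(c, b) have already been computed for the pair of terms. sim_zhong_via_lca() is a hypothetical helper, not the simona implementation.

```r
# Hypothetical helper: similarity with depths measured through the LCA c
sim_zhong_via_lca <- function(depth_c, len_ca, len_cb) {
  distance <- 2^(-depth_c) * (1 - 2^(-(len_ca + 1)) - 2^(-(len_cb + 1)))
  1 - distance
}

sim_zhong_via_lca(2, 0, 0)  # a, b and c are the same term: similarity 1
sim_zhong_via_lca(0, 1, 1)  # a and b are children of the root: similarity 0.5
```

With depth(c) >= 0 and path lengths >= 0, the factor 2^-depth(c) is at most 1 and the bracketed term lies in [0, 1), so in this form the distance lies in [0, 1) and the similarity in (0, 1].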
Paper link: doi:10.1007/3-540-45483-7_8.
The parameter depth_via_LCA can be set to TRUE or FALSE. If it is TRUE (the default), depth(a) is redefined as the depth of a along a path that passes through the LCA term c, i.e. depth(c) + len(c, a). If it is FALSE, the original similarity definition from the paper is used; note that in this case the similarity might be negative.
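The difference between the two settings can be sketched in R with a hypothetical helper, sim_zhong(), whose depth_via_LCA argument only mirrors the parameter name; it is not simona's implementation. The made-up values give a term a whose own depth (4) is larger than depth(c) + len(c, a) = 3, as can happen when a DAG has several paths between two terms, so the two settings give different results.

```r
# Hypothetical helper: both variants of the Sim_Zhong_2002 similarity
sim_zhong <- function(depth_c, depth_a, depth_b, len_ca, len_cb,
                      depth_via_LCA = TRUE) {
  if (depth_via_LCA) {
    # Measure depth(a) and depth(b) along the path through the LCA c
    depth_a <- depth_c + len_ca
    depth_b <- depth_c + len_cb
  }
  # Otherwise the terms' own depths are used, as in the original paper
  distance <- 1 / 2^depth_c - 0.5 / 2^depth_a - 0.5 / 2^depth_b
  1 - distance
}

# Made-up values: depth(c) = 2, depth(a) = 4, depth(b) = 3, len(c, a) = len(c, b) = 1
s_lca  <- sim_zhong(2, 4, 3, 1, 1, depth_via_LCA = TRUE)
s_orig <- sim_zhong(2, 4, 3, 1, 1, depth_via_LCA = FALSE)
```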
The original definition is selected through the control argument of term_sim():
term_sim(dag, terms, method = "Sim_Zhong_2002",
control = list(depth_via_LCA = FALSE))