Sim_Zhong_2002

Sim_Zhong_2002(dag, terms, depth_via_LCA = TRUE, verbose = simona_opt$verbose)

Methods

Sim_Zhong_2002

For a term x, the method first calculates a "milestone" value:

m(x) = 0.5/2^depth(x)

The distance between terms a and b via their LCA term c is:

D(c, a) + D(c, b) = m(c) - m(a) + m(c) - m(b)
                  = 2*m(c) - m(a) - m(b)
                  = 1/2^depth(c) - 0.5/2^depth(a) - 0.5/2^depth(b)
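
As a quick numeric check, the milestone and distance formulas above can be sketched in base R. The helper names milestone() and zhong_distance() are hypothetical and not part of simona, and the depths are supplied by hand rather than read from a DAG.

```r
# Hypothetical helpers for illustration; not part of the simona API.

# Milestone value of a term at a given depth: m(x) = 0.5 / 2^depth(x)
milestone <- function(depth) 0.5 / 2^depth

# Distance between a and b via their LCA c: 2*m(c) - m(a) - m(b)
zhong_distance <- function(depth_c, depth_a, depth_b) {
    2 * milestone(depth_c) - milestone(depth_a) - milestone(depth_b)
}

milestone(0:3)            # 0.5 0.25 0.125 0.0625
zhong_distance(1, 3, 2)   # 0.3125
```

The milestone halves with every extra level of depth, so two terms that meet at a deep LCA are closer than two terms that meet near the root.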

We redefine depth(a) so that the path from the root passes through the LCA term c, i.e. depth(a) = depth(c) + len(c, a), and likewise for b:

1/2^depth(c) - 0.5/2^depth(a) - 0.5/2^depth(b)
    = 1/2^depth(c) - 0.5/2^(depth(c) + len(c, a)) - 0.5/2^(depth(c) + len(c, b))
    = 1/2^depth(c) * (1 - 1/2^(len(c, a) + 1) - 1/2^(len(c, b) + 1))
    = 2^-depth(c) * (1 - 2^-(len(c, a) + 1) - 2^-(len(c, b) + 1))

And the final similarity is 1 - distance:

sim = 1 - 2^-depth(c) * (1 - 2^-(len(c, a) + 1) - 2^-(len(c, b) + 1))
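
The final formula can be evaluated directly once depth(c), len(c, a) and len(c, b) are known. The following is a minimal base R sketch under that assumption; zhong_sim_via_lca() is a hypothetical name, not a simona function, and it takes the three quantities as plain numbers.

```r
# Hypothetical helper for illustration; not part of the simona API.
# depth_c: depth of the LCA term c
# len_ca:  path length from c to a
# len_cb:  path length from c to b
zhong_sim_via_lca <- function(depth_c, len_ca, len_cb) {
    distance <- 2^(-depth_c) * (1 - 2^(-(len_ca + 1)) - 2^(-(len_cb + 1)))
    1 - distance
}

zhong_sim_via_lca(1, 2, 1)  # 0.6875
zhong_sim_via_lca(3, 0, 0)  # 1 (a, b and c are the same term)
```

In this form the distance is always between 0 and 2^-depth(c), so the similarity stays within [0, 1].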

Paper link: doi:10.1007/3-540-45483-7_8.

The parameter depth_via_LCA can be set to TRUE or FALSE. If it is TRUE, depth(a) is redefined so that the path from the root to a passes through the LCA term c. If it is FALSE, the original similarity definition from the paper is used; note that the similarity may then be negative.

term_sim(dag, terms, method = "Sim_Zhong_2002",
    control = list(depth_via_LCA = FALSE))