Relabel class labels according to the reference labels

relabel_class(class, ref, full_set = union(class, ref), return_map = TRUE)

Arguments

class

A vector of class labels.

ref

A vector of reference labels.

full_set

The full set of labels.

return_map

Whether to return the mapping of the adjusted labels.

Details

In a partition, the actual values of the class labels do not matter. For example, the partitions a, a, a, b, b, b, b and b, b, b, a, a, a, a are the same partition; only the labels a and b are swapped. The partition c, c, c, d, d, d, d is also the same as the previous two, even though it uses a different set of labels. The relabel_class function relabels the class vector according to the labels in the ref vector by searching for a mapping m() that maximizes sum(m(class) == ref).

Mathematically, this is a linear sum assignment problem (LSAP), which is solved by solve_LSAP from the clue package.
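To make the assignment step concrete, here is a minimal sketch of the core idea, assuming the clue package (which provides solve_LSAP) is installed. It counts how often each class label co-occurs with each reference label and lets solve_LSAP pick the one-to-one mapping with the largest total overlap. lsap_map is a hypothetical helper written for illustration, not part of the package, and it does not reproduce how relabel_class handles labels that occur in only one of the two vectors.

```r
library(clue)  # provides solve_LSAP()

# Hypothetical helper: find the mapping m() that maximizes sum(m(class) == ref)
lsap_map = function(class, ref) {
    class = as.character(class)
    ref = as.character(ref)
    labels = union(class, ref)
    # overlap[i, j] = number of items with class label i and reference label j
    overlap = unclass(table(factor(class, levels = labels),
                            factor(ref, levels = labels)))
    storage.mode(overlap) = "double"
    # assign each class label a distinct reference label, maximizing total agreement
    assignment = solve_LSAP(overlap, maximum = TRUE)
    structure(labels[as.integer(assignment)], names = labels)
}

class = c(rep("a", 10), rep("b", 3))
ref = c(rep("b", 4), rep("a", 9))
map = lsap_map(class, ref)
map
#>   a   b
#> "b" "a"
sum(map[class] == ref)
#> [1] 7
```

Keeping the identity mapping would give only 6 agreements (6 for a plus 0 for b), while swapping the labels gives 4 + 3 = 7, so the solver swaps them, in line with the first example below.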

Value

A named vector whose names are the labels in class and whose values are the corresponding labels in ref. In other words, with map = relabel_class(class, ref), map[class] returns the relabelled labels.

The returned vector also carries a data frame, stored in its "df" attribute, with three columns:

  • class: the original labels in class

  • adjusted: the labels adjusted according to ref

  • ref: the reference labels in ref

If return_map is set to FALSE, the function simply returns a vector of adjusted class labels.

When the function returns the mapping vector (return_map = TRUE), the mapping is always character. This means that if class and ref are numeric, you need to convert the relabelled labels back to numeric explicitly. When return_map = FALSE, the returned relabelled vector has the same mode as class.
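A sketch of that explicit conversion follows. The mapping is written out by hand with the same values that relabel_class(class, ref) prints for the numeric example (without its "df" attribute), so the snippet runs without the package. Indexing with as.character(class) looks labels up by name; indexing with a numeric class would instead select elements by position, which only agrees with name lookup when the labels happen to be 1, 2, and so on.

```r
# Mapping with the values relabel_class(class, ref) returns for numeric labels:
# names are the original labels, values are the adjusted labels, both character
map = c("1" = "2", "2" = "1")
class = c(rep(1, 10), rep(2, 3))

# Look labels up by name, then convert the character result back to numeric
adjusted = as.numeric(map[as.character(class)])
adjusted
#>  [1] 2 2 2 2 2 2 2 2 2 2 1 1 1
```

The result matches what relabel_class(class, ref, return_map = FALSE) returns for the same input.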

Examples

class = c(rep("a", 10), rep("b", 3))
ref = c(rep("b", 4), rep("a", 9))
relabel_class(class, ref)
#>   a   b 
#> "b" "a" 
#> attr(,"df")
#>    class adjusted ref
#> 1      a        b   b
#> 2      a        b   b
#> 3      a        b   b
#> 4      a        b   b
#> 5      a        b   a
#> 6      a        b   a
#> 7      a        b   a
#> 8      a        b   a
#> 9      a        b   a
#> 10     a        b   a
#> 11     b        a   a
#> 12     b        a   a
#> 13     b        a   a
relabel_class(class, ref, return_map = FALSE)
#>  [1] "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "a" "a" "a"
# if class and ref are from completely different sets
class = c(rep("A", 10), rep("B", 3))
relabel_class(class, ref)
#>   A   B   b   a 
#> "b" "a" "b" "a" 
#> attr(,"df")
#>    class adjusted ref
#> 1      A        b   b
#> 2      A        b   b
#> 3      A        b   b
#> 4      A        b   b
#> 5      A        b   a
#> 6      A        b   a
#> 7      A        b   a
#> 8      A        b   a
#> 9      A        b   a
#> 10     A        b   a
#> 11     B        a   a
#> 12     B        a   a
#> 13     B        a   a

# class labels are numeric
class = c(rep(1, 10), rep(2, 3))
ref = c(rep(2, 4), rep(1, 9))
relabel_class(class, ref)
#>   1   2 
#> "2" "1" 
#> attr(,"df")
#>    class adjusted ref
#> 1      1        2   2
#> 2      1        2   2
#> 3      1        2   2
#> 4      1        2   2
#> 5      1        2   1
#> 6      1        2   1
#> 7      1        2   1
#> 8      1        2   1
#> 9      1        2   1
#> 10     1        2   1
#> 11     2        1   1
#> 12     2        1   1
#> 13     2        1   1
relabel_class(class, ref, return_map = FALSE)
#>  [1] 2 2 2 2 2 2 2 2 2 2 1 1 1