relabel_class.Rd
Relabel class labels according to the reference labels
relabel_class(class, ref, full_set = union(class, ref), return_map = TRUE)
class: A vector of class labels.
ref: A vector of reference labels.
full_set: The full set of labels.
return_map: Whether to return the mapping of the adjusted labels.
In a partition, the exact value of a class label is not important. For example, the two partitions a, a, a, b, b, b, b and b, b, b, a, a, a, a are the same partition, although the labels a and b are switched between them. Even the partition c, c, c, d, d, d, d is the same as the previous two, although it uses a different set of labels. The relabel_class function relabels the class vector according to the labels in the ref vector by looking for a mapping m() that maximizes sum(m(class) == ref).
Mathematically, this is known as the linear sum assignment problem, and it is solved by solve_LSAP.
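To make the objective sum(m(class) == ref) concrete, here is a minimal sketch in base R that finds the mapping by brute force over all permutations of the label set. It is only an illustration for very small label sets and is not how relabel_class is implemented (the function relies on solve_LSAP, as stated above). The helper names all_perms and brute_force_map are hypothetical and introduced only for this example.

```r
# Hypothetical helper (not part of the package): all permutations of a vector.
all_perms <- function(x) {
  if (length(x) <= 1) return(list(x))
  out <- list()
  for (i in seq_along(x)) {
    for (p in all_perms(x[-i])) out[[length(out) + 1]] <- c(x[i], p)
  }
  out
}

# Hypothetical helper: try every one-to-one mapping m() over full_set and
# keep the one that maximizes sum(m(class) == ref).
brute_force_map <- function(class, ref, full_set = union(class, ref)) {
  full_set <- as.character(full_set)
  best_score <- -1
  best_map <- NULL
  for (p in all_perms(full_set)) {
    m <- structure(p, names = full_set)  # m maps full_set[i] to p[i]
    score <- sum(m[as.character(class)] == ref)
    if (score > best_score) {
      best_score <- score
      best_map <- m
    }
  }
  best_map
}

class <- c(rep("a", 10), rep("b", 3))
ref <- c(rep("b", 4), rep("a", 9))
m <- brute_force_map(class, ref)
m                     # a is mapped to "b", b is mapped to "a"
sum(m[class] == ref)  # 7 of the 13 positions agree after relabelling
```

Brute force grows factorially with the number of labels, which is why an assignment solver is the practical choice for anything beyond a handful of classes.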
A named vector whose names are the labels in class and whose values are the corresponding labels in ref, so that map = relabel_class(class, ref); map[class] returns the relabelled labels.
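The lookup map[class] is ordinary named-vector indexing. The sketch below writes the mapping out by hand (the values are taken from the first example on this page) so that it runs without the package; in practice map would come from relabel_class.

```r
# Mapping written by hand for illustration; normally the result of relabel_class(class, ref).
map <- c(a = "b", b = "a")
class <- c(rep("a", 10), rep("b", 3))

# Index the named vector by the original labels, then drop the names.
relabelled <- unname(map[class])
relabelled
```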
The returned vector also carries a data frame, attached as the attribute "df", with three columns:
class: the original labels in class.
adjusted: the adjusted labels according to ref.
ref: the reference labels in ref.
If return_map is set to FALSE, relabel_class simply returns a vector of adjusted class labels.
When the function returns the mapping vector (return_map = TRUE), the mapping is always a character vector, so if class and ref are numeric, you need to convert the relabelled values back to numeric explicitly. When return_map = FALSE, the returned relabelled vector has the same mode as class.
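A short sketch of that conversion, assuming numeric labels as in the last example on this page. The mapping is again written by hand so the snippet is self-contained. Note that the numeric labels are converted to character before the lookup, so that R indexes the mapping by name rather than by position.

```r
# Character mapping as relabel_class would return it for numeric input
# (written by hand here for illustration).
map <- c("1" = "2", "2" = "1")
class <- c(rep(1, 10), rep(2, 3))

# Look up by name, then convert the character result back to numeric.
adjusted <- as.numeric(unname(map[as.character(class)]))
adjusted
```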
class = c(rep("a", 10), rep("b", 3))
ref = c(rep("b", 4), rep("a", 9))
relabel_class(class, ref)
#>   a   b
#> "b" "a"
#> attr(,"df")
#>    class adjusted ref
#> 1      a        b   b
#> 2      a        b   b
#> 3      a        b   b
#> 4      a        b   b
#> 5      a        b   a
#> 6      a        b   a
#> 7      a        b   a
#> 8      a        b   a
#> 9      a        b   a
#> 10     a        b   a
#> 11     b        a   a
#> 12     b        a   a
#> 13     b        a   a
relabel_class(class, ref, return_map = FALSE)
#> [1] "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "a" "a" "a"
# if class and ref are from completely different sets
class = c(rep("A", 10), rep("B", 3))
relabel_class(class, ref)
#>   A   B   b   a
#> "b" "a" "b" "a"
#> attr(,"df")
#>    class adjusted ref
#> 1      A        b   b
#> 2      A        b   b
#> 3      A        b   b
#> 4      A        b   b
#> 5      A        b   a
#> 6      A        b   a
#> 7      A        b   a
#> 8      A        b   a
#> 9      A        b   a
#> 10     A        b   a
#> 11     B        a   a
#> 12     B        a   a
#> 13     B        a   a
# class labels are numeric
class = c(rep(1, 10), rep(2, 3))
ref = c(rep(2, 4), rep(1, 9))
relabel_class(class, ref)
#>   1   2
#> "2" "1"
#> attr(,"df")
#>    class adjusted ref
#> 1      1        2   2
#> 2      1        2   2
#> 3      1        2   2
#> 4      1        2   2
#> 5      1        2   1
#> 6      1        2   1
#> 7      1        2   1
#> 8      1        2   1
#> 9      1        2   1
#> 10     1        2   1
#> 11     2        1   1
#> 12     2        1   1
#> 13     2        1   1
relabel_class(class, ref, return_map = FALSE)
#> [1] 2 2 2 2 2 2 2 2 2 2 1 1 1