Classification functions

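This section describes all available classification metrics and their documentation. All classification functions share the same interface: each metric foo (e.g. accuracy) has a foo.factor method, which takes the actual and predicted factors, and a foo.cmatrix method, which takes a confusion matrix.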
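The foo.factor / foo.cmatrix naming follows R's S3 convention. A generic calls UseMethod(), and R picks the method that matches the class of the first argument. The sketch below assumes SLmetrics relies on ordinary S3 dispatch, as the naming suggests. It uses a hypothetical generic, toy_accuracy(), which is not part of SLmetrics, together with an object that is tagged with the "cmatrix" class by hand:

```r
# Hypothetical generic (not part of SLmetrics): R dispatches on the class
# of the first argument, i.e. to toy_accuracy.factor() or toy_accuracy.cmatrix()
toy_accuracy <- function(x, ...) {
  UseMethod("toy_accuracy")
}

# factor method: x holds the actual labels, compared element-wise with predicted
toy_accuracy.factor <- function(x, predicted, ...) {
  mean(x == predicted)
}

# cmatrix method: correct predictions sit on the diagonal of the matrix
toy_accuracy.cmatrix <- function(x, ...) {
  sum(diag(x)) / sum(x)
}

actual    <- factor(c("A", "B", "A", "B"), levels = c("A", "B"))
predicted <- factor(c("A", "A", "A", "B"), levels = c("A", "B"))

# Dispatches to toy_accuracy.factor()
acc_factor <- toy_accuracy(actual, predicted)

# Plain count matrix tagged with the "cmatrix" class by hand (illustration only),
# so that dispatch goes to toy_accuracy.cmatrix()
cm <- unclass(table(actual, predicted))
class(cm) <- "cmatrix"
acc_cmatrix <- toy_accuracy(cm)

print(c(factor = acc_factor, cmatrix = acc_cmatrix))
```

Both calls return 0.75 because three of the four predictions are correct. The same generic name serves both input types, and only the method body differs.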

A primer on factors

Consider a classification problem with three classes: A, B, and C. The actual vector of factor values is defined as follows:

## set seed
set.seed(1903)

## actual
actual <- factor(
    x = sample(x = 1:3, size = 10, replace = TRUE),
    levels = c(1, 2, 3),
    labels = c("A", "B", "C")
)

## print values
print(actual)
#>  [1] B A B B A C B C C A
#> Levels: A B C

Here, the values 1, 2, and 3 are mapped to A, B, and C, respectively. Now, suppose your model does not predict any B’s. The predicted vector of factor values would be defined as follows:

## set seed
set.seed(1903)

## predicted
predicted <- factor(
    x = sample(x = c(1, 3), size = 10, replace = TRUE),
    levels = c(1, 2, 3),
    labels = c("A", "B", "C")
)

## print values
print(predicted)
#>  [1] C A C C C C C C A C
#> Levels: A B C

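In both cases, the number of classes is \(k = 3\). This value is determined by the levels argument, not by the values that actually occur: predicted contains no B, yet B is still one of its levels.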
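A class that is never predicted still counts toward \(k\), because \(k\) is read from the levels. The following base-R sketch makes no SLmetrics calls. It types in the printed predicted values directly so that the result does not depend on the random seed:

```r
# The printed `predicted` values, typed in directly instead of sampled
predicted <- factor(
  x = c("C", "A", "C", "C", "C", "C", "C", "C", "A", "C"),
  levels = c("A", "B", "C")
)

# k is the number of levels, not the number of distinct observed values
k <- nlevels(predicted)
n_observed <- length(unique(predicted))

# B is kept as a level with a count of zero
counts <- table(predicted)

# droplevels() removes unused levels and would silently reduce k to 2
k_dropped <- nlevels(droplevels(predicted))

print(c(k = k, observed = n_observed, dropped = k_dropped))
print(counts)
```

Here k is 3 while only two distinct values are observed. This is why predicted is created with the same levels as actual: without B among its levels, the two factors would describe different sets of classes.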

Examples

This section gives a brief introduction to the two methods.

factor method

## factor method
SLmetrics::accuracy(
  actual,
  predicted
)
#> [1] 0.3

cmatrix method

## 1) generate confusion matrix (cmatrix class)
confusion_matrix <- SLmetrics::cmatrix(
  actual,
  predicted
)

## 2) check class
class(confusion_matrix)
#> [1] "cmatrix"

## 3) summarise
summary(confusion_matrix)
#> Confusion Matrix (3 x 3) 
#> ================================================================================
#>   A B C
#> A 1 0 2
#> B 0 0 4
#> C 1 0 2
#> ================================================================================
#> Overall Statistics (micro average)
#>  - Accuracy:          0.30
#>  - Balanced Accuracy: 0.33
#>  - Sensitivity:       0.30
#>  - Specificity:       0.65
#>  - Precision:         0.30
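Each statistic in the summary can be traced back to the 3 x 3 counts. The sketch below uses only base R and assumes the usual one-vs-rest definitions of true/false positives and negatives, so SLmetrics' internal code may be organised differently. It rebuilds the matrix with table() and then pools the per-class counts into micro averages:

```r
# The printed `actual` and `predicted` values, typed in directly
actual <- factor(
  c("B", "A", "B", "B", "A", "C", "B", "C", "C", "A"),
  levels = c("A", "B", "C")
)
predicted <- factor(
  c("C", "A", "C", "C", "C", "C", "C", "C", "A", "C"),
  levels = c("A", "B", "C")
)

# Rows are actual classes, columns are predicted classes
cm <- table(actual, predicted)

# One-vs-rest counts for each class
tp <- diag(cm)                 # predicted as the class and belongs to it
fp <- colSums(cm) - tp         # predicted as the class, belongs elsewhere
fn <- rowSums(cm) - tp         # belongs to the class, predicted as another
tn <- sum(cm) - tp - fp - fn   # neither belongs to nor predicted as the class

# Micro averages pool the counts over all classes before dividing
micro_accuracy    <- sum(tp) / sum(cm)
micro_sensitivity <- sum(tp) / sum(tp + fn)
micro_specificity <- sum(tn) / sum(tn + fp)
micro_precision   <- sum(tp) / sum(tp + fp)

print(cm)
print(round(c(
  accuracy    = micro_accuracy,
  sensitivity = micro_sensitivity,
  specificity = micro_specificity,
  precision   = micro_precision
), 2))
```

With these definitions the pooled values are 0.30, 0.30, 0.65 and 0.30. In a single-label multiclass problem, micro sensitivity and micro precision both reduce to accuracy, because every misclassification is at once one false positive (for the predicted class) and one false negative (for the actual class).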

The confusion_matrix object can be passed directly to accuracy(), which then uses the cmatrix method:

SLmetrics::accuracy(
  confusion_matrix
)
#> [1] 0.3

The cmatrix method is more efficient when more than one classification metric is calculated. The confusion matrix summarises actual and predicted once, and every metric is then computed directly from the cmatrix object, instead of looping through all the actual and predicted values again for each metric. For example:

cat(
  sep = "\n",
  paste("Accuracy:", SLmetrics::accuracy(confusion_matrix)),
  paste("Balanced Accuracy:", SLmetrics::baccuracy(confusion_matrix))
)
#> Accuracy: 0.3
#> Balanced Accuracy: 0.333333333333333
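Balanced accuracy differs from accuracy because it gives every class equal weight. Assume the common definition: the unweighted mean of the per-class recalls, read off the rows of the confusion matrix. Here \(\text{TP}_c\) and \(\text{FN}_c\) are the true positives and false negatives of class \(c\), and \(k = 3\). Under that definition, the printed value can be checked by hand:

```latex
\text{Balanced Accuracy}
  = \frac{1}{k} \sum_{c=1}^{k} \frac{\text{TP}_c}{\text{TP}_c + \text{FN}_c}
  = \frac{1}{3} \left( \frac{1}{3} + \frac{0}{4} + \frac{2}{3} \right)
  = \frac{1}{3} \approx 0.333
```

Class B contributes a recall of zero, since none of its four observations was predicted as B. Accuracy, by contrast, pools all ten observations and gives \(3/10 = 0.3\).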