log loss

logloss.factor R Documentation

Description

The logloss() function computes the Log Loss between observed classes (as a <factor>) and their predicted probability distributions (a <numeric> matrix). The weighted.logloss() function is the weighted version, applying observation-specific weights.
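A minimal sketch of both calls (assuming the package providing logloss() is attached; the values are illustrative only):

actual <- factor(c("a", "b"), levels = c("a", "b"))
qk     <- matrix(c(0.8, 0.2,
                   0.3, 0.7), nrow = 2, byrow = TRUE)   # each row sums to 1
logloss(actual, qk)                                     # mean log loss
weighted.logloss(actual, qk, w = c(1, 2))               # weighted variant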

Usage

## S3 method for class 'factor'
logloss(actual, qk, normalize = TRUE, ...)

## S3 method for class 'factor'
weighted.logloss(actual, qk, w, normalize = TRUE, ...)

logloss(...)

weighted.logloss(...)

Arguments

actual

A <factor>-vector of length \(n\) with \(k\) levels

qk

A \(n \times k\) <numeric>-matrix of predicted probabilities. The \(i\)-th row should sum to 1 (i.e., a valid probability distribution over the \(k\) classes). The first column corresponds to the first factor level in actual, the second column to the second factor level, and so on.
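For illustration, a qk-matrix consistent with a three-level factor could be built as follows (a sketch; the column order follows levels(actual) as described above):

actual <- factor(
  c("setosa", "virginica", "versicolor"),
  levels = c("setosa", "versicolor", "virginica")
)
qk <- matrix(
  c(0.7, 0.2, 0.1,   # P(setosa), P(versicolor), P(virginica) for sample 1
    0.1, 0.3, 0.6,
    0.2, 0.5, 0.3),
  nrow = 3, byrow = TRUE
)
all(abs(rowSums(qk) - 1) < 1e-8)   # TRUE: every row is a valid distribution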

normalize

A <logical>-value (default: TRUE). If TRUE, the mean cross-entropy across all observations is returned; otherwise, the sum of cross-entropies is returned.

...

Arguments passed into other methods.

w

A <numeric>-vector of observation weights of length \(n\). NULL by default.

Value

A <numeric>-vector of length 1

Calculation

\[H(p, qk) = -\sum_{i} \sum_{j} y_{ij} \log_2(qk_{ij})\]

where:

  • \(y_{ij}\) is the indicator of the actual class: \(y_{ij} = 1\) if the \(i\)-th sample belongs to class \(j\), and 0 otherwise.

  • \(qk_{ij}\) is the estimated probability that the \(i\)-th sample belongs to class \(j\).
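As a sketch, the formula can be reproduced directly in R, following the base-2 logarithm written above (the one-hot matrix y and the helper names below are illustrative, not part of the package):

actual <- factor(c("a", "c", "b", "a", "b"), levels = c("a", "b", "c"))
qk <- matrix(runif(15), nrow = 5, ncol = 3)
qk <- qk / rowSums(qk)              # make each row a probability distribution
y  <- model.matrix(~ actual - 1)    # one-hot encoding of y_ij (columns follow levels)
H_sum  <- -sum(y * log2(qk))        # sum over i and j, i.e. normalize = FALSE
H_mean <- H_sum / nrow(qk)          # mean cross-entropy, i.e. normalize = TRUE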

Examples

# 1) Recode the iris data set to a binary classification problem
#    Here, the positive class ("Virginica") is coded as 1,
#    and the rest ("Others") is coded as 0.
iris$species_num <- as.numeric(iris$Species == "virginica")

# 2) Fit a logistic regression model predicting species_num from Sepal.Length & Sepal.Width
model <- glm(
  formula = species_num ~ Sepal.Length + Sepal.Width,
  data    = iris,
  family  = binomial(link = "logit")
)

# 3) Generate predicted classes: "Virginica" vs. "Others"
predicted <- factor(
  as.numeric(predict(model, type = "response") > 0.5),
  levels = c(1, 0),
  labels = c("Virginica", "Others")
)

# 3.1) Generate actual classes
actual <- factor(
  x      = iris$species_num,
  levels = c(1, 0),
  labels = c("Virginica", "Others")
)

# For Log Loss, we need predicted probabilities for each class.
# Since it's a binary model, we create a 2-column matrix:
#   1st column = P("Virginica")
#   2nd column = P("Others") = 1 - P("Virginica")
predicted_probs <- predict(model, type = "response")
qk_matrix <- cbind(predicted_probs, 1 - predicted_probs)
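# (Sanity check, added for illustration: each row of qk_matrix should sum to 1
#  and the column order should match the factor levels of 'actual'.)
stopifnot(all(abs(rowSums(qk_matrix) - 1) < 1e-8))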

# 4) Evaluate unweighted Log Loss
#    'logloss' takes (actual, qk, normalize = TRUE/FALSE).
#    The factor 'actual' must have the positive class (Virginica) as its first level.
unweighted_LogLoss <- logloss(
  actual    = actual,           # factor
  qk        = qk_matrix,        # numeric matrix of probabilities
  normalize = TRUE              # return the mean, not the sum
)

# 5) Evaluate weighted Log Loss
#    We introduce a weight vector, for example:
weights <- iris$Petal.Length / mean(iris$Petal.Length)
weighted_LogLoss <- weighted.logloss(
  actual    = actual,
  qk        = qk_matrix,
  w         = weights,
  normalize = TRUE
)

# 6) Print Results
cat(
  "Unweighted Log Loss:", unweighted_LogLoss,
  "Weighted Log Loss:", weighted_LogLoss,
  sep = "\n"
)