Area under the Receiver Operator Characteristics Curve

roc.auc.matrix R Documentation

Description

A generic function for the area under the Receiver Operator Characteristics Curve. Use weighted.roc.auc() for the weighted area under the Receiver Operator Characteristics Curve.

Usage

## S3 method for class 'matrix'
roc.auc(actual, response, micro = NULL, method = 0L, ...)

## S3 method for class 'matrix'
weighted.roc.auc(actual, response, w, micro = NULL, method = 0L, ...)

## Generic S3 method
roc.auc(
 actual,
 response,
 micro  = NULL,
 method = 0,
 ...
)

## Generic S3 method
weighted.roc.auc(
 actual,
 response,
 w,
 micro  = NULL,
 method = 0,
 ...
)

Arguments

actual

A vector of <factor> values of length \(n\), with \(k\) levels.

response

A \(n \times k\) <numeric>-matrix. The estimated response probabilities for each class \(k\).

micro

A <logical>-value of length \(1\) (default: NULL). If TRUE, the micro average across all \(k\) classes is returned; if FALSE, the macro average is returned.

method

A <numeric> value (default: \(0\)). Defines the underlying method of calculating the area under the curve. If \(0\) it is calculated using the trapezoid-method, if \(1\) it is calculated using the step-method.

...

Arguments passed into other methods.

w

A <numeric>-vector of length \(n\). NULL by default.

Value

A <numeric> vector of length 1

Definition

Trapezoidal rule

The trapezoidal rule approximates the integral of a function \(f(x)\) between \(x = a\) and \(x = b\) using trapezoids formed between consecutive points. If we have points \(x_0, x_1, \ldots, x_n\) (with \(a = x_0 < x_1 < \cdots < x_n = b\)) and corresponding function values \(f(x_0), f(x_1), \ldots, f(x_n)\), the area under the curve \(A_T\) is approximated by:

\[ A_T \approx \sum_{k=1}^{n} \frac{f(x_{k-1}) + f(x_k)}{2} \bigl[x_k - x_{k-1}\bigr]. \]
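
As an illustration of the formula above, here is a minimal sketch in base R. The helper trapezoid_area() and the toy ROC points are hypothetical and are not part of the package; the sketch only assumes that the x-values are sorted in increasing order.

```r
# Hypothetical helper (not part of the package): trapezoidal rule
# over the points (x, y), with x sorted in increasing order.
trapezoid_area <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2, !is.unsorted(x))
  # width of each subinterval times the average of its endpoint heights
  sum(diff(x) * (y[-length(y)] + y[-1]) / 2)
}

# Toy ROC points (made up for illustration):
# false positive rate on x, true positive rate on y
fpr <- c(0, 0.25, 0.5, 1)
tpr <- c(0, 0.5, 0.75, 1)

auc_trapezoid <- trapezoid_area(fpr, tpr)
print(auc_trapezoid) # 0.65625
```

Each term of the sum is the area of one trapezoid, so the result equals the area under the piecewise-linear curve through the points.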

Step-function method

The step-function (rectangular) method uses the value of the function at the left endpoint of each subinterval to form rectangles. With the same partition \(x_0, x_1, \ldots, x_n\), the rectangular approximation \(A_S\) can be written as:

\[ A_S \approx \sum_{k=1}^{n} f(x_{k-1}) \bigl[x_k - x_{k-1}\bigr]. \]
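
The rectangular sum above can be sketched in base R as follows. The helper step_area() and the toy ROC points are hypothetical and are not part of the package; the sketch assumes sorted x-values and uses the left endpoint of each subinterval, as in the formula.

```r
# Hypothetical helper (not part of the package): left-endpoint
# step-function rule over the points (x, y), with x sorted increasingly.
step_area <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2, !is.unsorted(x))
  # width of each subinterval times the height at its left endpoint
  sum(diff(x) * y[-length(y)])
}

# Toy ROC points (made up for illustration):
# false positive rate on x, true positive rate on y
fpr <- c(0, 0.25, 0.5, 1)
tpr <- c(0, 0.5, 0.75, 1)

auc_step <- step_area(fpr, tpr)
print(auc_step) # 0.5
```

For a non-decreasing curve such as these toy points, the left-endpoint sum never exceeds the trapezoidal approximation computed on the same points.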

Examples

# 1) recode Iris
# to binary classification
# problem
iris$species_num <- as.numeric(
  iris$Species == "virginica"
)

# 2) fit the logistic
# regression
model <- glm(
  formula = species_num ~ Sepal.Length + Sepal.Width,
  data    = iris,
  family  = binomial(
    link = "logit"
  )
)

# 3) generate predicted
# probabilities
response <- predict(model, type = "response")

# 3.1) generate actual
# classes
actual <- factor(
  x = iris$species_num,
  levels = c(1,0),
  labels = c("Virginica", "Others")
)

# 4) generate receiver operator characteristics
# data

# 4.1) calculate the complementary
# probability and store as matrix;
# column 1 is "Virginica", column 2 is "Others"
response <- matrix(
  data = cbind(response, 1-response),
  nrow = length(actual)
)

# 4.2) calculate class-wise
# area under the curve
roc.auc(
  actual   = actual,
  response = response 
)

# 4.3) calculate class-wise
# weighted area under the curve
weighted.roc.auc(
  actual   = actual,
  response = response,
  w        = iris$Petal.Length/mean(iris$Petal.Length)
)

# 5) evaluate overall area under
# the curve
cat(
  "Micro-averaged area under the ROC curve", roc.auc(
    actual    = actual,
    response  = response,
    micro     = TRUE
  ),
  "Micro-averaged area under the ROC curve (weighted)", weighted.roc.auc(
    actual    = actual,
    response  = response,
    w         = iris$Petal.Length/mean(iris$Petal.Length),
    micro     = TRUE
  ),
  sep = "\n"
)