false discovery rate

fdr.factor R Documentation

Description

The fdr()-function computes the false discovery rate (FDR), the proportion of false positives among the predicted positives, between two vectors of predicted and observed factor() values. The weighted.fdr() function computes the weighted false discovery rate.

Usage

## S3 method for class 'factor'
fdr(actual, predicted, micro = NULL, na.rm = TRUE, ...)

## S3 method for class 'factor'
weighted.fdr(actual, predicted, w, micro = NULL, na.rm = TRUE, ...)

## S3 method for class 'cmatrix'
fdr(x, micro = NULL, na.rm = TRUE, ...)

fdr(...)

weighted.fdr(...)

Arguments

actual

A vector of <factor>- of length \(n\), and \(k\) levels.

predicted

A vector of <factor>-vector of length \(n\), and \(k\) levels.

micro

A <logical>-value of length \(1\) (default: NULL). If TRUE it returns the micro average across all \(k\) classes, if FALSE it returns the macro average.

na.rm

A <logical> value of length \(1\) (default: TRUE). If TRUE, NA values are removed from the computation. This argument is only relevant when micro != NULL. When na.rm = TRUE, the computation corresponds to sum(c(1, 2, NA), na.rm = TRUE) / length(na.omit(c(1, 2, NA))). When na.rm = FALSE, the computation corresponds to sum(c(1, 2, NA), na.rm = TRUE) / length(c(1, 2, NA)).

Arguments passed into other methods

w

A <numeric>-vector of length \(n\). NULL by default.

x

A confusion matrix created cmatrix().

Value

If micro is NULL (the default), a named <numeric>-vector of length k

If micro is TRUE or FALSE, a <numeric>-vector of length 1

Calculation

The metric is calculated for each class \(k\) as follows,

\[ \frac{\#FP_k}{\#TP_k+\#FP_k} \]

Where \(\#TP_k\) and \(\#FP_k\) is the number of true psotives and false positives, respectively, for each class \(k\).

Examples

# 1) recode Iris
# to binary classification
# problem
iris$species_num <- as.numeric(
  iris$Species == "virginica"
)

# 2) fit the logistic
# regression
model <- glm(
  formula = species_num ~ Sepal.Length + Sepal.Width,
  data    = iris,
  family  = binomial(
    link = "logit"
  )
)

# 3) generate predicted
# classes
predicted <- factor(
  as.numeric(
    predict(model, type = "response") >` 0.5
  ),
  levels = c(1,0),
  labels = c("Virginica", "Others")
)

# 3.1) generate actual
# classes
actual <- factor(
  x = iris$species_num,
  levels = c(1,0),
  labels = c("Virginica", "Others")
)

# 4) evaluate class-wise performance
# using False Discovery Rate

# 4.1) unweighted False Discovery Rate
fdr(
  actual    = actual,
  predicted = predicted
)

# 4.2) weighted False Discovery Rate
weighted.fdr(
  actual    = actual,
  predicted = predicted,
  w         = iris$Petal.Length/mean(iris$Petal.Length)
)

# 5) evaluate overall performance
# using micro-averaged False Discovery Rate
cat(
  "Micro-averaged False Discovery Rate", fdr(
    actual    = actual,
    predicted = predicted,
    micro     = TRUE
  ),
  "Micro-averaged False Discovery Rate (weighted)", weighted.fdr(
    actual    = actual,
    predicted = predicted,
    w         = iris$Petal.Length/mean(iris$Petal.Length),
    micro     = TRUE
  ),
  sep = "\n"
)