Fowlkes-Mallows index

fmi.factor R Documentation

Description

The fmi()-function computes the Fowlkes-Mallows Index (FMI), a measure of the similarity between two clusterings, for two vectors of predicted and observed factor() values.

Usage

## S3 method for class 'factor'
fmi(actual, predicted, ...)

## S3 method for class 'cmatrix'
fmi(x, ...)

fmi(...)

Arguments

actual

A <factor>-vector of length \(n\) with \(k\) levels

predicted

A <factor>-vector of length \(n\) with \(k\) levels

...

Arguments passed into other methods

x

A confusion matrix created by cmatrix()

Value

A <numeric>-vector of length 1

Calculation

The metric is calculated for each class \(k\) as follows:

\[ \sqrt{\frac{\#TP_k}{\#TP_k + \#FP_k} \times \frac{\#TP_k}{\#TP_k + \#FN_k}} \]

Where \(\#TP_k\), \(\#FP_k\), and \(\#FN_k\) represent the number of true positives, false positives, and false negatives for each class \(k\), respectively.
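
The per-class formula can be checked by hand in base R. The following is a minimal sketch, not the package's implementation: fmi_by_class() is a hypothetical helper introduced only for illustration, and it assumes both factors share the same levels in the same order. How fmi() reduces the per-class values to the single number it returns is not covered by this sketch.

```r
# Hypothetical helper (not part of the documented package): evaluates the
# formula above for every class, using only base R.
fmi_by_class <- function(actual, predicted) {
  stopifnot(
    is.factor(actual),
    is.factor(predicted),
    identical(levels(actual), levels(predicted)),
    length(actual) == length(predicted)
  )

  # confusion table: rows are actual classes, columns are predicted classes
  cm <- table(actual, predicted)

  tp <- diag(cm)           # true positives per class
  fp <- colSums(cm) - tp   # predicted as class k, but actually another class
  fn <- rowSums(cm) - tp   # actually class k, but predicted as another class

  # geometric mean of precision and recall for each class
  sqrt((tp / (tp + fp)) * (tp / (tp + fn)))
}

# toy data (made up for illustration)
actual    <- factor(c("a", "a", "a", "a", "b", "b"), levels = c("a", "b"))
predicted <- factor(c("a", "a", "b", "b", "b", "b"), levels = c("a", "b"))

fmi_by_class(actual, predicted)
```

For this toy data, class a has precision 1 and recall 0.5, while class b has precision 0.5 and recall 1, so both classes evaluate to \(\sqrt{0.5} \approx 0.707\).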

Examples

# 1) recode Iris
# to binary classification
# problem
iris$species_num <- as.numeric(
  iris$Species == "virginica"
)

# 2) fit the logistic
# regression
model <- glm(
  formula = species_num ~ Sepal.Length + Sepal.Width,
  data    = iris,
  family  = binomial(
    link = "logit"
  )
)

# 3) generate predicted
# classes
predicted <- factor(
  as.numeric(
    predict(model, type = "response") > 0.5
  ),
  levels = c(1,0),
  labels = c("Virginica", "Others")
)

# 3.1) generate actual
# classes
actual <- factor(
  x = iris$species_num,
  levels = c(1,0),
  labels = c("Virginica", "Others")
)

# 4) evaluate model performance
# using Fowlkes Mallows Index
cat(
  "Fowlkes Mallows Index", fmi(
  actual    = actual,
  predicted = predicted
  ),
  sep = "\n"
)