1 Introduction
The discussion in this section is academic; I have the utmost respect for all the developers, contributors and users of the {pkgs}. We are, after all, united in our love for programming, data science and R.
There are currently three {pkgs} that are developed with machine learning performance evaluation in mind: {MLmetrics}, {yardstick} and {mlr3measures}. These {pkgs} have historically bridged the gap between R and Python in terms of machine learning and data science.
1.1 The status quo of {pkgs}
{MLmetrics} can be considered legacy code when it comes to performance evaluation, and it served as a backend in {yardstick} up to version 0.0.2. It is built entirely on base R, and has been stable since its inception almost 10 years ago.
However, the development appears to have peaked and is currently stale - see, for example, this stale PR related to this issue. Micro- and macro-averages have been implemented in {scikit-learn} for many years, and {MLmetrics} simply didn't keep up with the development.
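For reference, the distinction is simple to state: a macro-average computes the metric per class and then averages the values, while a micro-average pools the class-wise counts and computes the metric once. Below is a minimal base R sketch of the two schemes, using precision on simulated labels purely for illustration:

```r
set.seed(1903)

actual    <- factor(sample(letters[1:3], size = 1e3, replace = TRUE))
predicted <- factor(sample(letters[1:3], size = 1e3, replace = TRUE))

# Confusion matrix: rows = actual, columns = predicted
cm <- table(actual, predicted)

tp <- diag(cm)          # true positives per class
fp <- colSums(cm) - tp  # false positives per class

# Macro-average: compute precision per class, then take the unweighted mean
macro_precision <- mean(tp / (tp + fp))

# Micro-average: pool the counts across classes, then compute precision once
micro_precision <- sum(tp) / (sum(tp) + sum(fp))
```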
{yardstick}, on the other hand, carried the torch forward and implemented these modern features. {yardstick} closely follows the syntax, naming and functionality of {scikit-learn} but is built with {tidyverse} tools; although the source code is nice to look at, it does introduce some serious overhead and carries the risk of deprecations.
Furthermore, it complicates a simple application through its verbose function naming: see, for example, the metric()-function for <tbl> and the metric_vec()-function for <numeric> - the computed estimate is the same, but the call (and return type) is different. In addition, {yardstick} can't handle more than one positive class at a time, so the end-user is forced to run the same function more than once to get performance metrics for the remaining classes, as illustrated in the sketch below.
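The following sketch shows the two call patterns and one possible one-vs-rest workaround for class-wise estimates. precision() is used as an example metric, and the explicit recoding loop is just one way to do it, not necessarily the idiomatic {yardstick} approach:

```r
library(yardstick)

set.seed(1903)

actual    <- factor(sample(c("a", "b", "c"), size = 1e3, replace = TRUE))
predicted <- factor(sample(c("a", "b", "c"), size = 1e3, replace = TRUE))

# <tbl> interface: returns a one-row tibble (.metric, .estimator, .estimate)
precision(
  data     = data.frame(actual = actual, predicted = predicted),
  truth    = actual,
  estimate = predicted
)

# <numeric> interface: returns the same estimate as a bare numeric
precision_vec(truth = actual, estimate = predicted)

# One positive class at a time: recode to one-vs-rest and rerun per class
sapply(levels(actual), function(cls) {
  precision_vec(
    truth    = factor(actual == cls, levels = c(TRUE, FALSE)),
    estimate = factor(predicted == cls, levels = c(TRUE, FALSE))
  )
})
```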
1.1.1 Summary
In short, the existing {pkgs} are outdated, inefficient and insufficient for modern large-scale machine learning applications.
1.2 Why {SLmetrics}?
As the name suggests, {SLmetrics} closely resembles {MLmetrics} in its simple, low-level implementation of machine learning metrics. The resemblance ends there, however.
{SLmetrics} is developed with three things in mind: speed, efficiency and scalability. It therefore addresses the shortcomings of the status quo by construction - the {pkg} is built on C++ and {Rcpp} from the ground up. See Table 1.1, where the RMSE implementations of {SLmetrics} and {MLmetrics} are benchmarked:
```r
set.seed(1903)

actual    <- rnorm(1e7)
predicted <- actual + rnorm(1e7)

bench::mark(
  `{SLmetrics}` = SLmetrics::rmse(actual, predicted),
  `{MLmetrics}` = MLmetrics::RMSE(predicted, actual),
  iterations = 100
)
```
This shows that well-written R-code is hard to beat speed-wise: {MLmetrics} is roughly 20% faster - but uses 30,000 times more memory. How about constructing a confusion matrix?
```r
set.seed(1903)

actual    <- factor(sample(letters[1:3], size = 1e7, replace = TRUE))
predicted <- factor(sample(letters[1:3], size = 1e7, replace = TRUE))

bench::mark(
  `{SLmetrics}` = SLmetrics::cmatrix(actual, predicted),
  `{MLmetrics}` = MLmetrics::ConfusionMatrix(actual, predicted),
  check = FALSE,
  iterations = 100
)
```
{SLmetrics} uses 1/50th of the time of {MLmetrics}, and its memory usage is on par with the previous example - significantly less than that of {MLmetrics}.
1.2.1 Summary
{SLmetrics} is, in the worst-case scenario, on par with low-level R implementations of equivalent metrics, and it is many times more memory-efficient than any of the {pkgs}. A detailed benchmark can be found here.