1 Introduction
The discussion in this section is academic; I have the utmost respect for all the developers, contributors and users of the {pkgs}. We are, after all, united in our love for programming, data science and R.
There are currently three {pkgs} that are developed with machine learning performance evaluation in mind: {MLmetrics}, {yardstick} and {mlr3measures}. These {pkgs} have historically bridged the gap between R and Python in terms of machine learning and data science.
1.1 The status quo of {pkgs}
{MLmetrics} can be considered legacy code when it comes to performance evaluation, and it served as a backend in {yardstick} up to version 0.0.2. It is built entirely on base R, and has been stable since its inception almost 10 years ago.
However, the development appears to have peaked and is currently stale - see, for example, this stale PR related to this issue. Micro- and macro-averages have been implemented in {scikit-learn} for many years, and {MLmetrics} simply didn't keep up with the development.
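For reference, the distinction is simple to state: a macro-average computes the metric per class and then averages the values, while a micro-average pools the class-wise counts and computes the metric once. Below is a minimal base R sketch of the two schemes, using precision on simulated labels purely for illustration:

```r
set.seed(1903)

actual    <- factor(sample(letters[1:3], size = 1e3, replace = TRUE))
predicted <- factor(sample(letters[1:3], size = 1e3, replace = TRUE))

# Confusion matrix: rows = actual, columns = predicted
cm <- table(actual, predicted)

tp <- diag(cm)          # true positives per class
fp <- colSums(cm) - tp  # false positives per class

# Macro-average: compute precision per class, then take the unweighted mean
macro_precision <- mean(tp / (tp + fp))

# Micro-average: pool the counts across classes, then compute precision once
micro_precision <- sum(tp) / (sum(tp) + sum(fp))
```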
{yardstick}, on the other hand, carried the torch forward and implemented these modern features. {yardstick} closely follows the syntax, naming and functionality of {scikit-learn} but is built with {tidyverse} tools; although the source code is nice to look at, it does introduce some serious overhead and carries the risk of deprecations.
Furthermore, it complicates a simple application through its verbose function naming: see, for example, the metric()-function for <tbl> and the metric_vec()-function for <numeric> - the computed estimate is the same, but the call (and return type) is different. In addition, {yardstick} can't handle more than one positive class at a time, so the end-user is forced to run the same function more than once to get performance metrics for the remaining classes, as illustrated in the sketch below.
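The following sketch shows the two call patterns and one possible one-vs-rest workaround for class-wise estimates. precision() is used as an example metric, and the explicit recoding loop is just one way to do it, not necessarily the idiomatic {yardstick} approach:

```r
library(yardstick)

set.seed(1903)

actual    <- factor(sample(c("a", "b", "c"), size = 1e3, replace = TRUE))
predicted <- factor(sample(c("a", "b", "c"), size = 1e3, replace = TRUE))

# <tbl> interface: returns a one-row tibble (.metric, .estimator, .estimate)
precision(
  data     = data.frame(actual = actual, predicted = predicted),
  truth    = actual,
  estimate = predicted
)

# <numeric> interface: returns the same estimate as a bare numeric
precision_vec(truth = actual, estimate = predicted)

# One positive class at a time: recode to one-vs-rest and rerun per class
sapply(levels(actual), function(cls) {
  precision_vec(
    truth    = factor(actual == cls, levels = c(TRUE, FALSE)),
    estimate = factor(predicted == cls, levels = c(TRUE, FALSE))
  )
})
```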
1.1.1 Summary
In short, the existing {pkgs} are outdated, inefficient and insufficient for modern large-scale machine learning applications.
1.2 Why {SLmetrics}?
As the name suggests, {SLmetrics} closely resembles {MLmetrics} in its simple, low-level implementation of machine learning metrics. The resemblance ends there, however.
{SLmetrics} is developed with three things in mind: speed, efficiency and scalability. It therefore addresses the shortcomings of the status quo by construction - the {pkg} is built on C++ and {Rcpp} from the ground up. See Table 1.1, where the RMSE implementations of {SLmetrics} and {MLmetrics} are benchmarked:
```r
set.seed(1903)

actual    <- rnorm(1e7)
predicted <- actual + rnorm(1e7)

bench::mark(
  `{SLmetrics}` = SLmetrics::rmse(actual, predicted),
  `{MLmetrics}` = MLmetrics::RMSE(predicted, actual),
  iterations = 100
)
```
This shows that well-written R-code is hard to beat speed-wise: {MLmetrics} is roughly 20% faster - but uses 30,000 times more memory. How about constructing a confusion matrix?
```r
set.seed(1903)

actual    <- factor(sample(letters[1:3], size = 1e7, replace = TRUE))
predicted <- factor(sample(letters[1:3], size = 1e7, replace = TRUE))

bench::mark(
  `{SLmetrics}` = SLmetrics::cmatrix(actual, predicted),
  `{MLmetrics}` = MLmetrics::ConfusionMatrix(actual, predicted),
  check = FALSE,
  iterations = 100
)
```
{SLmetrics} uses 1/50th of the time of {MLmetrics}, and its memory usage is on par with the previous example - significantly less than that of {MLmetrics}.
1.2.1 Summary
{SLmetrics} is, in the worst-case scenario, on par with low-level R implementations of equivalent metrics, and it is many times more memory-efficient than any of the {pkgs}. A detailed benchmark can be found here.