4  OpenMP

{SLmetrics} supports parallelization through OpenMP. This section introduces that functionality.

4.1 Enabling/Disabling OpenMP

OpenMP is disabled by default, but can be enabled as follows:

SLmetrics::openmp.on()
#> OpenMP enabled!

And disabled as follows:

SLmetrics::openmp.off()
#> OpenMP disabled!

By default, all available threads are used. To control the number of threads, see the following code:

SLmetrics::openmp.threads(3)
#> Using 3 threads.

To use all threads:

SLmetrics::openmp.threads(NULL)
#> Using 4 threads.

4.2 Available threads

The number of available threads is detected automatically, but it can also be viewed by calling SLmetrics::openmp.threads() without arguments. See below:

SLmetrics::openmp.threads()
#> [1] 4

4.3 Benchmarking OpenMP

To benchmark the performance gain from enabling OpenMP, the same setup as in Chapter 3 is used. Below, the actual and predicted values are generated.

# 1) set seed for reproducibility
set.seed(1903)

# 2) create classification problem
fct_actual <- create_factor()
fct_predicted <- create_factor()
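The create_factor() helper is defined in Chapter 3. For a self-contained run, a hypothetical sketch consistent with the setup below (a 3x3 confusion matrix from two vectors of 10 million elements) might look like this:

```r
# Hypothetical sketch of the create_factor() helper from Chapter 3.
# Assumption: it draws n random observations from k class labels,
# matching the 3x3 confusion matrix and 10 million elements used below.
create_factor <- function(n = 1e7, k = 3) {
  factor(sample(letters[1:k], size = n, replace = TRUE))
}
```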
SLmetrics::openmp.on()

benchmark(
    `With OpenMP` = SLmetrics::cmatrix(fct_actual, fct_predicted)
)
Table 4.1: Benchmark of computing a 3x3 confusion matrix with OpenMP enabled. Each benchmark is run 10 times with two input vectors of 10 million elements.
#> # A tibble: 1 × 4
#>   expression  execution_time memory_usage gc_calls
#>   <fct>             <bch:tm>    <bch:byt>    <dbl>
#> 1 With OpenMP         3.79ms           0B        0
SLmetrics::openmp.off()

benchmark(
    `Without OpenMP` = SLmetrics::cmatrix(fct_actual, fct_predicted)
)
Table 4.2: Benchmark of computing a 3x3 confusion matrix with OpenMP disabled. Each benchmark is run 10 times with two input vectors of 10 million elements.
#> # A tibble: 1 × 4
#>   expression     execution_time memory_usage gc_calls
#>   <fct>                <bch:tm>    <bch:byt>    <dbl>
#> 1 Without OpenMP          8.54ms           0B        0
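The benchmark() helper also comes from Chapter 3. A hypothetical sketch consistent with the columns shown above, assuming it wraps bench::mark() from the {bench} package with 10 iterations:

```r
# Hypothetical sketch of the benchmark() helper. Assumption: it wraps
# bench::mark() with 10 iterations and renames the columns to those
# shown in the tables above.
benchmark <- function(..., iterations = 10) {
  res <- bench::mark(..., iterations = iterations, check = FALSE)
  res <- res[, c("expression", "median", "mem_alloc", "n_gc")]
  names(res) <- c("expression", "execution_time", "memory_usage", "gc_calls")
  res
}
```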

4.4 Key take-aways

Enabling OpenMP support can decrease computation time significantly, but it should be used deliberately and with care, to avoid function calls competing for the same threads. This is especially relevant if you are already running, say, a neural network in parallel.
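One way to leave headroom for other parallel workloads, sketched here using only the functions introduced above, is to cap {SLmetrics} at a subset of the detected threads:

```r
# Cap SLmetrics at half of the detected threads (at least one),
# leaving the remaining threads for e.g. parallel model training.
available <- SLmetrics::openmp.threads()
SLmetrics::openmp.threads(max(1L, available %/% 2L))
```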