Memoisation of Component Allocation Draws • bmoe

Purpose

This article explains why memoisation is used within key functionality of the package, and highlights a nuance that users of the package should understand.

Motivation

Suppose we have fitted a Bayesian Mixture of Experts model.

object <- example_bmoe_fit(multiple_y = TRUE)
new_data <- object$new_data[1:4, ]

With the new_data defined above, we want to

calculate the out-of-sample (OOS) log likelihood per observation;
make predictions based on the predictors in the new data.

The functionality to achieve these objectives is isolated in

bmoe::calculate_pointwise_log_lik();
bmoe:::predict.bmoe_fit() (callable by stats::predict).

Both of these functions use

z <- calculate_component_samples(object, new_data)

to draw new samples from the posterior distribution.

Although the distribution is the same whenever all inputs are the same, the draws themselves are random, so two separate calls would generally return slightly different samples.

Hence, we memoise the function that extracts the samples, so that the draws denoting the component allocations remain unchanged unless object or new_data changes.
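The mechanism can be illustrated outside the package. The following is a minimal sketch, assuming the memoise package is installed; draw_allocations() is a hypothetical stand-in for a stochastic sampler and is not how bmoe draws its samples.

```r
# Hypothetical stand-in for a stochastic sampler (not bmoe's implementation):
# for each row of new_data, average 1000 random component labels from 1 to 3.
draw_allocations <- function(new_data) {
  message("Drawing new allocation samples")
  n <- nrow(new_data)
  labels <- matrix(sample.int(3L, 1000L * n, replace = TRUE), ncol = n)
  colMeans(labels)
}

# Wrap the sampler so that results are cached by the value of the arguments.
draw_allocations_memo <- memoise::memoise(draw_allocations)

new_data <- data.frame(x = c(0.1, 0.4, 0.7, 0.9))

first  <- draw_allocations_memo(new_data)  # runs the sampler, prints the message
second <- draw_allocations_memo(new_data)  # served from the cache, no message
identical(first, second)
#> [1] TRUE

# A different input is a different cache key, so the sampler runs again.
subset_draws <- draw_allocations_memo(new_data[1:2, , drop = FALSE])
```

Because the cache is keyed on the argument values, no seed needs to be fixed to reproduce the first set of draws within a session.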

To see this, consider the following cleaned output from the R console.

memoise::is.memoised(calculate_component_samples)
#> [1] TRUE


system.time({
  z <- calculate_component_samples(object, new_data)
  print(z)
})
#> [MESSAGE]: Drawing new allocation samples from relevant distribution
#>
#> Output summarised over 10000 iterations and 2 chains:
#> 
#> varname = 'z'
#> 
#> [1] 2.08050 2.00210 2.17815 1.83230
#>    user  system elapsed 
#>    5.59    0.16    5.73


system.time({
  z <- calculate_component_samples(object, new_data)
  print(z)
})
#> Output summarised over 10000 iterations and 2 chains:
#> 
#> varname = 'z'
#> 
#> [1] 2.08050 2.00210 2.17815 1.83230
#>    user  system elapsed 
#>    0.02    0.00    0.02


system.time({
  z <- calculate_component_samples(object, new_data[1:2, ])
  print(z)
})
#> [MESSAGE]: Drawing new allocation samples from relevant distribution
#>
#> Output summarised over 10000 iterations and 2 chains:
#>
#> varname = 'z'
#> 
#> [1] 2.0798 1.9923
#>    user  system elapsed 
#>    5.02    0.12    5.13

We observe that the second call, made with the same inputs, returns almost immediately, with the same values and without the message. In contrast, modifying the inputs means the cached result no longer applies, so new draws are produced.

Conclusion

When the same object and new_data are passed to the pointwise log likelihood function and to the prediction function, both use the same component allocation samples.
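As a rough illustration of this sharing, the sketch below defines a hypothetical memoised sampler, sample_z(), and two hypothetical consumers that each request the draws independently. None of these names belong to bmoe, and the memoise package is assumed to be installed.

```r
# Hypothetical memoised sampler: one random draw per row of new_data.
sample_z <- memoise::memoise(function(new_data) rnorm(nrow(new_data)))

# Hypothetical consumers; each requests the draws on its own.
summarise_draws <- function(new_data) mean(sample_z(new_data))
rank_draws      <- function(new_data) order(sample_z(new_data))

new_data <- data.frame(x = 1:4)
z <- sample_z(new_data)

# Both consumers worked from exactly the draws stored in z.
identical(summarise_draws(new_data), mean(z))
#> [1] TRUE
identical(rank_draws(new_data), order(z))
#> [1] TRUE
```

Without memoisation, each consumer would trigger its own random draws and the two results would not be based on the same allocations.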

Note that the cache does not persist across R sessions, so a message is produced whenever calculate_component_samples draws new samples from the posterior.
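The session-scoped behaviour can be sketched with a hypothetical memoised function, assuming memoise's default in-memory cache. Whether the package uses that default, and whether clearing its cache by hand is intended, is not stated here; memoise::forget() is shown only to mimic what a fresh session does.

```r
# Hypothetical memoised sampler using memoise's default in-memory cache.
draw_memo <- memoise::memoise(function(n) rnorm(n))

a <- draw_memo(4)            # computed and cached
b <- draw_memo(4)            # returned from the cache
identical(a, b)
#> [1] TRUE

# Clearing the cache has the same effect as starting a new R session:
# the next call has nothing to reuse and draws again.
memoise::forget(draw_memo)
fresh <- draw_memo(4)
identical(a, fresh)
#> [1] FALSE
```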