| Title: | Nested Sequential Monte Carlo for the Bayesian Mallows Model |
|---|---|
| Description: | Provides nested sequential Monte Carlo algorithms for performing sequential inference in the Bayesian Mallows model, which is a widely used probability model for rank and preference data. The package implements the SMC2 (Sequential Monte Carlo Squared) algorithm for handling sequentially arriving rankings and pairwise preferences, including support for complete rankings, partial rankings, and pairwise comparisons. The methods are based on Sorensen (2025) <doi:10.1214/25-BA1564>. |
| Authors: | Oystein Sorensen [aut, cre] (ORCID: <https://orcid.org/0000-0003-0724-3542>) |
| Maintainer: | Oystein Sorensen <[email protected]> |
| License: | GPL-3 |
| Version: | 0.3.0 |
| Built: | 2026-06-07 07:29:43 UTC |
| Source: | https://github.com/cran/BayesMallowsSMC2 |
Simulated Data
complete_rankingscomplete_rankings
complete_rankingsA data frame with 100 rows and 7 columns:
Timepoint
User id
Ranking given to the item.
Fits the Bayesian Mallows model to sequentially arriving ranking or preference data using the SMC2 (Sequential Monte Carlo Squared) algorithm. This function can handle complete rankings, partial rankings, and pairwise preference data, and supports mixture models with multiple clusters.
compute_sequentially( data, hyperparameters = set_hyperparameters(), smc_options = set_smc_options(), topological_sorts = NULL )compute_sequentially( data, hyperparameters = set_hyperparameters(), smc_options = set_smc_options(), topological_sorts = NULL )
data |
A dataframe containing partial rankings or pairwise preferences.
If
If data contains pairwise preferences, it must have the following structure:
|
hyperparameters |
A list returned from |
smc_options |
A list returned from |
topological_sorts |
A list returned from
|
The function implements the SMC2 algorithm for sequential Bayesian inference in the Mallows model. At each timepoint, it updates the particle approximation to the posterior distribution as new ranking or preference data arrives. The algorithm automatically performs resampling and rejuvenation steps when the effective sample size drops below the specified threshold.
The returned object has S3 methods for printing (print.BayesMallowsSMC2),
summarizing (summary.BayesMallowsSMC2), and plotting (plot.BayesMallowsSMC2).
For visualization of parameter evolution over time, see trace_plot().
An object of class BayesMallowsSMC2, which is a list containing:
A matrix of dispersion parameter values with dimensions
[n_clusters, n_particles]. Each column represents one particle, and
each row represents one cluster.
A 3-dimensional array of latent ranking values with dimensions
[n_items, n_clusters, n_particles]. Entry [i, k, j] gives the rank
of item i in cluster k for particle j.
A matrix of precision parameter values (for mixture models) with
dimensions [n_clusters, n_particles]. Each column represents one particle.
A 3-dimensional array of cluster assignment
probabilities (for mixture models) with dimensions
[n_particles, n_users, n_clusters]. Only present when n_clusters > 1.
A numeric vector of length n_particles containing
the normalized importance weights for each particle.
A numeric vector of length n_timepoints containing the effective
sample size at each timepoint.
An integer vector of length n_timepoints indicating whether
resampling occurred at each timepoint (1 = yes, 0 = no).
An integer vector of length n_timepoints showing
the number of particle filters used at each timepoint.
A numeric value giving the estimated log marginal likelihood of the data.
A list of parameter traces (only if trace = TRUE in
set_smc_options()). Each element contains alpha values at one timepoint.
A list of parameter traces (only if trace = TRUE in
set_smc_options()). Each element contains tau values at one timepoint.
A list of parameter traces (only if trace = TRUE in
set_smc_options()). Each element contains rho values at one timepoint.
A list of importance weight traces
(only if trace = TRUE in set_smc_options()).
A list of latent ranking traces (only if
trace_latent = TRUE in set_smc_options()).
Sørensen Ø, Stein A, Netto WL, Leslie DS (2025). “Sequential Rank and Preference Learning with the Bayesian Mallows Model.” Bayesian Analysis, 1(1), 1 – 26. doi:10.1214/25-BA1564.
# Compute the model sequentially with complete rankings mod <- compute_sequentially( complete_rankings, hyperparameters = set_hyperparameters(n_items = 5), smc_options = set_smc_options(n_particles = 100, n_particle_filters = 1) ) # Print the model summary mod # Plot posterior distribution of alpha plot(mod, parameter = "alpha")# Compute the model sequentially with complete rankings mod <- compute_sequentially( complete_rankings, hyperparameters = set_hyperparameters(n_items = 5), smc_options = set_smc_options(n_particles = 100, n_particle_filters = 1) ) # Print the model summary mod # Plot posterior distribution of alpha plot(mod, parameter = "alpha")
Simulated Two-Component Mixtures Data
mixturesmixtures
mixturesA data frame with 400 rows and 7 columns:
Timepoint
User id
Ranking given to the item.
Simulated Data with Pairwise Preferences
pairwise_preferencespairwise_preferences
pairwise_preferencesA data frame with 400 rows and 4 columns:
Timepoint
User id
Item preferred in the comparison.
Item disfavored in the comparison.
Simulated Data with Missing Values
partial_rankingspartial_rankings
partial_rankingsA data frame with 100 rows and 7 columns:
Timepoint
User id
Ranking given to the item.
Visualize posterior distributions of the parameters of the Bayesian Mallows model after the final timepoint.
## S3 method for class 'BayesMallowsSMC2' plot(x, parameter = "alpha", items = NULL, ...)## S3 method for class 'BayesMallowsSMC2' plot(x, parameter = "alpha", items = NULL, ...)
x |
An object of class |
parameter |
Character string defining the parameter to plot. Available
options are |
items |
The items to study in the plot for |
... |
Other arguments (currently unused). |
The function uses importance weights from the SMC algorithm to compute weighted posterior distributions. For mixture models (multiple clusters), separate plots are created for each cluster using faceting.
For parameter = "rho", if the number of items is large (> 5) and items
is not specified, a random subset of 5 items is selected for visualization.
The random selection respects the current random seed, so results are
reproducible if you set a seed before calling the function (e.g., with
set.seed()).
A ggplot object showing:
For parameter = "alpha": A histogram of the posterior distribution
of the dispersion parameter.
For parameter = "tau": A histogram of the posterior distribution
of the precision parameter.
For parameter = "rho": Bar charts showing the posterior probability
of each ranking for each item separately.
# Fit a model with complete rankings mod <- compute_sequentially( complete_rankings, hyperparameters = set_hyperparameters(n_items = 5), smc_options = set_smc_options(n_particles = 100, n_particle_filters = 1) ) # Plot alpha (default) plot(mod) # Plot tau plot(mod, parameter = "tau") # Plot rho for specific items plot(mod, parameter = "rho", items = c(1, 3, 5)) # Plot rho for all items (if <= 5 items) plot(mod, parameter = "rho")# Fit a model with complete rankings mod <- compute_sequentially( complete_rankings, hyperparameters = set_hyperparameters(n_items = 5), smc_options = set_smc_options(n_particles = 100, n_particle_filters = 1) ) # Plot alpha (default) plot(mod) # Plot tau plot(mod, parameter = "tau") # Plot rho for specific items plot(mod, parameter = "rho", items = c(1, 3, 5)) # Plot rho for all items (if <= 5 items) plot(mod, parameter = "rho")
This function precomputes all topological sorts for a given preference matrix. Topological sorts are consistent orderings of items that respect the given pairwise preference constraints.
precompute_topological_sorts(prefs, n_items, save_frac)precompute_topological_sorts(prefs, n_items, save_frac)
prefs |
A matrix representing the preference relations. This matrix must have two columns, the first of which represents the preferred item and the second of which represents the disfavored item. |
n_items |
An integer specifying the number of items to sort. |
save_frac |
Number between 0 and 1 specifying which fraction of sorts to save. |
The function generates all possible topological sorts for the provided preference matrix
and saves approximately save_frac of the sorts in a matrix which is returned.
A list with two elements:
An integer giving the total number of topological sorts.
A matrix where each column represents one topological sort.
The number of columns is approximately save_frac times sort_count.
If save_frac = 0, this is an empty matrix with dimensions c(0, 0).
# Extract preferences from user 1 in the included example data. prefs <- pairwise_preferences[ pairwise_preferences$user == 1, c("top_item", "bottom_item"), drop = FALSE] # Generate all topological sorts, but don't save them: sorts <- precompute_topological_sorts( prefs = as.matrix(prefs), n_items = 5, save_frac = 0 ) # Number of sorts sorts$sort_count # Empty matrix sorts$sort_matrix # Generate all topological sorts and save them: sorts <- precompute_topological_sorts( prefs = as.matrix(prefs), n_items = 5, save_frac = 1 ) # Number of sorts sorts$sort_count # Matrix with all of them sorts$sort_matrix# Extract preferences from user 1 in the included example data. prefs <- pairwise_preferences[ pairwise_preferences$user == 1, c("top_item", "bottom_item"), drop = FALSE] # Generate all topological sorts, but don't save them: sorts <- precompute_topological_sorts( prefs = as.matrix(prefs), n_items = 5, save_frac = 0 ) # Number of sorts sorts$sort_count # Empty matrix sorts$sort_matrix # Generate all topological sorts and save them: sorts <- precompute_topological_sorts( prefs = as.matrix(prefs), n_items = 5, save_frac = 1 ) # Number of sorts sorts$sort_count # Matrix with all of them sorts$sort_matrix
Prints a summary of a BayesMallowsSMC2 object returned by
compute_sequentially().
## S3 method for class 'BayesMallowsSMC2' print(x, ...)## S3 method for class 'BayesMallowsSMC2' print(x, ...)
x |
An object of class |
... |
Additional arguments (currently unused). |
The print method displays key information about the fitted Bayesian Mallows model, including:
Number of particles
Number of timepoints
Number of items
Number of clusters
Log marginal likelihood
Final effective sample size (ESS)
Number of resampling events
Invisibly returns the input object x.
# Fit a model with complete rankings set.seed(123) mod <- compute_sequentially( complete_rankings, hyperparameters = set_hyperparameters(n_items = 5), smc_options = set_smc_options(n_particles = 100, n_particle_filters = 1) ) # Print the model print(mod) # or simply mod# Fit a model with complete rankings set.seed(123) mod <- compute_sequentially( complete_rankings, hyperparameters = set_hyperparameters(n_items = 5), smc_options = set_smc_options(n_particles = 100, n_particle_filters = 1) ) # Print the model print(mod) # or simply mod
Prints a summary of a BayesMallowsSMC2 model.
## S3 method for class 'summary.BayesMallowsSMC2' print(x, ...)## S3 method for class 'summary.BayesMallowsSMC2' print(x, ...)
x |
An object of class |
... |
Additional arguments (currently unused). |
Invisibly returns the input object x.
Set the hyperparameters for the Bayesian Mallows model. This function
creates a list of hyperparameter values that can be passed to
compute_sequentially().
set_hyperparameters( n_items, alpha_shape = 1, alpha_rate = 0.5, cluster_concentration = 10, n_clusters = 1 )set_hyperparameters( n_items, alpha_shape = 1, alpha_rate = 0.5, cluster_concentration = 10, n_clusters = 1 )
n_items |
Integer defining the number of items. |
alpha_shape |
Shape parameter of the gamma prior distribution for the scale parameter alpha. Defaults to 1. |
alpha_rate |
Rate parameter of the gamma prior distribution for the scale parameter alpha. Defaults to 0.5. |
cluster_concentration |
Concentration parameter of the Dirichlet
distribution for cluster probabilities. Only used when |
n_clusters |
Integer defining the number of clusters. Defaults to 1. |
A list with components n_items, alpha_shape, alpha_rate,
cluster_concentration, and n_clusters.
# Example: Set hyperparameters and use them with partial rankings # Set hyperparameters with default values hyperparams1 <- set_hyperparameters(n_items = 5) # Set hyperparameters with custom prior for alpha # A larger alpha_shape and smaller alpha_rate increases the prior mean hyperparams2 <- set_hyperparameters( n_items = 5, alpha_shape = 2, alpha_rate = 1 ) # Use the hyperparameters with compute_sequentially # This example uses partial rankings with a small number of particles # for fast execution suitable for CRAN checks set.seed(123) mod <- compute_sequentially( partial_rankings, hyperparameters = hyperparams2, smc_options = set_smc_options( n_particles = 20, n_particle_filters = 4, max_rejuvenation_steps = 3 ) )# Example: Set hyperparameters and use them with partial rankings # Set hyperparameters with default values hyperparams1 <- set_hyperparameters(n_items = 5) # Set hyperparameters with custom prior for alpha # A larger alpha_shape and smaller alpha_rate increases the prior mean hyperparams2 <- set_hyperparameters( n_items = 5, alpha_shape = 2, alpha_rate = 1 ) # Use the hyperparameters with compute_sequentially # This example uses partial rankings with a small number of particles # for fast execution suitable for CRAN checks set.seed(123) mod <- compute_sequentially( partial_rankings, hyperparameters = hyperparams2, smc_options = set_smc_options( n_particles = 20, n_particle_filters = 4, max_rejuvenation_steps = 3 ) )
Configure the SMC2 (Sequential Monte Carlo Squared) algorithm for the Bayesian Mallows model. This function sets parameters that control the particle filter, resampling strategy, and diagnostic output.
set_smc_options( n_particles = 1000, n_particle_filters = 50, max_particle_filters = 10000, resampling_threshold = n_particles/2, doubling_threshold = 0.2, max_rejuvenation_steps = 20, metric = "footrule", resampler = "multinomial", latent_rank_proposal = "uniform", verbose = FALSE, trace = FALSE, trace_latent = FALSE, use_backward_simulation = FALSE )set_smc_options( n_particles = 1000, n_particle_filters = 50, max_particle_filters = 10000, resampling_threshold = n_particles/2, doubling_threshold = 0.2, max_rejuvenation_steps = 20, metric = "footrule", resampler = "multinomial", latent_rank_proposal = "uniform", verbose = FALSE, trace = FALSE, trace_latent = FALSE, use_backward_simulation = FALSE )
n_particles |
Integer specifying the number of particles to use in the outer SMC loop. More particles generally improve approximation accuracy but increase computational cost. Defaults to 1000. |
n_particle_filters |
Integer specifying the initial number of particle filters for each particle in the inner loop. This controls the granularity of the latent rank estimation. Defaults to 50. |
max_particle_filters |
Integer specifying the maximum number of particle filters allowed. The algorithm can adaptively increase the number of filters up to this limit when the acceptance rate is low. Defaults to 10000. |
resampling_threshold |
Numeric specifying the effective sample size
threshold for triggering resampling. When the effective sample size falls
below this threshold, the particles are resampled to avoid degeneracy.
Defaults to |
doubling_threshold |
Numeric threshold for particle filter doubling. If
the acceptance rate of the rejuvenation step falls below this threshold,
the number of particle filters is doubled (up to |
max_rejuvenation_steps |
Integer specifying the maximum number of rejuvenation MCMC steps to perform. The rejuvenation step helps maintain particle diversity. The algorithm stops early if the number of unique particles exceeds half the total number of particles. Defaults to 20. |
metric |
Character string specifying the distance metric to use for
comparing rankings. Options are |
resampler |
Character string specifying the resampling algorithm.
Options are |
latent_rank_proposal |
Character string specifying the proposal
distribution for latent ranks in the Metropolis-Hastings step. Options are
|
verbose |
Logical indicating whether to print progress messages during
computation. Defaults to |
trace |
Logical specifying whether to save static parameters (alpha,
rho, cluster probabilities) at each timestep. This is useful for
diagnostic purposes but increases memory usage. Defaults to |
trace_latent |
Logical specifying whether to sample and save one
complete set of latent rankings for each particle at each timepoint. This
can be used to inspect the evolution of rankings over time but
substantially increases memory usage. Defaults to |
use_backward_simulation |
Logical specifying whether to use
Particle Gibbs with Backward Simulation (PG-BSi) during the rejuvenation
step. Defaults to |
The SMC2 algorithm uses a nested particle filter structure:
The outer loop maintains n_particles particles, each representing
a hypothesis about the static parameters (alpha, rho, cluster
probabilities).
The inner loop uses n_particle_filters particle filters per outer
particle to estimate the latent rankings.
When new data arrives, the algorithm updates the particle weights and performs rejuvenation MCMC steps to maintain diversity.
If particle degeneracy occurs (effective sample size below
resampling_threshold), particles are resampled.
If the acceptance rate during rejuvenation is low (below
doubling_threshold), the number of particle filters is adaptively
doubled.
For computational efficiency with CRAN examples, use smaller values such as
n_particles = 100 and n_particle_filters = 1.
A list containing all the specified options, suitable for passing to
compute_sequentially().
compute_sequentially(), set_hyperparameters()
# Basic usage with default settings opts <- set_smc_options() # Customize for faster computation (suitable for CRAN examples) opts_fast <- set_smc_options( n_particles = 100, n_particle_filters = 1 ) # Use with complete rankings data mod <- compute_sequentially( complete_rankings, hyperparameters = set_hyperparameters(n_items = 5), smc_options = set_smc_options(n_particles = 100, n_particle_filters = 1) ) # Customize resampling and metric opts_custom <- set_smc_options( n_particles = 200, n_particle_filters = 10, resampler = "stratified", metric = "kendall" ) # Enable diagnostic output opts_trace <- set_smc_options( n_particles = 100, n_particle_filters = 1, verbose = TRUE, trace = TRUE ) # For partial rankings with more rejuvenation steps mod_partial <- compute_sequentially( partial_rankings[1:10, ], hyperparameters = set_hyperparameters(n_items = 5), smc_options = set_smc_options( n_particles = 30, n_particle_filters = 5, max_rejuvenation_steps = 10, latent_rank_proposal = "uniform" ) )# Basic usage with default settings opts <- set_smc_options() # Customize for faster computation (suitable for CRAN examples) opts_fast <- set_smc_options( n_particles = 100, n_particle_filters = 1 ) # Use with complete rankings data mod <- compute_sequentially( complete_rankings, hyperparameters = set_hyperparameters(n_items = 5), smc_options = set_smc_options(n_particles = 100, n_particle_filters = 1) ) # Customize resampling and metric opts_custom <- set_smc_options( n_particles = 200, n_particle_filters = 10, resampler = "stratified", metric = "kendall" ) # Enable diagnostic output opts_trace <- set_smc_options( n_particles = 100, n_particle_filters = 1, verbose = TRUE, trace = TRUE ) # For partial rankings with more rejuvenation steps mod_partial <- compute_sequentially( partial_rankings[1:10, ], hyperparameters = set_hyperparameters(n_items = 5), smc_options = set_smc_options( n_particles = 30, n_particle_filters = 5, max_rejuvenation_steps = 10, latent_rank_proposal = "uniform" ) )
Creates a summary of a BayesMallowsSMC2 object returned by
compute_sequentially().
## S3 method for class 'BayesMallowsSMC2' summary(object, ...)## S3 method for class 'BayesMallowsSMC2' summary(object, ...)
object |
An object of class |
... |
Additional arguments (currently unused). |
The summary method creates a summary object that includes:
Number of particles
Number of timepoints
Number of items
Number of clusters
Log marginal likelihood
Final effective sample size (ESS)
Number of resampling events
Posterior mean of alpha for each cluster
Posterior standard deviation of alpha for each cluster
An object of class summary.BayesMallowsSMC2, which is a list
containing summary information about the model.
# Fit a model with complete rankings set.seed(123) mod <- compute_sequentially( complete_rankings, hyperparameters = set_hyperparameters(n_items = 5), smc_options = set_smc_options(n_particles = 100, n_particle_filters = 1) ) # Create summary summary(mod)# Fit a model with complete rankings set.seed(123) mod <- compute_sequentially( complete_rankings, hyperparameters = set_hyperparameters(n_items = 5), smc_options = set_smc_options(n_particles = 100, n_particle_filters = 1) ) # Create summary summary(mod)
Visualize the timeseries dynamics of the alpha and tau parameters across timepoints. This function creates trace plots similar to Figure 4 (left) in Sørensen Ø, Stein A, Netto WL, Leslie DS (2025). “Sequential Rank and Preference Learning with the Bayesian Mallows Model.” Bayesian Analysis, 1(1), 1 – 26. doi:10.1214/25-BA1564..
trace_plot(x, parameter = "alpha", ...)trace_plot(x, parameter = "alpha", ...)
x |
An object of class |
parameter |
Character string defining the parameter to plot. Available
options are |
... |
Other arguments (currently unused). |
This function requires that the model was fitted with trace = TRUE in the
smc_options. The trace contains the parameter values at each timepoint,
which allows visualization of how the posterior distribution evolves as more
data arrives sequentially.
For mixture models (multiple clusters), separate trace plots are created for each cluster using faceting.
The shaded area represents the 95% credible interval (from 2.5% to 97.5% quantiles) of the posterior distribution at each timepoint, computed using the importance weights from the SMC algorithm.
A ggplot object showing the evolution of the parameter over time. For each timepoint, the plot shows:
The weighted mean (solid line)
The weighted 0.025 and 0.975 quantiles (shaded area representing the 95% credible interval)
Sørensen Ø, Stein A, Netto WL, Leslie DS (2025). “Sequential Rank and Preference Learning with the Bayesian Mallows Model.” Bayesian Analysis, 1(1), 1 – 26. doi:10.1214/25-BA1564.
# Fit a model with trace enabled mod <- compute_sequentially( complete_rankings, hyperparameters = set_hyperparameters(n_items = 5), smc_options = set_smc_options( n_particles = 100, n_particle_filters = 1, trace = TRUE ) ) # Create trace plot for alpha (default) trace_plot(mod)# Fit a model with trace enabled mod <- compute_sequentially( complete_rankings, hyperparameters = set_hyperparameters(n_items = 5), smc_options = set_smc_options( n_particles = 100, n_particle_filters = 1, trace = TRUE ) ) # Create trace plot for alpha (default) trace_plot(mod)