Functions to prepare rankings data and fit the Plackett-Luce model jointly attributed to Plackett (1975) and Luce (1959).
Package website: https://hturner.github.io/PlackettLuce/.
The PlackettLuce package implements a generalization of the model jointly attributed to Plackett (1975) and Luce (1959) for modelling rankings data. Examples of rankings data might be the finishing order of competitors in a race, or the preference of consumers over a set of competing products.
The output of the model is an estimated worth for each item that appears in the rankings. The parameters are generally presented on the log scale for inference.
The implementation of the Plackett-Luce model in PlackettLuce:
In addition the package provides methods for
The package may be installed from CRAN via
install.packages("PlackettLuce")
The development version can be installed via
The Netflix Prize was a competition devised by Netflix to improve the accuracy of its recommendation system. To facilitate this, Netflix released ratings of movies by users of the system; these ratings have since been transformed into preference data and are available from PrefLib. Each data set comprises rankings of a set of 3 or 4 movies selected at random. Here we consider rankings for just one set of movies to illustrate the functionality of PlackettLuce.
The data can be read in using the read.soc function in PlackettLuce:
library(PlackettLuce)
# set preflib to the URL or local directory containing the PrefLib data
preflib <- ""
netflix <- read.soc(file.path(preflib, "netflix/ED-00004-00000138.soc"))
head(netflix, 2)
##    n Rank 1 Rank 2 Rank 3 Rank 4
## 1 68      2      1      4      3
## 2 53      1      2      4      3
Each row corresponds to a unique ordering of the four movies in this data set. The number of Netflix users that assigned that ordering is given in the first column, followed by the four movies in preference order. So for example, 68 users ranked movie 2 first, followed by movie 1, then movie 4 and finally movie 3.
PlackettLuce, the model-fitting function in PlackettLuce, requires
that the data are provided in the form of rankings rather than
orderings, i.e. the rankings are expressed by giving the rank of each
item rather than by ordering the items. We can create a "rankings"
object from a set of orderings as follows:
R <- as.rankings(netflix[,-1], input = "ordering")
colnames(R) <- attr(netflix, "item")
R[1:3, as.rankings = FALSE]
##   Mean Girls Beverly Hills Cop The Mummy Returns Mission: Impossible II
## 1          2                 1                 4                      3
## 2          1                 2                 4                      3
## 3          2                 1                 3                      4
read.soc saved the names of the movies in the "item" attribute of
netflix, so we have used these to label the items.
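To make the distinction between orderings and rankings concrete, the base-R sketch below converts a single hypothetical ordering (not one from the Netflix data, since the first two Netflix orderings happen to read the same either way) into the corresponding ranking. An ordering lists the items from first to last, while a ranking gives the rank of each item, so the ranking is the inverse permutation of the ordering, which base R's order() computes. In practice, as.rankings(..., input = "ordering") performs this conversion for the whole data set, as above.

```r
# A hypothetical ordering of four items: item 3 first, then item 1,
# then item 4, and finally item 2
ordering <- c(3, 1, 4, 2)

# The ranking gives the rank of each item: item ordering[k] has rank k.
# order() returns the inverse permutation, which is exactly this.
ranking <- order(ordering)
ranking  # item 1 has rank 2, item 2 rank 4, item 3 rank 1, item 4 rank 3

# The same result by direct assignment
ranking2 <- integer(length(ordering))
ranking2[ordering] <- seq_along(ordering)
```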
Subsetting the rankings object R with
as.rankings = FALSE returns
the underlying matrix of rankings corresponding to the subset. So, for
example, in the first ranking the second movie (Beverly Hills Cop) is
ranked number 1, followed by the first movie (Mean Girls) with rank 2,
followed by the fourth movie (Mission: Impossible II) and finally the
third movie (The Mummy Returns), giving the same ordering as in the original data.
Various methods are provided for
"rankings" objects. In particular, if
we subset the rankings without
as.rankings = FALSE, the result is a
"rankings" object and the corresponding print method is used:
R[1:3]
##                                          1
## "Beverly Hills Cop > Mean Girls > Mis ..."
##                                          2
## "Mean Girls > Beverly Hills Cop > Mis ..."
##                                          3
## "Beverly Hills Cop > Mean Girls > The ..."
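Long rankings are truncated when printed; a larger width can be set via the width argument of print:
print(R[1:3], width = 60)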
##                                                            1
## "Beverly Hills Cop > Mean Girls > Mission: Impossible II ..."
##                                                            2
## "Mean Girls > Beverly Hills Cop > Mission: Impossible II ..."
##                                                            3
## "Beverly Hills Cop > Mean Girls > The Mummy Returns > Mis ..."
The rankings can now be passed to
PlackettLuce to fit the
Plackett-Luce model. The counts of each ranking provided in the
downloaded data are used as weights when fitting the model.
mod <- PlackettLuce(R, weights = netflix$n)
coef(mod, log = FALSE)
##             Mean Girls      Beverly Hills Cop      The Mummy Returns
##              0.2306285              0.4510655              0.1684719
## Mission: Impossible II
##              0.1498342
Setting log = FALSE gives the worth parameters,
constrained to sum to one. These parameters represent the probability
that each movie is ranked first.
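The meaning of the worths can be checked by hand. The base-R sketch below computes the Plackett-Luce probability of a complete ordering without ties: items are chosen one at a time, each with probability proportional to its worth among the items not yet ranked, so the first factor is just the probability of being ranked first. The worth values are copied from the output above, and pl_prob is a hypothetical helper written for illustration, not a function exported by PlackettLuce.

```r
# Worth estimates copied from the coef(mod, log = FALSE) output above
worth <- c("Mean Girls" = 0.2306285,
           "Beverly Hills Cop" = 0.4510655,
           "The Mummy Returns" = 0.1684719,
           "Mission: Impossible II" = 0.1498342)

# Hypothetical helper: Plackett-Luce probability of an ordering (no ties).
# At each stage the next item is chosen with probability equal to its
# worth divided by the total worth of the items still unranked.
pl_prob <- function(ordering, worth) {
  remaining <- names(worth)
  prob <- 1
  for (item in ordering) {
    prob <- prob * worth[[item]] / sum(worth[remaining])
    remaining <- setdiff(remaining, item)
  }
  prob
}

# Probability of the most common ordering in the data
ordering <- c("Beverly Hills Cop", "Mean Girls",
              "Mission: Impossible II", "The Mummy Returns")
pl_prob(ordering, worth)  # approximately 0.089

# Probability that Beverly Hills Cop is ranked first: its worth
pl_prob("Beverly Hills Cop", worth)
```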
For inference these parameters are converted to the log scale, by default setting the first parameter to zero so that the standard errors are estimable:
summary(mod)
## Call: PlackettLuce(rankings = R, weights = netflix$n)
##
## Coefficients:
##                        Estimate Std. Error z value Pr(>|z|)
## Mean Girls              0.00000         NA      NA       NA
## Beverly Hills Cop       0.67080    0.06099  10.999  < 2e-16 ***
## The Mummy Returns      -0.31404    0.06465  -4.857 1.19e-06 ***
## Mission: Impossible II -0.43128    0.06508  -6.627 3.42e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual deviance:  3493.5 on 3525 degrees of freedom
## AIC:  3499.5
## Number of iterations: 5
In this way, Mean Girls is treated as the reference movie: the positive parameter for Beverly Hills Cop shows that it was more popular among the users, while the negative parameters for the other two movies show that these were less popular.
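The link between the two scales can be sketched in base R, assuming only the worth estimates printed by coef(mod, log = FALSE): dividing each worth by the worth of the reference movie and taking logs fixes the reference log-worth at zero and should reproduce, to the printed precision, the estimates in the summary.

```r
# Worth estimates copied from the coef(mod, log = FALSE) output above
worth <- c("Mean Girls" = 0.2306285,
           "Beverly Hills Cop" = 0.4510655,
           "The Mummy Returns" = 0.1684719,
           "Mission: Impossible II" = 0.1498342)

# Log-worths relative to the reference movie: dividing by the worth of
# Mean Girls before taking logs sets its log-worth to zero
log_worth <- log(worth / worth[["Mean Girls"]])
round(log_worth, 4)
```

Because the model depends only on ratios of worths, rescaling them in this way changes the parameterization but not the fitted probabilities.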
Comparisons between different pairs of movies can be made visually by plotting the log-worth parameters with comparison intervals based on quasi standard errors.
qv <- qvcalc(mod)
plot(qv, ylab = "Worth (log)", main = NULL)
If the comparison intervals for two movies overlap, there is no significant difference between them. So we can see that Beverly Hills Cop is significantly more popular than the other three movies, Mean Girls is significantly more popular than The Mummy Returns or Mission: Impossible II, but there was no significant difference in users' preference for these last two movies.
The full functionality of PlackettLuce is illustrated in the package vignette, along with details of the model used in the package and a comparison to other packages. The vignette can be found on the package website or from within R once the package has been installed, e.g. via
vignette("Overview", package = "PlackettLuce")
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
Luce, R. Duncan. 1959. Individual Choice Behavior: A Theoretical Analysis. New York: Wiley.
Plackett, Robert L. 1975. “The Analysis of Permutations.” Appl. Statist 24 (2):193–202. https://doi.org/10.2307/2346567.
"summary.PlackettLuce" objects now respect
n, which is now the weighted count of rankings (previously only an unweighted count was returned, with argument aggregate = TRUE).
AIC.pltree to work on "pltree" object with one node.
AIC.pltree to enable computation of AIC on new observations (e.g. data held out in cross-validation).
fitted.pltree to return combined fitted probabilities for each choice within each ranking, for each node in a Plackett-Luce tree.
vcov.PlackettLuce now works for models with non-integer weights (fixes #25).
plot.pltree now works for worth = TRUE with psychotree version 0.15-2 (currently pre-release on https://r-forge.r-project.org/R/?group_id=330)
plfit now works when the start argument is set.
itempar.PlackettLuce now works with alias = FALSE
plot.PlackettLuce method so that plotting works for a saved
beans data (which has been updated).
package?PlackettLuce. (Fixes #14 and #21).
maxit defaults to 500 in
log = TRUE argument (fixes #19).
[.grouped_rankings now works for replicated indices.
pltree() function for use with partykit::mob(). Requires new objects of type "grouped_rankings" that add a grouping index to a "rankings" object and store other derived objects used by PlackettLuce. Methods to print, plot and predict from Plackett-Luce tree are provided.
connectivity() function to check connectivity of a network given adjacency matrix. New adjacency() function computes adjacency matrix without creating edgelist, so remove as.edgelist generic and method for "PlackettLuce" objects.
as.data.frame methods so that rankings and grouped rankings can be added to model frames.
format methods for rankings and grouped_rankings, for pretty printing.
[ methods for rankings and grouped_rankings, to create valid rankings from selected rankings and/or items.
itempar method for "PlackettLuce" objects to obtain different parameterizations of the worth parameters.
read.soc function to read Strict Orders - Complete List (.soc) files from http://www.preflib.org.
Old behaviour should be reproducible with arguments
npseudo = 0, steffensen = 0, start = c(rep(1/N, N), rep(0.1, D))
where N is the number of items and D is the maximum order of ties.
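As an illustration, for a hypothetical data set with N = 4 items and ties of up to order D = 2 (sizes chosen purely for this sketch), the start vector above expands to N equal worths of 1/N followed by D further values of 0.1:

```r
# Hypothetical sizes, for illustration only
N <- 4  # number of items
D <- 2  # maximum order of ties
start <- c(rep(1/N, N), rep(0.1, D))
start  # 0.25 0.25 0.25 0.25 0.10 0.10
```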
PlackettLuce; should be specified instead when calling
qvcalc generic now imported from qvcalc
coef so that worth parameters (probability of coming first in strict ranking of all items) can be obtained easily.