Easily Carry Out Latent Profile Analysis (LPA) Using Open-Source or Commercial Software

An interface to the 'mclust' package to easily carry out latent profile analysis ("LPA"). Provides functionality to estimate commonly-specified models. Follows a tidy approach, in that output is in the form of a data frame that can subsequently be computed on. Also has functions to interface to the commercial 'MPlus' software via the 'MplusAutomation' package.


Background

Latent Profile Analysis (LPA) is a statistical modeling approach for estimating distinct profiles, or groups, of variables. In the social sciences and in educational research, these profiles could represent, for example, how different youth experience dimensions of being engaged (i.e., cognitively, behaviorally, and affectively) at the same time.

tidyLPA provides the functionality to carry out LPA in R. In particular, tidyLPA provides functionality to specify different models that determine whether and how different parameters (i.e., means, variances, and covariances) are estimated and to specify (and compare solutions for) the number of profiles to estimate. The package is designed and documented to be easy to use, especially for beginners to LPA, but with fine-grained options available for estimating models and evaluating specific output as part of more complex analyses.

Installation

You can install tidyLPA from CRAN with:

install.packages("tidyLPA")

You can also install the development version of tidyLPA from GitHub with:

install.packages("devtools")
devtools::install_github("jrosen48/tidyLPA")

Example

Here is a brief example using the built-in pisaUSA15 data set and variables for broad interest, enjoyment, and self-efficacy. Note that we first type the name of the data frame, followed by the unquoted names of the variables used to create the profiles. We also specify the number of profiles; because no model is specified here, the default model is used. See ?estimate_profiles for more details.

library(tidyLPA)
d <- pisaUSA15[1:100, ]
 
estimate_profiles(d, 
                  broad_interest, enjoyment, self_efficacy, 
                  n_profiles = 3)
#> LogLik is 283.991
#> BIC is 631.589
#> Entropy is 0.914
#> # A tibble: 94 x 5
#>    broad_interest enjoyment self_efficacy profile posterior_prob
#>             <dbl>     <dbl>         <dbl> <fct>            <dbl>
#>  1            3.8       4            1    1                1.000
#>  2            3         3            2.75 3                0.917
#>  3            1.8       2.8          3.38 3                0.997
#>  4            1.4       1            2.75 2                0.899
#>  5            1.8       2.2          2    3                0.997
#>  6            1.6       1.6          1.88 3                0.997
#>  7            3         3.8          2.25 1                0.927
#>  8            2.6       2.2          2    3                0.990
#>  9            1         2.8          2.62 3                0.998
#> 10            2.2       2            1.75 3                0.996
#> # ... with 84 more rows
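As a rough cross-check of the fit statistics printed above, the sketch below recomputes BIC from the reported log-likelihood. It rests on two assumptions that the output does not state: that the printed LogLik is the magnitude of a negative log-likelihood, and that the default model has 14 free parameters (9 profile means, 3 shared variances, and 2 mixing proportions). The variable names are illustrative only.

```r
# Values copied from the example output above
loglik_reported <- 283.991  # assumed to be the magnitude of a negative log-likelihood
n_obs <- 94                 # rows in the returned tibble
n_profiles <- 3
n_vars <- 3

# Assumed free-parameter count for equal variances, covariances fixed to 0
n_means <- n_profiles * n_vars     # one mean per variable per profile
n_variances <- n_vars              # one variance per variable, shared across profiles
n_proportions <- n_profiles - 1    # mixing proportions sum to 1
n_params <- n_means + n_variances + n_proportions

# BIC = -2 * log-likelihood + (number of parameters) * log(sample size)
bic <- 2 * loglik_reported + n_params * log(n_obs)
round(bic, 2)
```

Under these assumptions the result agrees with the reported BIC of 631.589 to within rounding.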

The version of this function that uses MPlus is simply estimate_profiles_mplus(), which is called in the same way (though some details can be changed with arguments specific to either estimate_profiles() or estimate_profiles_mplus()).

Note that the output is simply a data frame containing the profile (and its posterior probability) and the variables used to create the profiles. This is the "tidy" part: the function takes and returns a data frame.
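Because the result is an ordinary data frame, it can be summarized with standard tools. The following is a minimal sketch, assuming dplyr is installed. To stay self-contained it rebuilds only the first ten rows printed above by hand (the object name profiles is hypothetical) rather than calling estimate_profiles().

```r
library(dplyr, warn.conflicts = FALSE)

# Hand-copied from the first ten rows of the example output
profiles <- tibble(
  broad_interest = c(3.8, 3, 1.8, 1.4, 1.8, 1.6, 3, 2.6, 1, 2.2),
  enjoyment      = c(4, 3, 2.8, 1, 2.2, 1.6, 3.8, 2.2, 2.8, 2),
  self_efficacy  = c(1, 2.75, 3.38, 2.75, 2, 1.88, 2.25, 2, 2.62, 1.75),
  profile        = factor(c(1, 3, 3, 2, 3, 3, 1, 3, 3, 3)),
  posterior_prob = c(1.000, 0.917, 0.997, 0.899, 0.997, 0.997, 0.927, 0.990, 0.998, 0.996)
)

# Profile sizes, variable means, and average classification certainty
profile_summary <- profiles %>%
  group_by(profile) %>%
  summarise(
    n = n(),
    mean_broad_interest = mean(broad_interest),
    mean_enjoyment = mean(enjoyment),
    mean_self_efficacy = mean(self_efficacy),
    mean_posterior_prob = mean(posterior_prob)
  )

profile_summary
```

With the full output, the same pipeline would give the size and mean pattern of each estimated profile.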

We can plot the profiles by piping the output to plot_profiles(), using the %>% operator (loaded here from the dplyr package).

library(dplyr, warn.conflicts = FALSE)
 
estimate_profiles(d, 
                  broad_interest, enjoyment, self_efficacy, 
                  n_profiles = 3) %>% 
    plot_profiles(to_center = TRUE)

Model specification

In addition to the number of profiles (specified with the n_profiles argument), the model can be specified in terms of whether and how the variable variances and covariances are estimated.

The models are specified by passing values to the variances and covariances arguments. The possible values for these arguments are:

  • variances: "equal" and "varying"
  • covariances: "varying", "equal", and "zero"

If no values are specified for these arguments, the model with equal variances and covariances fixed to 0 (Model 1) is estimated by default.

These arguments allow for four models to be specified:

  • Equal variances and covariances fixed to 0 (Model 1)
  • Varying variances and covariances fixed to 0 (Model 2)
  • Equal variances and equal covariances (Model 3)
  • Varying variances and varying covariances (Model 6)

Two additional models (Models 4 and 5) can be fit using functions that provide an interface to the MPlus software. More information on the models can be found in the vignette.

Here is an example of specifying a model with varying variances and covariances (Model 6; not run here):

estimate_profiles(d, 
                  broad_interest, enjoyment, self_efficacy, 
                  variances = "varying",
                  covariances = "varying",
                  n_profiles = 3)
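Since the package is an interface to mclust, the four models listed above can also be read as constraints on the mclust covariance structure. The sketch below is an assumption about that correspondence, based on what the mclust model names mean, and is not taken from the tidyLPA source. It uses the built-in iris measurements as stand-in data so that it runs without tidyLPA.

```r
library(mclust)

# Stand-in data: four continuous measurements from the built-in iris data set
measurements <- iris[, 1:4]

# Assumed correspondence between the four models and mclust model names
model_names <- c(
  model_1 = "EEI",  # equal variances, covariances fixed to 0
  model_2 = "VVI",  # varying variances, covariances fixed to 0
  model_3 = "EEE",  # equal variances, equal covariances
  model_6 = "VVV"   # varying variances, varying covariances
)

# Fit a three-profile solution under the Model 1 constraints
fit <- Mclust(measurements, G = 3, modelNames = model_names[["model_1"]])

table(fit$classification)  # number of observations assigned to each profile
head(round(fit$z, 3))      # posterior probabilities, one column per profile
```

Swapping in another element of model_names fits the same number of profiles under a different set of constraints.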

Comparing a wide range of solutions

The function compare_solutions() estimates models with varying numbers of profiles and model specifications:

compare_solutions(d, broad_interest, enjoyment, self_efficacy)

The version that uses MPlus, compare_solutions_mplus(), is called in the same way. As with estimate_profiles() and estimate_profiles_mplus(), some details can be specified with arguments specific to compare_solutions() or compare_solutions_mplus().
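To give a sense of what comparing solutions involves, here is a hedged sketch that uses mclust directly rather than compare_solutions(). It computes BIC over a grid of profile counts and covariance structures. The stand-in data (iris) and the four model names are assumptions chosen for illustration. Note that mclust reports BIC on a scale where larger values indicate a better fit.

```r
library(mclust)

# Stand-in data: four continuous measurements from the built-in iris data set
measurements <- iris[, 1:4]

# BIC for 1 to 4 profiles under four covariance structures
bic_grid <- mclustBIC(
  measurements,
  G = 1:4,
  modelNames = c("EEI", "VVI", "EEE", "VVV")
)

bic_grid           # one row per number of profiles, one column per model
summary(bic_grid)  # the best-fitting combinations by mclust's BIC
```

A grid like this is usually read alongside substantive interpretability of the profiles and not by BIC alone.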

More information

To learn more:

  • Browse the tidyLPA website (especially check out the Reference page to see more about other functions)

  • Read the Introduction to tidyLPA vignette, which has much more information on the models that can be specified with tidyLPA and on additional functionality

Contributing and Contact Information

One of the easiest but also most important ways to contribute is to post a question or to provide feedback. Both positive and negative feedback is welcome and helpful. You can get in touch by . . .

Contributions are also welcome via pull requests (PRs), e.g., through this page on GitHub. It may be easier if you first file an issue outlining what you will do in the PR. You can also reach out via the methods described above.

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

News

tidyLPA 0.2.1

  • minor changes and bug fixes
  • addition of new reference to JOSS paper

tidyLPA 0.2.0

Major breaking change:

  • change how models are specified; instead of using the model argument, whether and how the variances and covariances are estimated is passed to the variances and covariances arguments; details are in the README and vignette, and if a model argument is passed to a function, a message is returned describing how to specify the model using the variances and covariances arguments

Major change:

  • change compare_solutions_mplus() to still allow for the specification of six models, but to use the same four as compare_solutions() (which uses the mclust package, not MPlus) by default

Minor changes:

  • improve NAMESPACE documentation
  • add option to return original data frame for functions that use MPlus
  • add option to use missing data for functions that use MPlus
  • add mailing list address to README as preferred contact method
  • remove deprecated function (to extract key statistics from an MPlus model)
  • make it so that a data frame with fit and other statistics is returned by default from compare_solutions_mplus()
  • added new values to the statistics returned by compare_solutions_mplus():
    • the cell size (the number of observations associated with each profile)
    • the number of times the log-likelihood was replicated, based on the number of optimization steps
    • Approximate Weight of Evidence (AWE) criterion
    • the number of parameters estimated
  • remove the messages about the software being in beta
  • how the Mplus syntax is generated was substantially changed/improved; thanks @gbiele

Bug fixes

  • change include_LMR argument to include VLMR
  • remove scale_fill_brewer("", type = "qual", palette = "Set3") so that solutions with larger numbers of profiles may be plotted
  • fix issue where lines longer than 90 characters (i.e., when there are many variables) cause an error

tidyLPA 0.1.3

  • improve plot_profiles() plots, including plotting bootstrapped standard errors when mclust output is directly used (thanks @cjvanlissa), and update vignette with an example of this
  • improve output from compare_solutions_mplus (thanks @DJAnderson07)
  • add function, extract_LL_mplus(), to extract log-likelihoods from models fit with estimate_profiles_mplus()
  • update documentation for pisaUSA15 dataset
  • improve compare_solutions_mplus() so it more reliably handles errors
  • improve vignette (thanks @oreojo for suggestion to mention that this package works best for continuous variables)
  • add URLs for package and bug reports to DESCRIPTION
  • add C.J. van Lissa and Daniel John Anderson as contributors

tidyLPA 0.1.2

Fix:

  • Specify version 0.7 of MplusAutomation in Imports to address error

Minor updates:

  • Update README and Vignette
  • Update function names to include MPlus
  • Export %>% from magrittr (so it does not need to be loaded from dplyr)
  • Correct name of title for vignette
  • Update function names
  • Change output of estimate_profiles_mplus() to be returned with return(), instead of with invisible()

tidyLPA 0.1.1

  • Added a NEWS.md file to track changes to the package.

0.2.3 by Joshua M Rosenberg, 25 days ago


https://jrosen48.github.io/tidyLPA/


Report a bug at https://github.com/jrosen48/tidyLPA/issues


Browse source code at https://github.com/cran/tidyLPA


Authors: Joshua M Rosenberg [aut, cre] , Jennifer A Schmidt [ctb] , Patrick N Beymer [ctb] , Daniel Anderson [ctb] , Caspar van Lissa [aut] , Matthew J. Schell [ctb]



MIT + file LICENSE license


Imports dplyr, forcats, ggplot2, magrittr, mclust, purrr, readr, rlang, stringr, tibble, tidyr

Suggests covr, devtools, knitr, MplusAutomation, parallel, rmarkdown, roxygen2, testthat

