Tools for Building OLS Regression Models

Tools for building OLS regression models. Includes comprehensive regression output, heteroskedasticity tests, collinearity diagnostics, residual diagnostics, measures of influence, model fit assessment and variable selection procedures.


olsrr: Tools for Building OLS Regression Models

Author: Aravind Hebbali
License: MIT


Overview

The olsrr package provides the following tools for teaching and learning OLS regression using R:

  • Comprehensive Regression Output
  • Variable Selection Procedures
  • Heteroskedasticity Tests
  • Collinearity Diagnostics
  • Model Fit Assessment
  • Measures of Influence
  • Residual Diagnostics
  • Variable Contribution Assessment

Installation

You can install the release version of olsrr from CRAN, or the development version from GitHub:

# install the release version from CRAN
install.packages("olsrr")

# or the development version from GitHub
# install.packages("devtools")
devtools::install_github("rsquaredacademy/olsrr")

Shiny App

Use ols_launch_app() to explore the package using a Shiny app.

Consistent Prefix

olsrr uses the consistent prefix ols_ for easy tab completion.

Quick Demo

olsrr is built to help users who are new to R. If you know how to write a formula or build models using lm, you will find olsrr very useful. Most of the functions take an object of class lm as input, so you just need to build a model using lm and then pass it on to the functions in olsrr. Below is a quick demo:
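For example, the same fitted model object can be reused across the diagnostics shown in the following sections. A minimal sketch (the model mirrors the demos below):

library(olsrr)

# build the model once with lm() ...
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

# ... then pass it to any ols_* function, e.g. the collinearity diagnostics shown later
ols_coll_diag(model)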

Regression
ols_regress(mpg ~ disp + hp + wt + qsec, data = mtcars)
#>                         Model Summary                          
#> --------------------------------------------------------------
#> R                       0.914       RMSE                2.622 
#> R-Squared               0.835       Coef. Var          13.051 
#> Adj. R-Squared          0.811       MSE                 6.875 
#> Pred R-Squared          0.771       MAE                 1.858 
#> --------------------------------------------------------------
#>  RMSE: Root Mean Square Error 
#>  MSE: Mean Square Error 
#>  MAE: Mean Absolute Error 
#> 
#>                                ANOVA                                 
#> --------------------------------------------------------------------
#>                 Sum of                                              
#>                Squares        DF    Mean Square      F         Sig. 
#> --------------------------------------------------------------------
#> Regression     940.412         4        235.103    34.195    0.0000 
#> Residual       185.635        27          6.875                     
#> Total         1126.047        31                                    
#> --------------------------------------------------------------------
#> 
#>                                   Parameter Estimates                                    
#> ----------------------------------------------------------------------------------------
#>       model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
#> ----------------------------------------------------------------------------------------
#> (Intercept)    27.330         8.639                  3.164    0.004     9.604    45.055 
#>        disp     0.003         0.011        0.055     0.248    0.806    -0.019     0.025 
#>          hp    -0.019         0.016       -0.212    -1.196    0.242    -0.051     0.013 
#>          wt    -4.609         1.266       -0.748    -3.641    0.001    -7.206    -2.012 
#>        qsec     0.544         0.466        0.161     1.166    0.254    -0.413     1.501 
#> ----------------------------------------------------------------------------------------
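The headline figures in the model summary follow directly from the ANOVA table; a quick sanity check in R:

940.412 / 1126.047    # R-Squared = SS_regression / SS_total   ~ 0.835
185.635 / 27          # MSE       = SS_residual / DF_residual  ~ 6.875
sqrt(185.635 / 27)    # RMSE      = sqrt(MSE)                  ~ 2.622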
Breusch Pagan Test

The Breusch Pagan test is used to detect heteroskedasticity (non-constant error variance). It tests whether the variance of the errors from a regression depends on the values of the independent variables. It is a χ2 test.

model <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)
ols_bp_test(model)
#> 
#>  Breusch Pagan Test for Heteroskedasticity
#>  -----------------------------------------
#>  Ho: the variance is constant            
#>  Ha: the variance is not constant        
#> 
#>              Data               
#>  -------------------------------
#>  Response : mpg 
#>  Variables: fitted values of mpg 
#> 
#>        Test Summary         
#>  ---------------------------
#>  DF            =    1 
#>  Chi2          =    1.429672 
#>  Prob > Chi2   =    0.231818
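For intuition, the classical Breusch-Pagan statistic can be reproduced with base R by regressing the scaled squared residuals on the fitted values. This is only a sketch of the textbook construction, not olsrr's internal code (the package may use a studentized variant, in which case the numbers differ slightly):

model <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)
e2   <- residuals(model)^2
sig2 <- sum(e2) / nobs(model)                  # ML estimate of the error variance
g    <- e2 / sig2                              # scaled squared residuals
aux  <- lm(g ~ fitted(model))                  # auxiliary regression on the fitted values
bp   <- 0.5 * sum((fitted(aux) - mean(g))^2)   # explained sum of squares / 2
pchisq(bp, df = 1, lower.tail = FALSE)         # compare against chi-square with 1 df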
Collinearity Diagnostics
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
ols_coll_diag(model)
#> Tolerance and Variance Inflation Factor
#> ---------------------------------------
#> # A tibble: 4 x 3
#>   Variables Tolerance      VIF
#>       <chr>     <dbl>    <dbl>
#> 1      disp 0.1252279 7.985439
#> 2        hp 0.1935450 5.166758
#> 3        wt 0.1445726 6.916942
#> 4      qsec 0.3191708 3.133119
#> 
#> 
#> Eigenvalue and Condition Index
#> ------------------------------
#>    Eigenvalue Condition Index   intercept        disp          hp
#> 1 4.721487187        1.000000 0.000123237 0.001132468 0.001413094
#> 2 0.216562203        4.669260 0.002617424 0.036811051 0.027751289
#> 3 0.050416837        9.677242 0.001656551 0.120881424 0.392366164
#> 4 0.010104757       21.616057 0.025805998 0.777260487 0.059594623
#> 5 0.001429017       57.480524 0.969796790 0.063914571 0.518874831
#>             wt         qsec
#> 1 0.0005253393 0.0001277169
#> 2 0.0002096014 0.0046789491
#> 3 0.0377028008 0.0001952599
#> 4 0.7017528428 0.0024577686
#> 5 0.2598094157 0.9925403056
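The tolerance and VIF columns can be checked by hand: regress each predictor on the remaining predictors, take tolerance = 1 - R-squared and VIF = 1 / tolerance. A base R sketch for disp (it should reproduce the disp row of the table above):

aux_disp <- lm(disp ~ hp + wt + qsec, data = mtcars)   # disp on the other predictors
tol <- 1 - summary(aux_disp)$r.squared                 # tolerance = 1 - R-squared
c(tolerance = tol, VIF = 1 / tol)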
Stepwise Regression

Builds a regression model from a set of candidate predictor variables by entering and removing predictors based on p values, in a stepwise manner, until there are no variables left to enter or remove.

Variable Selection
# stepwise regression
model <- lm(y ~ ., data = surgical)
ols_stepwise(model)
#> We are selecting variables based on p value...
#> 1 variable(s) added....
#> 1 variable(s) added...
#> 1 variable(s) added...
#> 1 variable(s) added...
#> 1 variable(s) added...
#> No more variables to be added or removed.
#> Stepwise Selection Method                                                                  
#> 
#> Candidate Terms:                                                                           
#> 
#> 1 . bcs                                                                                    
#> 2 . pindex                                                                                 
#> 3 . enzyme_test                                                                            
#> 4 . liver_test                                                                             
#> 5 . age                                                                                    
#> 6 . gender                                                                                 
#> 7 . alc_mod                                                                                
#> 8 . alc_heavy                                                                              
#> 
#> ------------------------------------------------------------------------------------------
#>                                 Stepwise Selection Summary                                 
#> ------------------------------------------------------------------------------------------
#>                         Added/                   Adj.                                         
#> Step     Variable      Removed     R-Square    R-Square     C(p)        AIC         RMSE      
#> ------------------------------------------------------------------------------------------
#>    1    liver_test     addition       0.455       0.444    62.5120    771.8753    296.2992    
#>    2     alc_heavy     addition       0.567       0.550    41.3680    761.4394    266.6484    
#>    3    enzyme_test    addition       0.659       0.639    24.3380    750.5089    238.9145    
#>    4      pindex       addition       0.750       0.730     7.5370    735.7146    206.5835    
#>    5        bcs        addition       0.781       0.758     3.1920    730.6204    195.4544    
#> ------------------------------------------------------------------------------------------
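The entry criterion behind each step can be illustrated with base R: among the remaining candidates, the one whose coefficient has the smallest p-value (below the entry threshold) is added. A simplified sketch of the first step, assuming the surgical data shipped with olsrr is loaded and each candidate contributes a single coefficient:

candidates <- c("bcs", "pindex", "enzyme_test", "liver_test",
                "age", "gender", "alc_mod", "alc_heavy")
pvals <- sapply(candidates, function(v) {
  fit <- lm(reformulate(v, response = "y"), data = surgical)
  summary(fit)$coefficients[2, "Pr(>|t|)"]   # p-value of the candidate's coefficient
})
sort(pvals)[1]   # liver_test has the smallest p-value, so it enters first (step 1 above)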
Stepwise AIC Backward Regression

Builds a regression model from a set of candidate predictor variables by removing predictors based on the Akaike Information Criterion, in a stepwise manner, until there are no variables left to remove.

Variable Selection
# stepwise aic backward regression
model <- lm(y ~ ., data = surgical)
ols_stepaic_backward(model)
#> 
#> 
#>                        Backward Elimination Summary                        
#> -------------------------------------------------------------------------
#> Variable        AIC          RSS          Sum Sq       R-Sq     Adj. R-Sq 
#> -------------------------------------------------------------------------
#> Full Model    736.390    1825905.713    6543614.824    0.782        0.743 
#> alc_mod       734.407    1826477.828    6543042.709    0.782        0.749 
#> gender        732.494    1829435.617    6540084.920    0.781        0.754 
#> age           730.620    1833716.447    6535804.090    0.781        0.758 
#> -------------------------------------------------------------------------
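The same idea can be explored with base R's drop1(), which reports the AIC of the model after dropping each term in turn (base R omits additive constants in its AIC, so the absolute values differ from the table above, but the comparison between terms is analogous):

model <- lm(y ~ ., data = surgical)
drop1(model)   # terms whose removal lowers the AIC are candidates for elimination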

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

News

olsrr 0.4.0

Enhancements

  • use ols_launch_app() to launch a shiny app for building models
  • save beta coefficients for each independent variable in ols_all_subset() (#41)

Bug Fixes

  • mismatch in sign of partial and semi partial correlations (#44)
  • error in diagnostic panel (#45)
  • standardized betas in the presence of interaction terms (#46)

A big thanks goes to Dr. Kimberly Henry for identifying bugs and providing other valuable feedback that helped improve the package.

olsrr 0.3.0

This is a minor release containing bug fixes.

Bug Fixes

  • output from reg_compute rounded to 3 decimal points (#24)
  • added variable plot fails when model includes categorical variables (#25)
  • all possible regression fails when model includes categorical predictors (#26)
  • output from bartlett test rounded to 3 decimal points (#27)
  • best subsets regression fails when model includes categorical predictors (#28)
  • output from breusch pagan test rounded to 4 decimal points (#29)
  • output from collinearity diagnostics rounded to 3 decimal points (#30)
  • cook's d bar plot threshold rounded to 3 decimal points (#31)
  • cook's d chart threshold rounded to 3 decimal points (#32)
  • output from f test rounded to 3 decimal points (#33)
  • output from measures of influence rounded to 4 decimal points (#34)
  • output from information criteria rounded to 4 decimal points (#35)
  • studentized residuals vs leverage plot threshold rounded to 3 decimal points (#36)
  • output from score test rounded to 3 decimal points (#37)
  • step AIC backward method AIC value rounded to 3 decimal points (#38)
  • step AIC backward method AIC value rounded to 3 decimal points (#39)
  • step AIC both direction method AIC value rounded to 3 decimal points (#40)

olsrr 0.2.0

This is a minor release containing bug fixes and minor improvements.

Bug Fixes

  • inline functions in model formula caused errors in stepwise regression (#2)
  • added variable plots (ols_avplots) returns error when model formula contains inline functions (#3)
  • all possible regression (ols_all_subset) returns an error when the model formula contains inline functions or interaction variables (#4)
  • best subset regression (ols_best_subset) returns an error when the model formula contains inline functions or interaction variables (#5)
  • studentized residual plot (ols_srsd_plot) returns an error when the model formula contains inline functions (#6)
  • stepwise backward regression (ols_step_backward) returns an error when the model formula contains inline functions or interaction variables (#7)
  • stepwise forward regression (ols_step_forward) returns an error when the model formula contains inline functions (#8)
  • stepAIC backward regression (ols_stepaic_backward) returns an error when the model formula contains inline functions (#9)
  • stepAIC forward regression (ols_stepaic_forward) returns an error when the model formula contains inline functions (#10)
  • stepAIC regression (ols_stepaic_both) returns an error when the model formula contains inline functions (#11)
  • outliers incorrectly plotted in (ols_cooksd_barplot) cook's d bar plot (#12)
  • regression (ols_regress) returns an error when the model formula contains inline functions (#21)
  • output from step AIC backward regression (ols_stepaic_backward) is not properly formatted (#22)
  • output from step AIC regression (ols_stepaic_both) is not properly formatted (#23)

Enhancements

  • cook's d bar plot (ols_cooksd_barplot) returns the threshold value used to classify the observations as outliers (#13)
  • cook's d chart (ols_cooksd_chart) returns the threshold value used to classify the observations as outliers (#14)
  • DFFITs plot (ols_dffits_plot) returns the threshold value used to classify the observations as outliers (#15)
  • deleted studentized residuals vs fitted values plot (ols_dsrvsp_plot) returns the threshold value used to classify the observations as outliers (#16)
  • studentized residuals vs leverage plot (ols_rsdlev_plot) returns the threshold value used to detect outliers/high leverage observations (#17)
  • standardized residuals chart (ols_srsd_chart) returns the threshold value used to classify the observations as outliers (#18)
  • studentized residuals plot (ols_srsd_plot) returns the threshold value used to classify the observations as outliers (#19)

Documentation

There were errors in the description of the values returned by some functions. The documentation has been thoroughly revised and improved in this release.

olsrr 0.1.0

First release.
