Mining Association Rules and Frequent Itemsets

Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides interfaces to C implementations of the association mining algorithms Apriori and Eclat by C. Borgelt.


Additional packages in the arules family are:

  • Stable CRAN version: install from within R.
  • Current development version: Download the package from AppVeyor or install via install_github("mhahsler/arules") (needs devtools).
R> library("arules")
R> data("Adult")
 
R> ## Mine association rules
R> rules <- apriori(Adult, parameter = list(supp = 0.5, conf = 0.9, target = "rules"))
 
Parameter specification:
 confidence minval smax arem  aval originalSupport support minlen maxlen target   ext
        0.9    0.1    1 none FALSE            TRUE     0.5      1     10  rules FALSE
 
Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE
 
Absolute minimum support count: 24421 
 
apriori - find association rules with the apriori algorithm
version 4.21 (2004.05.09)        (c) 1996-2004   Christian Borgelt
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[115 item(s), 48842 transaction(s)] done [0.03s].
sorting and recoding items ... [9 item(s)] done [0.00s].
creating transaction tree ... done [0.03s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [52 rule(s)] done [0.00s].
creating S4 object  ... done [0.01s].
 
R> ## Show some basic statistics
R> summary(rules)
set of 52 rules
 
rule length distribution (lhs + rhs):
 1  2  3  4 
 2 13 24 13 
 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   2.000   3.000   2.923   3.250   4.000 
 
summary of quality measures:
    support         confidence          lift       
 Min.   :0.5084   Min.   :0.9031   Min.   :0.9844  
 1st Qu.:0.5415   1st Qu.:0.9155   1st Qu.:0.9937  
 Median :0.5974   Median :0.9229   Median :0.9997  
 Mean   :0.6436   Mean   :0.9308   Mean   :1.0036  
 3rd Qu.:0.7426   3rd Qu.:0.9494   3rd Qu.:1.0057  
 Max.   :0.9533   Max.   :0.9583   Max.   :1.0586  
 
mining info:
  data ntransactions support confidence
 Adult         48842     0.5        0.9
 
R> ## Inspect rules with the highest lift
R> inspect(head(sort(rules, by = "lift")))
    lhs                               rhs                              support confidence     lift
[1] {sex=Male,                                                                                    
     native-country=United-States} => {race=White}                   0.5415421  0.9051090 1.058554
[2] {sex=Male,                                                                                    
     capital-loss=None,                                                                           
     native-country=United-States} => {race=White}                   0.5113632  0.9032585 1.056390
[3] {race=White}                   => {native-country=United-States} 0.7881127  0.9217231 1.027076
[4] {race=White,                                                                                  
     capital-loss=None}            => {native-country=United-States} 0.7490480  0.9205626 1.025783
[5] {race=White,                                                                                  
     sex=Male}                     => {native-country=United-States} 0.5415421  0.9204803 1.025691
[6] {race=White,                                                                                  
     capital-gain=None}            => {native-country=United-States} 0.7194628  0.9202807 1.025469
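
The three quality measures in the table above are related to each other. As a minimal sketch, assuming arules and its Adult data set are installed, the following recomputes them by hand for rule [3], {race=White} => {native-country=United-States}. The variable names are illustrative only, and the results should match the values printed for that rule.

```r
library("arules")
data("Adult")

# Relative support of every single item in the data set
freq <- itemFrequency(Adult)

# Support of {race=White, native-country=United-States}:
# the share of transactions that contain both items
has_both <- Adult %ain% c("race=White", "native-country=United-States")
supp_xy <- mean(has_both)

# confidence(X => Y) = supp(X and Y) / supp(X)
conf_xy <- supp_xy / unname(freq["race=White"])

# lift(X => Y) = confidence(X => Y) / supp(Y)
lift_xy <- conf_xy / unname(freq["native-country=United-States"])

round(c(support = supp_xy, confidence = conf_xy, lift = lift_xy), 4)
```

A lift close to 1, as seen here, indicates that the two sides of the rule occur together about as often as expected under independence.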
 

Maintainer: Michael Hahsler

News

Changes in version 1.5-0 (09/23/2016)

  • IMPORTANT CHANGE: apriori now uses a time limit, set with maxtime in parameter. The default is 5 seconds. Running out of time or reaching maxlen results in a warning. The warning for low absolute support was removed.
  • is.redundant now also marks rules with the same confidence as redundant.
  • plot for associations and transactions now produces a better error/warning message.
  • improved argument check for %pin%. It now warns for multiple patterns (was an error) and gives an error for an empty pattern.
  • inspect now consistently prints the index of rules/itemsets using brackets and starting from 1.
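
As a rough illustration of the time limit and of is.redundant, the sketch below raises maxtime above its default and then filters out rules flagged as redundant. It assumes arules 1.5-0 or later and the Adult data set; the value of 10 seconds and the variable names are arbitrary choices for this example.

```r
library("arules")
data("Adult")

# Allow up to 10 seconds for subset checking instead of the default 5
rules <- apriori(Adult,
  parameter = list(supp = 0.5, conf = 0.9, target = "rules", maxtime = 10),
  control = list(verbose = FALSE))  # suppress the progress log

# Keep only the rules that are not flagged as redundant
rules_nr <- rules[!is.redundant(rules)]

length(rules)
length(rules_nr)
```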

Changes in version 1.4-2 (08/06/2016)

  • Bugfix: is.redundant returned !is.redundant (reported by brisbia)
  • Duplicate items when coercing from list to transactions are now removed with a warning.

Changes in version 1.4-1 (04/10/2016)

  • added tail method for associations.
  • added/fixed encoding for read.transactions
  • Bugfix for interestMeasure. Mutual information is now calculated correctly (reported by ddessommes).

Changes in version 1.4-0 (03/18/2016)

  • The transaction class lost slot transactionInfo (we use the itemsetInfo slot now). Note that you may have to rebuild some transaction sets if you are using transactionInfo.
  • Bugfix for combining item matrices with 0 rows (reported by C. Buchta).
  • Bugfix for itemLabel recoding in is.subset (reported by sjain777).
  • Bugfix for NAMESPACE export for %in%
  • is.redundant: fixed and performance improvement.
  • interestMeasure: performance improvement for "improvement" measure.
  • sort: speed up sort by always sorting NAs last.
  • head: added method head for associations for getting the best rules according to an interest measure faster than sorting all the associations first.
  • Groceries: fixed typo in dataset.
  • abbreviate is now a S4 generic with S4 methods.
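
The head method for associations can be sketched as follows, assuming arules 1.4-0 or later and the Adult data set. It retrieves the three rules with the highest lift without sorting the whole rule set first; the thresholds are the ones used in the example session and are not a recommendation.

```r
library("arules")
data("Adult")

rules <- apriori(Adult,
  parameter = list(supp = 0.5, conf = 0.9, target = "rules"),
  control = list(verbose = FALSE))

# Top 3 rules by lift, selected directly by head()
top3 <- head(rules, n = 3, by = "lift")
inspect(top3)
```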

Changes in version 1.3-1 (12/13/2015)

  • we now require R 3.2.0 so cbind in Matrix works.
  • is.maximal is now also available for rules.
  • added is.significant for rules (uses Fisher's exact test with correction).
  • added is.redundant for rules.
  • added support for multi-level analysis (aggregate).
  • APparameter: confidence shows now NA for frequent itemsets.
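
A minimal sketch of the new rule filters, assuming arules 1.3-1 or later and the Adult data set. Both functions return one logical value per rule; the defaults of is.significant (significance level, correction method) differ between package versions, so the counts are not shown here.

```r
library("arules")
data("Adult")

rules <- apriori(Adult,
  parameter = list(supp = 0.5, conf = 0.9, target = "rules"),
  control = list(verbose = FALSE))

# TRUE for rules that pass Fisher's exact test on the given transactions
sig <- is.significant(rules, Adult)

# TRUE for rules that are not contained in a larger rule of the set
maxi <- is.maximal(rules)

table(significant = sig, maximal = maxi, useNA = "ifany")
```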

Changes in version 1.3-0 (11/11/2015)

  • removed deprecated WRITE and SORT functions.
  • ruleInduction: bug fix for missing confidence values and better checking (by C. Buchta).
  • subset extraction: added checks, handles now NAs and recycles for logical.
  • read.transactions gained arguments skip and quote and some defaults for read and write (uses now quotes and no rownames by default) have changed.
  • itemMatrix: coercion from matrix now checks for a 0-1 matrix and gives a warning.
  • APRIORI and ECLAT report now absolute minimum support.
  • APRIORI: out-of-memory while rule building does now result in an error and not a memory fault.
  • aggregate uses now 'by' instead of 'itemLabels' to conform to aggregate in base.
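
The skip argument of read.transactions can be sketched like this, assuming arules 1.3-0 or later. The temporary file and its contents are made up for the example: a header line followed by three baskets.

```r
library("arules")

# Write a small basket file with one header line (hypothetical data)
tmp <- tempfile(fileext = ".csv")
writeLines(c("items in basket",
             "bread,milk",
             "bread,butter",
             "milk"), tmp)

# skip = 1 drops the header line before the baskets are parsed
trans <- read.transactions(tmp, format = "basket", sep = ",", skip = 1)

inspect(trans)
unlink(tmp)
```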

Changes in version 1.2-1 (09/20/2015)

  • Added many new interest measures.
  • interestMeasure: the formal argument method is now called measure (method is now deprecated).
  • Added Mushroom dataset.
  • Moved abbreviate from arulesViz to arules.
  • fixed undefined behavior for left shift in reclat.c (reported by B. Ripley)
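
The renamed measure argument of interestMeasure can be sketched as follows, assuming arules 1.2-1 or later and the Adult data set. The three measures requested here are just examples from the larger set the function supports.

```r
library("arules")
data("Adult")

rules <- apriori(Adult,
  parameter = list(supp = 0.5, conf = 0.9, target = "rules"),
  control = list(verbose = FALSE))

# Request several measures at once; the result has one row per rule
im <- interestMeasure(rules,
  measure = c("support", "confidence", "lift"),
  transactions = Adult)

head(im)
```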

Changes in version 1.2-0 (09/14/2015)

  • added support for weighted association rule mining (by C. Buchta):
    • transactions can store weights in a column called "weight" in transactionInfo.
    • support, itemFrequency and itemFrequencyPlot gained a parameter called weighted.
    • weclat extends eclat with transaction weights.
    • hits can be used to calculate weights from transactions.
  • sort can now sort by several columns (used to break ties) in quality. It also gained an order parameter to return a permutation vector (order) instead.
  • inspect gained parameters setStart, setEnd, itemSep, ruleSep and linebreak to control output better.
  • read.transactions now ignores empty items (e.g., caused by trailing commas and leading or trailing white spaces).
  • labels now returns not a list but consistent labels for objects (transactions, itemMatrix, rules, itemsets, and tidLists).
  • tidLists has now an inspect method, gained coercion from "list", and has now a replacement method for dimnames().
  • Coercion from itemMatrix to matrix results now in a logical matrix.
  • fixed as(transactions, "data.frame"). The column names do now have no prefix (except if transactionInfo contains an item called "items").
  • transactions has now its own dimnames function which correctly returns transactionID from transactionInfo as rownames.
  • fixed missing row labels for is.subset().
  • replacement method for dimnames() checks now dimensions.
  • item labels are now internally handled as character using stringsAsFactors = FALSE in data.frames and not AsIs with I(character).
  • rules can now have no item in the RHS.
  • We are transitioning to internally use consistently data.frames with the correct number of rows for quality, itemInfo, transactionInfo and itemsetInfo. These data.frames possibly have 0 columns.
  • arules uses now testthat (tests are in tests/testthat).
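
Two of the changes above, sorting by several quality columns and logical matrix coercion, can be sketched as follows. This assumes arules 1.2-0 or later and the Adult data set; the choice of columns is arbitrary.

```r
library("arules")
data("Adult")

rules <- apriori(Adult,
  parameter = list(supp = 0.5, conf = 0.9, target = "rules"),
  control = list(verbose = FALSE))

# Sort by confidence; ties are broken by lift
sorted <- sort(rules, by = c("confidence", "lift"))
head(quality(sorted))

# Coercing transactions to a dense matrix yields logical values
m <- as(Adult[1:3], "matrix")
is.logical(m)
```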

Changes in version 1.1-9 (7/13/2015)

  • More work on namespace.
  • Fixed tests.

Changes in version 1.1-7 (6/29/2015)

  • itemUnion: fixed bug for large amounts of dense rules.
  • crossTable gained arguments measure and sort.
  • Fixed namespace imports for non-base default packages.

Changes in version 1.1-6 (12/07/2014)

  • dissimilarity method "pearson" is now set to 1 (max) for neg. correlation. Also added phi correlation coefficient.
  • discretize method "cluster" accepts now ... passed on to k-means (e.g., for nstart)
  • merge for itemMatrix checks now for conformity
  • as(..., "transactions"): binary attributes are now translated into items only if TRUE.

Changes in version 1.1-5 (8/19/2014)

  • Import drop0 from Matrix

Changes in version 1.1-4 (7/25/2014)

  • C code: fixed problem in error message generation in apriori and eclat (this fixes the trio library problem under Windows)
  • C code: rapriori uses now STRING_ELT to be compatible with TERR (TIBCO)
  • C code: removed some unused variables.

Changes in version 1.1-3 (6/17/2014)

  • Fixed dependency on XML and pmml
  • the interest measure chi-squared does now also report p-values (with significance=TRUE)
  • interestMeasure calculation checks now better for missing transactions
  • interestMeasure consistently returns now NA if not defined for a certain rule

Changes in version 1.1-2 (2/21/2014)

  • discretize gained the parameter ordered.
  • itemwise set operations itemUnion, itemSetdiff and itemIntersect added.
  • validObject checks now rules more thoroughly
  • aggregate removes duplicate items from the lhs
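
The item-wise set operations work element by element on two item matrices of the same length. As a minimal sketch, assuming arules 1.1-2 or later and the Adult data set, the union of each rule's left-hand and right-hand side gives the full itemset behind the rule.

```r
library("arules")
data("Adult")

rules <- apriori(Adult,
  parameter = list(supp = 0.5, conf = 0.9, target = "rules"),
  control = list(verbose = FALSE))

# Element-wise union: one itemset per rule, containing lhs and rhs items
all_items <- itemUnion(lhs(rules), rhs(rules))

inspect(head(all_items, n = 3))
```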

Changes in version 1.1-1 (1/16/2014)

  • is.superset/is.subset now makes sure that the two arguments conform using recode (number and order of items)
  • is.superset/is.subset returns now a matrix with appropriate dimnames
  • bug fix: fixed dimname bug in as(..., "dgCMatrix") for tidLists
  • image: labels are now passed on correctly.
  • tidLists has now c().

Changes in version 1.1-0 (12/10/2013)

  • bug fix: reuse is now passed on correctly in interestMeasure (bug reported by Ying Leung)
  • direct coercion from and to dgCMatrix is no longer supported; use ngCMatrix instead
  • coercion from ngCMatrix to itemMatrix and transactions is now possible
  • C code: fixed misaligned address on 64-bit systems
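
The ngCMatrix coercions can be sketched as a round trip, assuming arules 1.1-0 or later and the Adult data set. Note that the sparse matrix stores items in rows and transactions in columns.

```r
library("arules")
data("Adult")

# Sparse pattern matrix: items in rows, transactions in columns
m <- as(Adult, "ngCMatrix")
dim(m)

# Coerce the sparse matrix back to a transactions object
trans2 <- as(m, "transactions")
trans2
```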

Changes in version 1.0-15 (9/6/2013)

  • service release

Changes in version 1.0-14 (5/24/2013)

  • discretize handles now NAs correctly
  • bug fix in is.subset

Changes in version 1.0-13 (4/7/2013)

  • transactions: coercion from data.frame now handles logical automatically.
  • discretize replaces categorize and offers several additional methods

Changes in version 1.0-12 (11/28/2012)

  • Added read and write for PMML.
  • 'WRITE' is now deprecated, use 'write' instead
  • C code: Added a copy of the C subscript code from R for better performance and compatibility with arulesSequences

Changes in version 1.0-11 (11/19/2012)

  • Fixed vignette.
  • Internal Changes for dimnames and subsetting

Changes in version 1.0-9 and 1.0-10 (9/3/2012)

  • Added PACKAGE argument to C calls.
  • C code: Added C routine symbols to NAMESPACE for arulesSequences

Changes in version 1.0-8 (8/23/2012)

  • fixed memory problem in eclat with tidLists=TRUE
  • added supportedTransactions()
  • is.subset/is.superset can now return a sparse matrix
  • added support to categorize continuous variables.

Changes in version 1.0-7 (11/4/2011)

  • minor fixes (removed factor in dimnames for itemMatrix, warning in WRITE)
  • read.transactions now accepts column names to specify user and item columns (by F. Leisch)

Initial stable release version 1.0-0 (3/24/2009)

Alpha and beta versions starting with 0.1-0 (4/15/2005)

install.packages("arules")

1.5-4 by Michael Hahsler, 11 days ago


https://github.com/mhahsler/arules, http://lyle.smu.edu/IDA/arules


Report a bug at https://github.com/mhahsler/arules


Browse source code at https://github.com/cran/arules


Authors: Michael Hahsler [aut, cre, cph], Christian Buchta [aut, cph], Bettina Gruen [aut, cph], Kurt Hornik [aut, cph], Ian Johnson [ctb, cph], Christian Borgelt [ctb, cph]


Documentation:   PDF Manual  


Task views: Machine Learning & Statistical Learning


GPL-3 license


Imports stats, methods, graphics, utils

Depends on Matrix

Suggests pmml, XML, arulesViz, testthat


Imported by Biocomb, RKEEL, TELP, clickstream, inTrees, opusminer, preprocomb, sbrl.

Depended on by RSarules, arc, arulesCBA, arulesNBMiner, arulesSequences, arulesViz, ibmdbR, rCBA, recommenderlab.

Suggested by nutshell, nutshell.audioscrobbler, pmml, rattle, rfml.

