Graphical User Interface for Data Science in R

The R Analytic Tool To Learn Easily (Rattle) provides a collection of utility functions for the data scientist. A Gnome (RGtk2) based graphical interface is included with the aim of providing a simple and intuitive introduction to R for data science, allowing a user to quickly load data from a CSV file (or via ODBC), transform and explore the data, build and evaluate models, and export models as PMML (Predictive Model Markup Language) or as scores. A key aspect of the GUI is that all R commands are logged and commented through the Log tab. The log can be saved as a standalone R script file, used as an aid for learning R, or copied and pasted directly into R itself.


rattle 5.2.0 2018-08-12 15:17:12 [email protected]

  • Remove dependency on RGtk2 and check it dynamically. Rattle has more functionality than just the GUI yet we force installation of RGtk2 which is problematic on some platforms.

  • Return the datasets to rattle package. Has caused too much confusion as a separate package.
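The dynamic check mentioned above can be sketched in base R as follows. This is only an illustration of the general technique, not Rattle's actual code; the helper names has_package() and gui_mode() are hypothetical.

```r
# Hypothetical helper: TRUE if an optional package can be loaded at run time.
has_package <- function(pkg) {
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}

# Hypothetical helper: decide whether GUI features can be offered.
# The non-GUI functions stay usable even when the GUI toolkit is missing.
gui_mode <- function(pkg = "RGtk2") {
  if (has_package(pkg)) {
    "gui"
  } else {
    message("Package '", pkg, "' is not installed; GUI features are unavailable.")
    "no-gui"
  }
}
```

With this pattern the optional package would typically be listed under Suggests rather than Depends, so that installation does not fail on platforms where it cannot be built.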

rattle 5.1.6 2018-08-12 15:17:12 [email protected]

  • Bug fix for new rpart.plot with roundint= handled automatically.

  • Reduce width of bars in ggVarImp() plot.

rattle 5.1.5 2018-07-01 17:31:22 [email protected]

  • Remove deprecated connect-r logo. Reported by Bob Muenchen.

  • Correct and update Help menus. Reported by Bob Muenchen.

  • Remove Report button until updated to newer functionality. Reported by Bob Muenchen.

rattle 5.1.4 2018-05-22 07:05:18 [email protected]

  • Bug fix: Remove na.omit from the calls to generate stats for a hierarchical clustering. Results in errors for the weather dataset. Reported by Tony Nolan.

rattle 5.1.3 2017-10-29 21:25:08 [email protected]

  • Bug fix: xgboost evaluate to score file failing. Needs target in the predict command to succeed! Actually needs a fix to predict.xb.formula but include a workaround for now. Reported by Dwight Barry.

rattle 5.1.1 2017-09-08 16:08:03 [email protected]

  • Update and bug fix to riskchart for risk AUC as provided by Cameron Chisholm.

rattle 5.1.0 2017-09-04 08:20:34 [email protected]

  • Resolve all final check tests, redo testing, and release to CRAN.

rattle 5.0.19 2017-07-10 15:14:34 [email protected]

  • Debug the xgboost interface - limited to binary classification tasks for now. Key is that the R code is exported and can be used as a template for extended modelling.

rattle 5.0.18 2017-06-27 06:54:48 [email protected]

  • Tune Boost interface.

rattle 5.0.17 2017-06-24 10:58:15 [email protected]

  • Move the dataset from the rattle package to a separate package in line with CRAN guidelines to have a separate package for slower changing datasets of considerable size. This will also allow the option to provide further datasets for rattle as part of that package.

  • Use weather.csv as the sample for both R and Microsoft R as weatherAUS.csv is too large to include in a CRAN package.

  • Ensure strings are treated as categoricals on loading the data with Microsoft R so as to conform to read.csv() and to be consistent with the non-Microsoft R version of Rattle.

rattle 5.0.16 2017-06-17 08:18:26 [email protected]

  • Update the weather dataset from the Australian Bureau of Meteorology and add sample weatherAUS.xdf to the package to be loaded as the default example when Microsoft R is detected. XDF is a file system based data format used by Microsoft R to handle datasets of any size rather than being limited by available computer RAM.

rattle 5.0.15 2017-06-17 07:48:30 [email protected]

  • Merge RevoScaleR (Microsoft R) support for NeuralNetworks and KMeans from Microsoft India Data Group team.

rattle 5.0.14 2017-06-12 13:12:00 [email protected]

  • Merge support for RGtk2 2.20.31 and 2.20.33 to resolve the bug across all installations.

rattle 5.0.13 2017-06-05 16:51:18 [email protected]

  • Merge initial xgboost support from Zhou Fang. This is in testing and will become the default boosting algorithm soon.

rattle 5.0.12 2017-05-30 13:52:51 [email protected]

  • Bug fix call to errorMatrix() where counts= is not count=.

  • Bug fix to evaluate where respcmd for random forest has disappeared when incorporating MRS updates.

rattle 5.0.11 2017-05-26 17:43:01 [email protected]

  • RGtk2 version 2.20.33 released and caused some issues with Rattle. Heuristic test of libglade/GtkBuilder began failing and retval no longer used for obtaining returned values.

rattle 5.0.10 2017-04-30 13:57:16 [email protected]

  • Review predict.hclust() to use cutree() by default and predict.kmeans() for a Euclidean distance approach as an option. Bug report by Hamed Mamani.
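The two assignment strategies named above can be illustrated with base R alone. This is a sketch on made-up data, not the code of predict.hclust() or predict.kmeans(); the helper nearest_centre() is hypothetical.

```r
# Toy data: two well separated groups of 10 points each.
set.seed(42)
x <- rbind(matrix(rnorm(20, mean = 0),  ncol = 2),
           matrix(rnorm(20, mean = 10), ncol = 2))

# Strategy 1: cut the dendrogram into k groups.
hc <- hclust(dist(x), method = "complete")
clusters <- cutree(hc, k = 2)

# Strategy 2: assign each row to the nearest cluster centre
# (squared Euclidean distance), which also works for new data.
centres <- apply(x, 2, function(col) tapply(col, clusters, mean))

nearest_centre <- function(newdata, centres) {
  apply(newdata, 1, function(row) {
    unname(which.min(colSums((t(centres) - row)^2)))
  })
}

assigned <- nearest_centre(x, centres)
```

cutree() can only label the observations used to build the tree, whereas the nearest-centre approach extends to observations that were not part of the clustering.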

rattle 5.0.9 2017-04-14 13:17:33 [email protected]

  • Introduce a checksum for R datasets (data.frames) so that we can identify when a user has changed the R dataset outside of Rattle, and have the new version loaded.

rattle 5.0.8 [email protected]

  • Incorporate updates for ggraptR(). Close to functional and so it is nearly ready for release.

rattle 5.0.7 2017-03-05 18:13:22 [email protected]

  • Support for rxDTree, rxDForest, rxGlm, rxLinMod. Thanks to Durga Prasad Chappidi.

rattle 5.0.6 2017-02-25 09:51:55 Graham Williams

  • errorMatrix() more robust to character values and mismatch in factor levels. Thanks to Fang Zhou.

rattle 5.0.5 [email protected] 2017-02-15 07:22:43 Graham Williams

  • Update weatherAUS dataset.

  • Bug fix for sample XDF dataset - if smaller than crv$xdf_preview then load the whole dataset into memory.

  • ggVarImp now has n= option for the top n variables. Also supports xgb.Booster models from xgboost.

rattle 5.0.4 [email protected] 2017-02-04 15:15:34 Graham Williams

  • Bug fix ggVarImp to work for randomForest() when importance=FALSE.

  • Add log= option to ggVarImp() for a log scale.

  • Add pc (percentages) and digits to errorMatrix().

rattle 5.0.3 [email protected] 2017-02-02 15:18:28 Graham Williams

  • Add sample_n() for xdf - temporary until dplyrXdf supports it.

rattle 5.0.2 [email protected] 2016-10-02 15:06:52

  • Implement generic ggVarImp() to plot variable importance for different models.

  • Implement errorMatrix() as a replacement for generating code to do this pcme() during a rattle run.

  • Update the weather AUS dataset from the Australian Bureau of Meteorology.

  • Add a subtitle to riskchart().

rattle 5.0.1 [email protected]

  • Begin exposing :: prefix in the log tab. It's educational and self documenting.

  • Support Explore -> Distribution -> Group By to include the numeric target variable (usually only categorics listed) if it has 10 or fewer levels. Suggested by Eugene Dubossarsky.

  • Additional XDF support: rxDForest.

rattle 5.0.0 [email protected]

  • Initial support for the XDF format: rxDTree.

rattle 4.2.0 [email protected] 2016-07-22 06:19:15

   * Include dplyr as an Import.

   * Add support for Eugene Dubossarsky's ggraptr.

   * Cleanup and perfect executeModelRF and Log code.

rattle 4.1.8 [email protected] 2016-06-24 20:36:51

   * Add transparency to ggpairs plot. Reported by Eugene

rattle 4.1.7 [email protected] 2016-06-21 21:02:13

   * Bug fix for Benfords when the target is numeric. An empty
     Group By will use the target variable to stratify. Reported
     by Eugene Dubossarsky.

   * Spelling fixes provided by George Wilson.

rattle 4.1.6 [email protected] 2016-06-21 20:34:49

   * Bug fix for new version of GGally - to get target
     colours. Reported by Eugene Dubossarsky.

rattle 4.1.3 [email protected] 2016-05-12 10:22:01

   * Update copyright to 2016.

   * Add stringr dependency.

   * Fix missing comment character in log tab.

rattle 4.1.3 [email protected] 2016-03-13 15:07:07

   * Add type= to fancyRpartPlot(). Requested by Michelle Gosse.

rattle 4.1.2 [email protected] 2016-03-13 06:24:21

   * Bug fix for missing GUI code for
     export_filechooserdialog. Reported by Bill Burns.

rattle 4.1.1 [email protected] 2016-01-26 19:50:07

   * Bug fix for a single input variable in the dataset when
     scoring. Reported by Szabo Szilard.

rattle 4.1.0 [email protected] 2016-01-26 11:12:01

   * Bug fix calculation of confusion matrices when either actual
     or predicted values have missing values. Reported by Roger

   * Make the transform more robust by ensuring
     the by argument is a factor, converting as needed. Reported
     by Tony Nolan.

   * Bug fix for plots when there is no target in the
     dataset. Reported by Albert Lee.

   * Bug fix in calculation of the overall error rate in the
     confusion matrix. Show overall error as percentage not
     proportion. Reported by Eugene Dubossarsky.

   * Remove grid from ggpairs plot and fine tune for presentation.

rattle 4.0.0 [email protected] 2015-09-21 06:00:49

   * Migrate hosting of the package to Bitbucket:

   * Use Connect-R logo as the icon for the button.

rattle 3.5.11 [email protected] 2015-09-16 19:22:02

   * Add button to toolbar to open a Connect-R page for feature

   * Bug fix confusion matrix Error calculation and average error
     calculation. Reported by Eugene Dubossarsky.

   * Only default to TIME* variable as target if Survival model is

rattle 3.5.10 [email protected] 2015-09-16 19:22:02

   * Explore tab's Distribution option now allows the user to
     choose how to group the data for plotting, with the Target
     as the default but a choice of any Categoric variable
     available, or none.

   * Bug fix when scoring a clustering with no identifier nor
     target. Reported by Abhishek Sharma.

rattle 3.5.9 [email protected] 2015-09-16 05:53:00

    * Incorporate pairs plots into Distributions option of the
      Explore tab. Contributed by Jose A Magaña.

rattle 3.5.8 [email protected] 2015-08-28 10:21:59

   * Migrate histogram plots to using pipes and generally clean up
     the code.

   * Introduce appendLibLog to handle namespaces in the Log
     tab. Namespace prefix is removed and replaced by a library()
     call as a user would normally do.

   * Migrate Box Plots to using pipes and place multiple box
     plots or histograms onto a single grid.

rattle 3.5.7 [email protected] 2015-08-21 19:17:56

  * Move to using clusplot from cluster rather than plotcluster
    from fpc to obtain ellipses to show the clusters.

rattle 3.5.6 [email protected] 2015-08-20 21:30:29

  * Gracefully handle no network connection in rattleInfo().

rattle 3.5.5 [email protected] 2015-08-17 19:29:41

   * Bug fix for traditional graphics and ROCR suite of plots
     under evaluate tab - need to use namespace to get correct
     version of plot().

rattle 3.5.4 [email protected] 2015-07-26 12:07:02

   * Add palettes= to allow limited changing of colours in

   * Bug fix for fancyRpartPlot() where rule conditions were being
     replaced with coloured blocks.

rattle 3.5.3 [email protected]

   * Add a test to riskchart() to if there are more than two

rattle 3.5.2 [email protected]

   * Extend Error Matrix calculations in Evaluate to support
     multinomial targets as well as binomial targets.

rattle 3.5.1 [email protected]

   * Bug fix in calculation of overall and average class
     errors. Thanks to Eugene Dubossarsky.

rattle 3.5.0 [email protected]

   * Replace xlsx::read.xlsx() with readxl::read_excel() to remove
     reliance on Java which has always been problematic in terms
     of Windows users having trouble installing Java. Thanks to Ed
     Stoker for testing. (3.4.3)

   * When iterating over kmeans clusters now plot from 1 cluster
     rather than 3. Thanks to Eugene Dubossarsky. (3.4.4)

   * Updates to normVarNames() due to Hadley's changes to
     stringr. Also capture other characters to map.

   * Add title.size argument to riskchart(). Also support
     horizontal legend. Fix the text glob for the Lift label.

   * Revert to using only exported functions from
     pkgDepTools. (3.4.1)

   * Fix some tooltip and textview typos suggested by Kees
     Schippers. (3.4.2)

   * Move from weightedKmeans to wskm.

   * Numerous updates to support new CRAN checks, particularly
     related to use of name spaces and requiring to make rattle
     depend on RGtk2.

   * weatherAUS dataset is updated.

   * Update rattleInfo() to be more efficient by doing dependency
     graph myself.

rattle 3.4.0 [email protected] 2014-12-29 19:11:59 +11:00

   * Revert traditional ROC eval plot to overlay all models on the
     one plot. Eugene Dubossarsky

   * Bug fix to fancyRpartPlot() from John Vorwald when
     model$frame$yval all negative.

   * Replace comma in normVarNames().

   * Remove latticist - no longer available on CRAN.

rattle 3.3.0 [email protected] 2014-09-09 18:25:21 +1100

   * Migrate to using namespace for external functions.

rattle 3.2.0 [email protected] 2014-09-04 06:14:03 +1100

   * Execute button when clicked from the Log tab will execute all of
     the code in the Log tab. (Suggested by Scott MacLean, 24 July 2014.)

   * Add the average error rate to the evaluations, as proposed

   * Numerous ggplot2 updates and bug fixes.

   * MS-Windows support for xlsx files bug fixed. Allow sub=
     option in fancyRpartPlot.

rattle (3.1.0)

   * Numerous updates of plots to use ggplot2 rather than base
     graphics: ROC curves, riskchart, box plots, histogram plots, pairs
     plot, Benfords. Advanced Graphics is now the default, reverting to
     traditional graphics where needed. The migration to ggplot2 is ongoing.

   * Added new Benfords functionality.

   * Added a rescale option to kmeans.

   * New psfchart() for evaluation.

   * New function normVarNames() to normalise variable names to a
     standard preferred style.

   * Evaluate -> Error Matrix has been updated to report averaged
     class error and to report class errors.

   * Evaluate -> PrvOb plot bug fix for non-missing data.

   * INSTALL: Remove old INSTALL file - visit for
     installation instructions.

   * plotNetwork() has been removed - not used by Rattle and
     generally of limited use. See for the code.

   * No longer report repository revision number in version or about.

   * Miscellaneous bug fixes and stability improvements.

   * weatherAUS dataset is up-to-date.

-- Graham Williams [email protected] 2014-07-18 14:32:07 +1100

rattle (2.6.26) unstable; urgency=low

  • Replace .path.package with path.package as requested by Ripley. The hidden version will disappear soon and the new version has been available since 2.13.0.

  • Update boost help to note that it is available only for binary classification.

  • Default stemming for textmining of a corpus is now active if the Snowball package is available.

  • For Advanced Graphics introduce a dendrogram plot using ggplot2.

  • Various text mining improvements. Bug fix in checking if data needs reloading. Support checking if corpus needs reloading. Add extra cursor and status bar messages. For corpus, set default folder to be getwd(). Check for mismatch between number of docs in corpus and the number of targets in .targets.csv. For the Corpus file dialog, do not offer folder creation.

  • Remove macosx special rattle.ui. The ubuntu specific text no longer appears in the saved ui file.

  • Internally: Move rattleGUI to crv from crs. The crs is saved as the state, and this was confusing the GUI on a project restore. Had to ensure we restored rattleGUI with the current rattleGUI - this fixes loadProject bug. Also, in Load project, filter on .RData not .Rdata.

  • Add newdata= to call to predict, in line with the standard approach by party (reference Torsten). Remove the OOB= for predict for cforest. With a new dataset OOB makes no sense. It was in there because newdata= was not being used and positionally having issues.

  • Update fancy rpart plot to reduce colour intensity for printing and a nicer tree structure. Add all class probs to fancy tree.

  • Define paste0 if it is not defined. It was introduced in 2.15.0 but is too early to assume the world is with us.

  • Replace siatclust with weightedKmeans.

  • Bug fix in OOB plot when impute is off - need to omit missing values. Update message regarding random forest and na.omit() removing all rows, noting the option to use na.roughfix().

  • Fix bug identified by Brian Feeny 121209 - score a RF test dataset without a target variable tries to add one in all NA but fails if it is the last variable.

  • Experimentally add Deducer's data.viewer to View data. Ensure we ask user if when using Plot Builder it is okay to create a dataset in their work space. Hopefully keeps us in line, if not strictly in compliance, with CRAN policy.

  • Remove SVG support - RSvgDevice is no longer available.

-- Graham Williams [email protected] Sat, 16 Mar 2013 13:27:05 +1100
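The paste0 fallback noted in this release can be sketched as below. This is a minimal illustration of defining a function only when the running R lacks it, and is not copied from the Rattle source.

```r
# Define paste0() only on versions of R that do not already provide it.
# On R 2.15.0 and later the base version exists and this block does nothing.
if (!exists("paste0", mode = "function")) {
  paste0 <- function(..., collapse = NULL) {
    paste(..., sep = "", collapse = collapse)
  }
}

label <- paste0("rattle_", 2, ".", 6)
```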

rattle (2.6.25) unstable; urgency=low

  • Review all of the code and remove two instances of using copyrighted code without attribution. One was a copy of print.rpart, where rattle added a translation wrapper to the text message. Another was code copied from the Internet from David Hand - use the Hmeasure package now. Note in drawTreeNode() reference to the original author and lack of copyright. Note author in Authors@R. ggcorplot is now available from Deducer. Remove it from Rattle. Replace Hand measure with HMeasure from hmeasure. Add Mark Vere Culp as aux author. Remove commented out code. Remove lss and cranSearch - not really part of Rattle.

  • Update to new style Authors@R.

-- Graham Williams [email protected] Sat, 23 Jan 2013 13:12:43 +1100

rattle (2.6.24) unstable; urgency=low

  • Bug fix for box plot using ggplot2.

  • Finish the implementation of riskchart using ggplot2 to mimic the old version of risk charts.

  • Remove copied code from print.rpart, known as rattle.print.rpart, and originally used without proper credit to Brian Ripley, but no longer required. Use his original version from rpart itself, though lose the translations.

  • Migrate to a cleaner structure for managing the source package locally at togaware.

  • Bug fix fancy rpart plot to handle regression as suggested by Yana Kane-Esrig.

  • For arules, add option to specify minimum length.

  • Update to new version of RGtk2Extras' dfedit, without a pretty_print option. Also able to assign result into crs$dataset now.

  • Remove two instances of global variable assignments. Temporarily remove PlotBuilder and scoring of manually entered datasets.

-- Graham Williams [email protected] Tue, 11 Dec 2012 06:45:50 +1100

rattle (2.6.21) unstable; urgency=low

  • Retain depend on R > 2.12.1.

  • Ensure repo is maintained.

  • Better detect arules error message for duplicate items in a basket.

  • Update ggplot2 calls to conform to 0.92. Also turn advanced graphics on by default. Implement risk charts using ggplot2.

  • Start introducing suppressPackageStartupMessages to avoid excessive messages in the console.

  • Do AUC only for binomial targets.

-- Graham Williams [email protected] Mon, 10 Sep 2012 19:27:42 +1000

rattle (2.6.20) unstable; urgency=low

  • Because of use of globalVariables Rattle now depends on R >= 2.15.1. However, check this conditionally to retain backward compatibility for now. Reported by Uwe Ligges.

  • For show arules, eval in global environment else it does not show the rules. Reported by Tania Churchill.

-- Graham Williams [email protected] Mon, 23 Jul 2012 02:27:18 +1000

rattle (2.6.19) unstable; urgency=low

  • Depend on weightedKmeans rather than siatclust.

  • Bug fix: correlation plots stopped working.

  • Bug fix: ggcorplot use of size_scale started failing. Perhaps because of new version of ggplot2.

  • Bug fix: notice when a restored project does not have a filename set.

  • Fix some logic errors in rf.

  • Add 0,0 point to evaluateRisk.

  • Make risk, recall, precision as default names in risk chart.

  • Add new riskchart function using ggplot2.

  • Allow additional arguments to fancyRpartPlot passed through to prp.

  • Update copyright to 2012.

  • Allow y for yes in installing initial RGtk2.

  • List global variables to avoid check messages.

-- Graham Williams [email protected] Wed, 04 Jul 2012 22:15:27 +1000

rattle (2.6.18) unstable; urgency=low

  • Ensure require uses quietly rather than quiet.

  • Clean up randomForest textview output.

  • Update pmml to 4.0. Fix various format issues and other updates from Tridi of Zementis.

  • Update setupDataset but also note that it is moving into a separate package, container.

  • Get odfweave stuff working again.

  • Update fancyRPartPlot - being used in SIAT software. Can now handle any number of classes.

  • Updates to the pmml rsf code.

  • Bug fix for evaluation of conditional trees and random forests.

  • Further pmml export of randomForest updates.

  • Add PlotBuilder as interactive explore option.

  • Export pmml for glm models.

  • Enhance ggplot2 plotting of boxplot.

-- Graham Williams [email protected] Sun, 22 Apr 2012 21:47:00 +1000

rattle (2.6.17) unstable; urgency=low

  • Add a log10 transform to the GUI, R10 prefix, add tooltip, handle it in pmml, create new rattle_macosx.ui. Suggested by Christophe Klopp.

  • Bug fix usage of believeNRows - it was being ignored from the GUI, but is now acted upon. Reported by Andrew Elliott.

  • Add ggplot2 box plots to Advanced Graphics option.

  • Remove the timestamp messages.

  • Update pmml to handle randomForest and rattle to export to pmml.

  • Bug fix in naming the dataset when it is edited.

  • Bug fix for ggcorplot when less than 6 vars - need to map var names into a c() call.

-- Graham Williams [email protected] Sun, 19 Feb 2012 21:49:45 +1100

rattle (2.6.16) unstable; urgency=low

  • rattleInfo() now also notes if rattle itself needs upgrading.

  • Bug fix in show association rules. It now works again.

  • Forgot to include in NAMESPACE.

  • CITATION to the book rather than the article. That is a more definitive resource, though not freely available.

-- Graham Williams [email protected] Sat, 24 Dec 2011 15:35:21 +1100

rattle (2.6.15) unstable; urgency=low

  • Bug fix for Mac OS/X on 2.14.0 with a call to set.cursor failing because the textviews do not yet exist. Problem is that the addFromFile for the GUI is generating a Warning that seems to now stop the file being loaded. Removing the particular XML elements causing the warning (one ubuntu_local and 4 GtkTreeSelections) "fixes" the problem.

-- Graham Williams [email protected] Sat, 03 Dec 2011 22:49:18 +1100

rattle (2.6.14) unstable; urgency=low

  • Add OOB ROC button to Forest option of Model tab as suggested by Akbar Waljee.

  • Bug fix for loading R Dataset data frame named dataset. Bug reported by George Dontas.

  • Use roc.plot() from evaluation. Suggested by Akbar Waljee.

  • Use packageStartupMessage.

  • Ensure oob roc plot handles numeric targets.

-- Graham Williams [email protected] Wed, 16 Nov 2011 06:01:17 +1100

rattle (2.6.13) unstable; urgency=low

  • Add wtd.quantile type to binning. Suggested by Brenton R. Stone.

-- Graham Williams [email protected] Tue, 25 Oct 2011 21:34:13 +1100

rattle (2.6.12) unstable; urgency=low

  • Ensure the data partitions that are specified are appropriate. Also allow some flexibility in specifying: 70 or 70/30 or 70/15/15. For the first two the training is 70% and testing is 30%. For the third, training is 70%, validation is 15% and testing is 15%.

  • Update text mining support for latest version of tm.

  • rattleInfo() was incorrectly counting the number of packages listed.

-- Graham Williams [email protected] Sun, 23 Oct 2011 06:00:16 +1100
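The partition specifications accepted in this release (70, 70/30, 70/15/15) could be interpreted along the following lines. This is a sketch under the rules stated in the entry, not Rattle's implementation; the function name parse_partition() is hypothetical.

```r
# Hypothetical parser for a partition string such as "70", "70/30" or "70/15/15".
# Returns percentages for training, validation and testing.
parse_partition <- function(spec) {
  parts <- suppressWarnings(as.numeric(strsplit(spec, "/", fixed = TRUE)[[1]]))
  if (length(parts) == 0 || anyNA(parts)) stop("cannot parse partition: ", spec)

  # "70" is shorthand for 70/30: the remainder goes to testing.
  if (length(parts) == 1) parts <- c(parts, 100 - parts)
  # Two values mean training/testing with no validation set.
  if (length(parts) == 2) parts <- c(parts[1], 0, parts[2])

  ok <- length(parts) == 3 && all(parts >= 0) &&
    isTRUE(all.equal(sum(parts), 100))
  if (!ok) stop("partition must be non-negative and sum to 100: ", spec)

  names(parts) <- c("train", "validate", "test")
  parts
}

p <- parse_partition("70/15/15")
```

Rejecting specifications that do not sum to 100 is one way to read "ensure the data partitions that are specified are appropriate"; the actual check in Rattle may differ.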

rattle (2.6.11) unstable; urgency=low

  • Use listAdaVarsUsed in Rattle.

  • Use fancyRpartPlot in Rattle.

  • Note rattle.ui requires gtk > 2.16, not > 2.20. Otherwise fails to start on Mac OS/X.

-- Graham Williams [email protected] Wed, 05 Oct 2011 19:12:28 +1100

rattle (2.6.10) unstable; urgency=low

  • Add listAdaUsedVars support function.

  • Workaround CairoDevice issue on Windows by defaulting to not using it, as in the Settings menu.

  • Add common name and crv constant for ewkm.

  • fancyRpartPlot has optional main title as empty string.

  • biclust now reports a biclust built rather than reporting a kmeans built.

  • Add weights plots for ewkm from siatclust.

-- Graham Williams [email protected] Sun, 11 Sep 2011 17:08:18 +1000

rattle (2.6.9) unstable; urgency=low

  • AdaBoost now also reports which variables are used in the collection of trees built, and the number of trees in which a variable appears.

  • Add setupDataset and whichNumeric to support encapsulation of data mining objects.

  • Add a fancyRpartPlot so my fancy rpart tree is available outside of the rattle GUI.

  • Correct the textview information relating to confusion matrices.

  • Add doRiskChart to simplify using the risk charts.

-- Graham Williams [email protected] Sun, 04 Sep 2011 21:03:32 +1000

rattle (2.6.8) unstable; urgency=low

  • Ensure ggplot2 loaded before plot ctree.

  • Handle probability predictions for ctree and cforest in evaluation.

-- Graham Williams [email protected] Tue, 26 Jul 2011 22:03:47 +1000

rattle (2.6.7) unstable; urgency=low

  • Add support for the entropy weighted k-means subspace clustering algorithm from the ewkm package.

  • Ensure rattle can load with only the base package installed (so install.packages is prefixed with utils:::).

  • Migrate from using installed.packages() since it can be very slow on MS/Windows.

  • Add an experimental dataset option to the command line call to rattle.

  • Allow a bygroup to be used for any numeric transform.

  • Add a plot for association rules.

  • Display a ggplot2 scatterplot if advanced plots is enabled.

  • rattle:::executeExplorePlot made more friendly for calling from outside of Rattle.

  • Tidy up the rattleInfo manual page.

  • Master Makefile should respond with help if no target specified.

-- Graham Williams [email protected] Mon, 18 Jul 2011 06:53:47 +1000

rattle (2.6.6) unstable; urgency=low

  • Settings/Tooltips should be shown as TRUE.

  • Add Settings/GGPlot2 to enable enhanced graphics (generally using ggplot2) where they have been implemented.

  • Implement a ggplot2 pairs plot (scatterplot) as the plot to use when ggplot2 is enabled and under Explore/Distributions no variables are chosen to be displayed. Uses ggcorplot from Deducer.

  • Implement use of rpart.plot's prp() when ggplot2 is enabled.

-- Graham Williams [email protected] Sat, 09 Apr 2011 22:16:29 +1000

rattle (2.6.5) unstable; urgency=low

  • Add rattleReport() - report on current state of rattle modelling.

  • Restore the ByGroup option for now until it can be coded for the about transforms.

  • Deal with UTF-8 encoding of Japanese filenames in data and evaluate, using iconv.

  • Be sure to include http:// in web links, though on MS/Windows still not working: Could Not Show Link... No application is registered as handling this file

  • On loading a dataset, convert any character variables to be factors. Rattle does not handle character variables, so the translation seems appropriate.

  • Association rules status bar was referring to decision trees. Fixed. (Pointed out by Xiaobo Gu)

  • Fix an introduced bug in handling of categorics in numeric transforms.

  • Fix a bug where imputation for a categoric with class "ordered" and "factor" was treating it as a numeric (because "ordered" is not "factor").

  • Some Help menu items under Test were not loading the required package and thus were not displaying the help.

  • Only do crosstabs when we have categoric variables.

  • Updated translations.

-- Graham Williams [email protected] Sun, 13 Mar 2011 16:46:20 +1100
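Two of the items in this release concern how R classes categoric data: an ordered factor has class c("ordered", "factor"), so comparing class() against "factor" fails, and character columns need converting to factors on load. The following base R sketch illustrates both points on made-up data; it is not the Rattle code.

```r
# An ordered factor carries two classes, with "ordered" first.
size <- factor(c("low", "high", "low"), levels = c("low", "high"), ordered = TRUE)
class(size)                 # "ordered" "factor"
class(size)[1] == "factor"  # FALSE: a test like this misses ordered factors
is.factor(size)             # TRUE: the robust test
inherits(size, "factor")    # TRUE

# Convert every character column of a data frame to a factor.
df <- data.frame(id = 1:3,
                 colour = c("red", "blue", "red"),
                 stringsAsFactors = FALSE)
is_char <- vapply(df, is.character, logical(1))
df[is_char] <- lapply(df[is_char], factor)
```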

rattle (2.6.4) unstable; urgency=low

  • Confusion matrices transposed to conform to what most people expect: Actual is on left and Predicted is on top. Retain the name as Error Matrix in Rattle for now.

  • Use different pch for a dotchart.

  • Include the install.packages(rattleInfo()) trick in the output of rattleInfo().

-- Graham Williams [email protected] Sat, 19 Feb 2011 06:26:09 +1100
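The layout described above, with Actual on the left and Predicted on top, corresponds to passing the actual values as the first argument to table(). A small sketch on made-up labels, which also derives the overall error as a percentage:

```r
# Made-up actual and predicted class labels.
actual    <- factor(c("no", "no", "yes", "yes", "no", "yes"), levels = c("no", "yes"))
predicted <- factor(c("no", "yes", "yes", "no", "no", "yes"), levels = c("no", "yes"))

# Rows are the actual classes, columns the predicted classes.
cm <- table(Actual = actual, Predicted = predicted)

# Overall error: share of observations off the diagonal, as a percentage.
overall_error_pc <- 100 * (1 - sum(diag(cm)) / sum(cm))
```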

rattle (2.6.3) unstable; urgency=low

  • weather.arff Date field should have 'date' data type.

  • The rug plot of histograms is no longer coloured. For large datasets, there is much overplotting and so it can in fact be quite misleading.

  • Box plots now use varwidth=TRUE to indicate the distribution of the target variable.

  • Bug fix: exportHClustTab should not have a file argument.

-- Graham Williams [email protected] Sun, 13 Feb 2011 21:42:11 +1100

rattle (2.6.2) unstable; urgency=low

  • Rename to rattleInfo(), modelled on sessionInfo() naming. Include available CRAN version of rattle in the output.

  • Ensure connection is closed on pmmltoc export from Rattle.

  • questionDialog needs to not use RGtk2 if RGtk2 is not installed!

  • Emphasise that Rattle is free in loading the rattle package.

  • exportKmeansTab does not require the file argument.

-- Graham Williams [email protected] Wed, 02 Feb 2011 05:46:28 +1100

rattle (2.6.1) unstable; urgency=low

  • When exporting a regression model, be sure to use proper slash (i.e., not the Windows slosh) for log tab record of the command.

  • Add rattle.ui to the google code repository.

  • Remove as many literals as possible from the Log tab - so that crs$dataset[crs$sample, c(2:10,14,16:20)] becomes crs$dataset[crs$sample, c(crs$input, crs$target)], for example. Similarly for the set.seed and other data storing variables.

  • Other Log tab cleanup.

  • Fix bug that caused failure on reading an .xls data file.

  • now returns the list of packages that need updating.

  • In exporting a model as C code, if we are Japanese on Windows then note that the encoding is shift-jis rather than utf-8 for some reason.

  • Improve infrastructure for the generation of C code from PMML.

-- Graham Williams [email protected] Thu, 13 Jan 2011 21:50:53 +1100
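The Log tab change in this release, replacing literal column indices with named components of crs, can be illustrated as follows. This is a sketch only: crs is mocked here as a plain environment, the iris dataset stands in for the user's data, and the component crs$seed is an assumed name for the stored seed.

```r
# Mock of the crs container; in Rattle this holds the current session state.
crs <- new.env()
crs$dataset <- iris
crs$seed    <- 42           # assumed name for the stored seed
crs$input   <- 1:4          # column indices of the input variables
crs$target  <- 5            # column index of the target variable

set.seed(crs$seed)
crs$sample <- sample(nrow(crs$dataset), round(0.7 * nrow(crs$dataset)))

# Before: literal indices, hard to reuse once the variable roles change.
train_literal <- crs$dataset[crs$sample, c(1:4, 5)]

# After: the same selection expressed through the stored roles.
train <- crs$dataset[crs$sample, c(crs$input, crs$target)]
```

The second form keeps the logged script valid when the user changes which variables are inputs or targets, since only the assignments to crs$input and crs$target need to change.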

rattle (2.6.0) unstable; urgency=low

  • Keep track of project names and use as default name to save a project to. Suggested by David Cochrane.

  • Add strip.white to the default for reading CSV files. Suggested by Robert Muenchen.

  • Bug fix on resetEvaluateTab - Data row was being reset to sensitive because model was being toggled.

  • Disconnect Rattle versions from google code revision numbers since the revision numbers change with each change to the Wiki.

  • Indicator Variables will Ignore the first of the new indicator variables. Suggested by Robert Muenchen.

  • Include the Target name in listing of a decision tree as a rule set.

  • On adding to the log when saving a plot make sure cairoDevice is loaded and the file name path separators are appropriate. Reported by Shane Butler 11 Dec 2010.

  • Ensure filename string is UTF-8 when exporting a file, to handle Japanese filenames.

  • For nnet, choose a seed so weather generates a non-trivial model.

  • Refer to remapping as recoding in line with commonly used terminology.

  • Default back to showing text on icon for buttons. Seems okay in the new version of Gtk.

-- Graham Williams [email protected] Sat, 11 Dec 2010 13:39:55 +1100

rattle (2.5.47) unstable; urgency=low

  • Add a useGtkBuilder argument to rattle(). If NULL, then heuristically determine, otherwise go with the specified choice, if possible.

  • Remove RGtk2, colorspace, and pmml as dependencies. Now dynamically check and offer to install. This also helps reduce chance of the XML/RGtk2 zlib1.dll bug, and also ensure RGtk2 loads before XML to avoid that bug.

-- Graham Williams [email protected] Mon, 15 Nov 2010 21:50:15 +1100

rattle (2.5.46) unstable; urgency=low

  • Bug fix for fixTranslations.

  • Save weights information in PMML.

  • Cleanup SVM command generator.

-- Graham Williams [email protected] Thu, 11 Nov 2010 19:08:36 +1100

rattle (2.5.45) unstable; urgency=low

  • Check for GtkBuilder handling of the 'requires' tag, and if not handled then don't use GtkBuilder.

  • Bump pmml version through 1.2.25 to 1.2.26.

  • Change default nolan groups for a singularity to 50 rather than 99.

  • PMML bug fix when glm and using weights.

  • Move all variable initialisation from .onLoad to .onAttach. This will ensure .RData saved (and therefore old) versions of the variables will not overwrite the proper versions in a newer release of Rattle.

-- Graham Williams [email protected] Sat, 09 Oct 2010 08:16:15 +1100

rattle (2.5.44) unstable; urgency=low

  • Add an include.libpath to to provide information about where the packages are installed.

  • Check for failed startup of rattle GUI using GtkBuilder (because the Gtk library installed does not recognise 'requires') and suggest a workaround.

  • Conditionally turn toolbar Text (in addition to just Icons) on.

  • For loading spreadsheets, make sure RODBC is available and loaded.

  • Ensure 'ordered categoric' are treated as categoric for Explore, Distribution.

-- Graham Williams [email protected] Tue, 05 Oct 2010 18:08:20 +1100

rattle (2.5.43) unstable; urgency=low

  • Ensure gtkBuilder is setting the correct translation domain for the interface.

  • Add global option for not showing timestamps: crv$show.timestamp.

  • Add optional arg to newProject to not ask about overwriting a project. Default is as previously - to ask.

-- Graham Williams [email protected] Wed, 22 Sep 2010 05:37:53 +1000

rattle (2.5.42) unstable; urgency=low

  • Update to recursively identify all dependencies, report their version number and any updates available from CRAN and generate command to update packages that have updates available. See ? for the options.

  • Fix bug causing R Dataset option of the Evaluate window to always revert to the first named dataset.

  • Fix bug in transforms where weights were not being handled in refreshing of the Data tab.

  • Fix a bug in box plots when trying to label outliers when there aren't any.

-- Graham Williams [email protected] Sun, 19 Sep 2010 05:01:51 +1000

rattle (2.5.41) unstable; urgency=low

  • Use GtkBuilder for Export dialog.

  • Test use of glade vs GtkBuilder on multiple platforms.

  • Rename to rattle.version.

  • Add weight column to data tab.

  • Support weights for nnet, multinom, survival.

  • Add weights information to PMML as a PMML Extension.

  • Ensure GtkFrame is available as a data type whilst waiting for updated RGtk2.

  • Bug fix to packageIsAvailable not returning any result.

  • Replace destroy with withdraw for plot window as the former has started crashing R.

  • Improve Log formatting for various model build commands.

  • Be sure to include the car package for Anova for multinom models.

  • Release pmml 1.2.24: Bug fix glm binomial regression - note as classification model.

-- Graham Williams [email protected] Wed, 15 Sep 2010 14:56:09 +1000

rattle (2.5.40) unstable; urgency=low

  • Conditionalise useGtkBuilder: if windows and R before 2.12.0 then libglade; if unix and R 2.12.0 then libglade for now (RGtk2 update needed?); all else use GtkBuilder.

-- Graham Williams [email protected] Sun, 22 Aug 2010 12:02:00 +1000

rattle (2.5.39) unstable; urgency=low

  • Conditionally use either libglade2 or GtkBuilder for the GUI. libglade2 (a separate library to the Gtk+ library) is deprecated and as of R 2.12.0 won't be supported on MS/Windows binary builds. The default is now GtkBuilder (built into the Gtk+ library), and support for libglade2 within Rattle is deprecated. RGtk2 (2.12.18) still has issues in its support of GtkBuilder and is being actively worked on, but Rattle is currently working around these.

-- Graham Williams [email protected] Sat, 21 Aug 2010 07:47:43 +1000

rattle (2.5.38) unstable; urgency=low

  • Ensure pmml.ksvm will at least run - though resulting PMML not validated.

  • Bump pmml version to 1.2.23

-- Graham Williams [email protected] Fri, 06 Aug 2010 05:56:11 +1000

rattle (2.5.37) unstable; urgency=low

  • The Predictive tab has gone back to being Model. Not sure which is best.

  • cranSearch defaults to r-project rather than unimelb.

  • Migrate from RGtk2DfEdit to its replacement, RGtk2Extras.

  • Revert cairoDevice to being a Suggests rather than Depends.

  • Remove redundant CITATION from root of package, as the real one is in inst.

-- Graham Williams [email protected] Sat, 31 Jul 2010 14:34:50 +1000

rattle (2.5.36) unstable; urgency=low

  • Add Bill Venables' searchCRAN example code.

  • Improve error message when we find duplicate variable names in a loaded file, which might result when there is no header line.

  • Add help item for Projects.

  • On Evaluate with supplied file, use the hdr specified on the Data tab.

-- Graham Williams [email protected] Mon, 12 Jul 2010 06:43:06 +1000

rattle (2.5.35) unstable; urgency=low

  • Add utility lss function to list object sizes.

  • Add options text entry for SVM to easily allow other options.

  • Better formatting of the Log tab.

  • Use a set.seed for SVM to ensure same model each time.

  • Add option to random forest to impute missing values rather than simply ignoring the observations.

  • On Evaluate with supplied file, use the sep specified on the Data tab, thus allowing TXT files.

  • On loading a new dataset for evaluation be sure to add in any missing columns, and unify the levels.

  • Improve binning documentation.

  • Make RGtk2, cairoDevice, colorspace all dependencies so we can get rattle started and then rattle will prompt to install other packages that are missing when it needs them.

-- Graham Williams [email protected] Thu, 01 Jul 2010 15:34:50 +1000

rattle (2.5.34) unstable; urgency=low

  • When a package is missing, there is now the option to install it right then, and it continues as normal after it gets installed.

  • Change Suggests to Depends so all used packages get loaded on loading rattle, in an attempt to make it easier to install Rattle. Then the r-cran-rattle package on Debian/Ubuntu will have all required dependencies and a normal install.packages will get all dependencies also, rather than having to use dependencies=c('Depends', 'Suggests'). Penalty is it takes 20 seconds to do 'library(rattle)' on a server and 90 seconds on a netbook - so revert back to not doing this.

  • Ensure the new train/validate/test scenario is saved across projects.

-- Graham Williams [email protected] Wed, 09 Jun 2010 07:04:08 +1000

rattle (2.5.33) unstable; urgency=low

  • Bug fix rf.cmd.

  • Improve scoring functionality: The dataset can have NA's for target, and these can now get scored by rf on Evaluate tab. Loading a CSV file to be scored no longer needs to have the target column included (previously it needed to be there and have non-NA values). Thanks to Chris Snijders.

-- Graham Williams [email protected] Mon, 31 May 2010 06:22:54 +1000

rattle (2.5.32) unstable; urgency=low

  • Remove dependency on car - not actually being used at the moment.

  • For random forest, allow sample size text entry as a single integer or a list, as per randomForest.

  • Use na.omit with cforest, as is done with randomForest.

  • For randomForest turn subsampling with replacement off since it is more likely to produce biased importance measures, as explained in the cforest papers.

  • Fix bug with multiple "contact support" lines in error popups.

  • When showing the randomForest importance values, sort on the accuracy measure rather than the Gini measure, since the Gini is biased in favour of categoric variables with many categories.

  • ada boost seed should be 42, like all other seeds.

  • Tidy up some ada output.

  • Bug fix - save project for rf failing (looking for rf_sampsize_entry).

  • Remove text from toolbar by default.

  • Change order of Forest/Boost buttons on Model tab.

  • Add tooltips for all toolbar buttons.

-- Graham Williams [email protected] Fri, 28 May 2010 15:47:15 +1000

rattle (2.5.31) stable; urgency=low

  • Add to list information for debugging purposes.

  • Bump pmml to 1.2.22

  • Fixes from [email protected]: Extension in Header should be first element. Coefficients in regression models should not be NA (as will be for singularities), but replace with, and so no impact of change.

  • Ensure Survival defaults are reset appropriately.

-- Graham Williams [email protected] Wed, 19 May 2010 09:50:39 +1000

rattle (2.5.30) stable; urgency=low

  • On MS/Windows with Japanese, read.csv needs encoding option set with file rather than with read.csv (for UTF-8) but seems okay under other scenarios.

  • On MS/Windows with Japanese (UTF-8) the encoding of the variables selected for transforming needs to be UTF-8 for much of the process, but "unknown" when using Rtxt and sprintf (when substituting the variable names) to ensure resulting message is correctly matched for encodings.

-- Graham Williams [email protected] Wed, 19 May 2010 09:47:12 +1000

rattle (2.5.29) stable; urgency=low

  • Add the translation file.

  • Fix an Encoding/sprintf issue for Japanese on MS/Windows.

  • Allow crv$NOTEBOOK.MODEL.NAME to be overridden by other packages (RStat).

  • When dispatch fails be sure to include the Tab label on which it fails.

  • Ensure HClust Options are re-enabled on loading a project.

-- Graham Williams [email protected] Sat, 24 Apr 2010 07:32:02 +1000

rattle (2.5.28) stable; urgency=low

  • Minor format changes for glm and rf model output.

  • Capture additional survival model error and suggest a solution.

  • Remove spurious additional plot for Survival Residual plot.

  • Update log tab labels to be more generic.

  • Update tooltips to be generic and add survival tooltips.

-- Graham Williams [email protected] Thu, 22 Apr 2010 06:21:58 +1000

rattle (2.5.27) unstable; urgency=low

  • Further translation fixes. In particular, use Encoding(...getText()) <- "UTF-8" to ensure strings from the GUI are UTF-8, and not unknown.

  • Ensure training dataset rather than sample dataset nomenclature is now used.

  • Ensure execute button can only be clicked once while it is processing.

  • Survival plot buttons need to be made sensitive as appropriate.

  • For Japanese on MS/Windows do not use monospace font since this ends up vertically centering periods and commas (and all other characters). Need a fixed width font that does not do this, but for now we put up with variable width font.

  • Revert to using only English for all hidden tab labels.

  • Improved identification of current plot number.

  • Bug fix multiple vars selected for asnumeric and ascategoric transforms.

-- Graham Williams [email protected] Thu, 22 Apr 2010 06:17:20 +1000
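
The Encoding(...) <- "UTF-8" fix in this entry declares an encoding for a string rather than converting its bytes. A minimal sketch in base R follows. Since getText() is only available inside the running GUI, a string built from raw UTF-8 bytes stands in for the widget text here; that stand-in is an assumption for illustration.

```r
# Build the UTF-8 bytes for one Japanese character from raw values, so the
# string starts out with no declared encoding (a stand-in for GUI text).
label <- rawToChar(as.raw(c(0xe6, 0x97, 0xa5)))
before <- Encoding(label)

# Declare (not convert) the encoding: the bytes stay the same, only the
# mark that R attaches to the string changes.
Encoding(label) <- "UTF-8"
after <- Encoding(label)

print(c(before = before, after = after))
```

Because the assignment only sets the mark, it is appropriate only when the bytes are already known to be UTF-8, as this entry assumes for strings coming from the GUI.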

rattle (2.5.26) unstable; urgency=low

  • Add Cross Tab option to Explore tab to generate cross tabulations of each categoric variable by the target variable. (Luke Lake)

  • Bug fix - improve how we obtain the plot number from the title, particularly in the context of translations.

  • Further translation markup.

  • Clean up the use of dfedit.

  • Minor improvement to spacing in Log tab.

-- Graham Williams [email protected] Tue, 30 Mar 2010 21:37:29 +1100

rattle (2.5.25) unstable; urgency=low

  • Start using the RGtk2DfEdit for the View and Edit buttons of the Data tab, and the Enter/Score option of the Evaluate tab. RGtk2DfEdit provides a spreadsheet-like interface to the data. Various data editing options are available. Also press = to run an arbitrary R command on selected data (e.g. select two columns of data and issue the plot command).

  • Add further markup of text for translations.

  • Support specification of the character used for decimal points (to suit some European usage).

  • Fix bug in exporting XML - replace & with &amp;

  • Survival plots - split survival chart plot from residuals plots, and plot all residuals.

  • Fix logic behind what is greyed out in the Test tab.

-- Graham Williams [email protected] Mon, 29 Mar 2010 19:37:25 +1100
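
The XML export fix in this entry concerns bare ampersands, which are not valid in XML text. A minimal sketch of that kind of escaping in base R follows; escape_amp is a hypothetical helper written for illustration and is not taken from Rattle's source.

```r
# Hypothetical helper: escape bare ampersands so the text is valid XML.
# fixed = TRUE treats the pattern as a literal string, not a regex.
escape_amp <- function(x) gsub("&", "&amp;", x, fixed = TRUE)

escaped <- escape_amp("Salary & Wages")
print(escaped)
```

This simple version would escape an already escaped string a second time, so it should be applied exactly once to raw text.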

rattle (2.5.24) stable; urgency=low

  • Revamp the help text, and put into the Rtxt translation framework.

  • Fix the height of the data name widget (the library option was growing the width for some reason).

  • For Evaluate, add Full and Enter as dataset options. Enter will pop up an editor with the final row from the dataset, allowing you to add rows or modify the supplied row. We supply the row so that we have an example to work from. Full uses the whole original dataset.

-- Graham Williams [email protected] Sat, 06 Mar 2010 14:17:12 +1100

rattle (2.5.23) stable; urgency=low

  • Catch "arules" error in converting data to transactions when baskets contain repeated items.

  • When data tab is executed, and so crs$rpart is reset to NULL, always remove the Draw/Rules button from the Tree option of the Predictive tab.

  • Add code to fix translations that are not being loaded when using RGtk2 on MS/Windows. All is okay on GNU/Linux, but RGtk2 seems not to get the right locale for loaded Glade file. The fix is to traverse the GUI and change all labels, on starting up Rattle. RGtk2 authors tried to fix but it remains an issue.

  • Ensure rpart is reset on resetting rattle.

  • Rework handling of tab pages because a Japanese translation on MS/Windows is having issues with the following call (nb=notebook): nb$getTabLabelText(nb$getNthPage(nb$getCurrentPage())), which returns what looks like a Shift-JIS encoding of the string rather than UTF-8, and hence does not match the expected tab label.

  • Fix spelling errors on help menu and ensure help for all topics is covered.

  • For nnet, use MaxNWts=10000 (default is 1000) to allow larger nets by default, and capture the error message when this is exceeded and better explain what to do.

  • Ensure we don't export an empty dataset when choosing export on the data tab.

  • Capture arules error message when there are repeated items in one basket, and explain this more clearly.

  • For rpart use information as the default split rather than Gini - makes little if any difference.

  • Allow showHelpPlus to have an extra/alternative question that is displayed.

  • All random seeds should be 42.

  • Reset kmeans tab on loading a project.

  • Add a dozen more weather stations to the weatherAUS dataset.

  • Improve the logic for the display of the Report radio buttons on the Evaluate tab.

  • Spelling correction to a number of tooltips.

-- Graham Williams [email protected] Wed, 03 Mar 2010 06:50:58 +1100
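
The MaxNWts change in this entry matters because the number of weights grows quickly with the number of inputs. A rough sketch of the arithmetic follows, assuming a single hidden layer with bias units and no skip-layer connections; count_weights is a hypothetical helper for illustration and is not part of nnet or Rattle.

```r
# Hypothetical helper: number of weights in a single-hidden-layer network
# with bias units and no skip-layer connections.
# Each hidden unit has one weight per input plus a bias, and each output
# unit has one weight per hidden unit plus a bias.
count_weights <- function(inputs, hidden, outputs = 1) {
  (inputs + 1) * hidden + (hidden + 1) * outputs
}

small <- count_weights(inputs = 20, hidden = 10)
large <- count_weights(inputs = 200, hidden = 10)
print(c(small = small, large = large))

# Compare the larger network against the old limit (1000) and the
# new limit (10000) mentioned in this entry.
print(c(exceeds_old = large > 1000, within_new = large <= 10000))
```

Note that inputs here means the number of columns after categoric variables have been expanded into indicator columns, so a dataset with a few high-cardinality categoric variables can pass the old limit easily.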

rattle (2.5.22) stable; urgency=low

  • Default window height is 650, but not forced so that the window nicely fills a netbook screen if maximised.

  • Bump R dependency to 2.8.0 in line with update of the CITATION file.

-- Graham Williams [email protected] Sat, 13 Feb 2010 09:48:00 +1100

rattle (2.5.21) stable; urgency=low

  • Re-enable gettext on MS/Windows, even though RGtk2 2.12.18 has not fixed the bindtextdomain problem with glade files and package supplied translations.

  • Change the tree plot to use "< =>" and ">= <" to clearly identify which branch the "=" results go to. Could not figure out how to get expression to use a "ge" symbol.

  • Improve formatting of the PvO plots.

  • Use the pairs.panels function from the psych package for the default scatterplot on the Explore tab.

  • Add INSTALL file.

-- Graham Williams [email protected] Sun, 07 Feb 2010 15:03:22 +1100

rattle (2.5.20) stable; urgency=low

  • Restore missing weather.csv file.

  • Add to Google code: weather.R ChangeLog NEWS ToDo

-- Graham Williams [email protected] Sun, 31 Jan 2010 11:07:55 +1100

rattle (2.5.19) unstable; urgency=low

  • Ensure the right labels (Time/Risk rather than Class/Prob) displayed in filechooser when exporting a survival model.

  • Model tab renamed as Predictive.

  • Ensure boxplots have same "by ..." in the main title.

  • Update the weather dataset and include many more weather stations in the weatherAUS dataset.

  • Rtxt does no translations when running on MS/Windows (for now).

-- Graham Williams [email protected] Sat, 30 Jan 2010 09:28:18 +1100

5.4.0 by Graham Williams, a year ago

Authors: Graham Williams [aut, cph, cre], Mark Vere Culp [cph], Ed Cox [ctb], Anthony Nolan [ctb], Denis White [cph], Daniele Medri [ctb], Akbar Waljee [ctb] (OOB AUC for Random Forest), Brian Ripley [cph] (print.summary.nnet), Jose Magana [ctb] (ggpairs plots), Surendra Tipparaju [ctb] (initial RevoScaleR/XDF), Durga Prasad Chappidi [ctb] (initial RevoScaleR/XDF), Dinesh Manyam Venkata [ctb] (initial RevoScaleR/XDF), Mrinal Chakraborty [ctb] (initial RevoScaleR/XDF), Fang Zhou [ctb] (initial xgboost), Cameron Chisholm [ctb] (risk plot on risk chart)

Documentation:   PDF Manual  

Task views: Machine Learning & Statistical Learning, Model Deployment with R

GPL (>= 2) license

Imports stats, utils, ggplot2, grDevices, graphics, magrittr, methods, stringi, stringr, tidyr, dplyr, XML, rpart.plot

Depends on tibble, bitops

Suggests pmml, colorspace, ada, amap, arules, arulesViz, biclust, cairoDevice, cba, cluster, corrplot, descr, doBy, e1071, ellipse, fBasics, foreign, fpc, gdata, ggdendro, ggraptR, gplots, grid, gridExtra, gtools, hmeasure, Hmisc, kernlab, Matrix, mice, nnet, party, plyr, psych, randomForest, RColorBrewer, readxl, reshape, rggobi, RGtk2, ROCR, RODBC, rpart, scales, SnowballC, survival, timeDate, tm, verification, wskm, xgboost

Depended on by mvc.

Suggested by pmml.

See at CRAN