Various R Programming Tools for Data Manipulation

Various R programming tools for data manipulation, including: - medical unit conversions ('ConvertMedUnits', 'MedUnits'), - combining objects ('bindData', 'cbindX', 'combine', 'interleave'), - character vector operations ('centerText', 'startsWith', 'trim'), - factor manipulation ('levels', 'reorder.factor', 'mapLevels'), - obtaining information about R objects ('object.size', 'elem', 'env', 'humanReadable', 'is.what', 'll', 'keep', 'ls.funs', 'Args','nPairs', 'nobs'), - manipulating MS-Excel formatted files ('read.xls', 'installXLSXsupport', 'sheetCount', 'xlsFormats'), - generating fixed-width format files ('write.fwf'), - extricating components of date & time objects ('getYear', 'getMonth', 'getDay', 'getHour', 'getMin', 'getSec'), - operations on columns of data frames ('matchcols', 'rename.vars'), - matrix operations ('unmatrix', 'upperTriangle', 'lowerTriangle'), - operations on vectors ('case', 'unknownToNA', 'duplicated2', 'trimSum'), - operations on data frames ('frameApply', 'wideByFactor'), - value of last evaluated expression ('ans'), and - wrapper for 'sample' that ensures consistent behavior for both scalar and vector arguments ('resample').


New features:

  • Add new argument 'byrow' to upperTriangle(), lowerTriangle(), upperTriangle<-(), and lowerTriangle<-() to specify by-row rather by-column order. This makes it simpler to copy values between the lower and upper triangular areas, e.g. to construct a symmetric matrix.

Other changes:

  • Add inline comments to tests to alert reviewers of expected diffs on systems lacking the libraries for read.xls() to support XLSX formatted files.

Bug fixes:

  • mapLevels() no longer generates warnings about conversion of lists to vectors.

Other changes:

  • Requirement for Perl version 5.10.0 or later is specified in the package DESCRITION.

  • first() and last() are now simply wrappers for calls to 'head(x, n=1)' and 'tail(x, n=1)', respectively.

New features:

  • New functions first() and last() to return the first or last element of an object.

  • New functions left() and right() to return the leftmost or rightmost n (default to 6) columns of a matrix or dataframe.

  • New 'scientific' argument to write.fwf(). Set 'scientific=FALSE' to prevent numeric columns from being displayed using scientific notification.

  • The 'standard' argument to humanReadable() now accepts three values, 'SI' for base 1000 ('MB'), 'IEC' for base 1024 ('MiB'), and 'Unix' for base 1024 and single-character units ('M')

  • object.size() now returns objects with S3 class 'object_sizes' (note the final 's') to avoid conflicts with methods in utils for class 'object_size' provided by package 'utils' which can only handle a scalar size.

  • New 'units' argument to humanReadable()--and hence to print.object_sizes() and format.object_sizes()--that permits specifying the unit to use for all values. Use 'bytes' to display all values with the unit 'bytes', use 'auto' (or leave it missing) to automatically select the best unit, and use a unit from the selected standard to use that unit (i.e. 'MiB').

  • The default arguments to humanReadable() have changed. The defaults are now 'width=NULL' and 'digits=1', so that the default behavior is now to show digit after the decimal for all values.

Bug fixes:

  • reorder.factor() was ignoring the argument 'X' unless 'FUN' was supplied, making it incompatible with the behavior of stats:::reorder.default(). This has been corrected, so that calling reorder on a factor with arguments 'X' and/or 'FUN' should now return the same results whether gdata is loaded or not. (Reported by Sam Hunter.)

  • write.fwf() now properly supports matrix objects, including matrix objects without column names. (Reported by Carl Witthoft.)

Other changes:

  • Replaced deprecated PERL function POSIX::isdigit in (which is used by read.xls() ) with an equivalent regular expression. (Reported by both Charles Plessy, Gerrit-jan Schutten, and Paul Johnson. Charles also provided a patch to correct the issue.)

  • aggregate.table(), which has been defunct gdata 2.13.3 (2014-04-04) has now been completely removed.

Bug Fixes:

  • read.xls() can now properly process XLSX files with up to 16385 columns (the maximum generated by Microsoft Excel).

  • read.xls() now properly handles XLS/XLSX files that use 1904-01-01 as the reference value for dates instead of 1900-01-01 (the default for MS-Excel files created on the Mac).

Other changes:

  • Updated perl libraries and code underlying read.xls() to the latest version, including switching from Spreadsheet::XLSX to Spreadsheet::ParseXLSX.

Bug Fixes:

  • Unit tests were incorrectly checking for equality of optional POSIXlt components. (Bug reported by Brian Ripley).

Other Changes:

  • 'aggregate.table' is now defunct. See '?gdata-defunct' for details.

  • Unit tests and vignettes now follow R standard practice.

  • Minor changes to clean up R CMD check warnings.


  • Simplify ll() by converting a passed list to an environment, avoiding the need for special casing and the use of attach/detach.

  • Working of deprecation warning message in aggregate.table clarified.


  • Replaced calls to depreciated function ".path.package" with the new public function "path.package".

New features:

  • New 'duplicated2' function which returns TRUE for all elements that are duplicated, including the first, contributed by Liviu Andronic. This differs from 'duplicated', which only returns the second and following (second-to last and previous when 'fromLast=TRUE') duplicate elements.

  • New 'ans' functon to return the value of the last evaluated top-level function (a convenience function for accessing .Last.value), contributed by Liviu Andonic.

Bug Fixes:

  • On windows, warning messages printed to stdout by perl were being included in the return value from 'system', resulting in errors in 'sheetCount' and 'sheetNames'. Corrected.

  • The 'MedUnits' column names 'SIUnits' and 'ConventionalUnits' were reversed and misspelled.

Other Changes:

  • 'stats::aggregate' was made into a generic on 27-Jan-2010, so that attempting to call 'aggregate' on a 'table' object will now incorrectly call 'aggregate.table'. Since 'aggregate.table' can be replaced by a call to tapply using two index vectors, e.g. aggregate.table(x, by1=a, by2=b, mean) can be replaced by tapply(x, INDEX=list(a, b), FUN=mean), the 'aggregate.table' function will now display a warning that it is depreciated and recommending the equivalent call to tapply. It will be removed entirely in a future version of gdata.


  • read.xls() now supports fileEncoding argument to allow non-ascii encoded data to be handled. See the manual page for an example.

Bug Fixes:

  • The perl script utilized by read.xls() was incorrectly appending a space character at the end of each line, causing problems with character and NA entries in the final column.

New Features:

  • read.xls() and supporting functions now allow blank lines to be preserved, rather than skipped, by supplying the argument "blank.lines.skip=FALSE". The underlying perl function has been extended to suppor this via an optional "-s" argument which, when present, preserves blank lines during the conversion. (The default behavior remains unchanged.)

Other Changes:

  • Add SystemRequirements field specifying that perl is necessary for gdata to function fully.

Bug fixes:

  • gdata::nobs.default() needs to handle logical vectors in addition to numeric vectors.

Bug fixes:

  • Mark example for installXLSsupport() as dontrun so R CMD check won't fail on systems where PERL is not fully functional.

  • Correct name of installXLSsupport() in tests/

Other Changes:

  • Add dependency on R 2.13.0, since that is when stats::nobs appeared.

Bug fixes:

  • Fix issues in nobs.default identified in testing with the gmodels package.

Bug fixes:

  • Undo removal of 'nobs' and 'nobs.lm'. Instead define aliases for 'nobs' and 'nobs.lm' to support backward compatibility for packages depending on gdata.

New features:

  • New ls.funs() function to list all objects of class function in the specified environment.

  • New startsWith() function to determine if a string "starts with" the specified characters.


  • Add 'na.strings' argument to read.xls() to convert Excel's '#DIV/0!' to NA.

Bug fixes:

  • Correct various R CMD check warnings

Other changes:

  • Base S3 method for nobs() and nobs.lm() method removed since these are now provided in the stats package.

New features:

  • Add centerText() function to center text strings for a specified width.

  • Add case() function, a vectorized variant of the base::switch() function, which is useful for converting numeric codes into factors.


  • Minor improvements to xls2csv() man page.


  • nPairs() gains a summary method that shows how many times each variable is known, while the other variable of a pair is not

Bug fixes:

  • Fix errors on windows when R or Perl install path includes spaces by properly quoting the path.


  • Minor improvement to Args(), read.xls() man page.

Bug fixes:

  • Modify write.fwf() to capture and pass on additional arguments for write.table(). This resolves a bug reported by Jan Wijffels.

  • Modify xls2sep.R to avoid use of file.access() which is unreliable on Windows network shares.


  • When loaded, gtools (via an .onAttach() function) now checks:

    1. if perl is available
    2. whether the perl libraries for XLS support are available
    3. whether the perl libraries for XLSX support are available

    If perl is not available, an appropriate warning message is displayed.

    If necessary perl libraries are not available, a warning message is displayed, as is a message suggesting the user run the (new) installXLSXsupport() function to attempt to install the necessary perl libraries.

  • The function installXLSXsupport() has been provided to install the binary perl modules that read.xls needs to support Excel 2007+ 'XLSX' files.


  • New xlsFormats() command to determine which Excel formats are supported (XLS, XLSX).

Bug Fixes:

  • No longer attempt to install perl modules Compress::Raw::Zlib and Spreadsheet::XLSX at build/compile time. This should resolve recent build issues, particularly on Windows.

  • All perl code can now operate (but generate warnings) when perl modules Compress::Raw::Zlib and Spreadsheet::XLSX when are not installed.

  • Also update Greg's email address.


  • on Windows attempts to locate ActiveState perl if perl= not specified and Rtools perl would have otherwise been used in read.xls and other perl dependent functions.

Bug Fixes:

  • Fix building of Perl libraries on Win32


  • read.xls() now supports Excel 2007 'xlsx' files.

  • read.xls() now allows specification of worksheet by name

  • read.xls() now supports ftp URLs.

  • Improved ll() so user can limit output to specified classes

New Functions:

  • sheetCount() and sheetNames() to determine the number and names of worksheets in an Excel file, respectively.

Bug Fixes:

  • Fix formatting warning in frameApply().

  • Resolve crash of "ll(.GlobalEnv)"

Bug Fixes

  • Modify unit tests to avoid issues related to time zones.

Bug Fixes

  • Correct minor typos & issues in man pages for write.fwf(), resample() (Greg Warnes)

  • Correct calculation of object sizes in env() and ll() (Gregor Gorjanc)

New Features

  • Add support for using tab for field separator during translation from xls format in read.xls (Greg Warnes)

  • Enhanced function object.size that returns the size of multiple objects. There is also a handy print method that can print size of an object in "human readable" format when options(humanReadable=TRUE) or print(object.size(x), humanReadable=TRUE). (Gregor Gorjanc)

  • New function wideByFactor that reshapes given dataset by a given factor - it creates a "multivariate" data.frame. (Gregor Gorjanc)

  • New function nPairs that gives the number of variable pairs in a data.frame or a matrix. (Gregor Gorjanc)

  • New functions getYear, getMonth, getDay, getHour, getMin, and getSec for extracting the date/time parts from objects of a date/time class. (Gregor Gorjanc)

  • New function bindData that binds two data frames into a multivariate data frame in a different way than merge. (Gregor Gorjanc)

Other Changes

  • Correct Greg's email address
  • New function .runRUnitTestsGdata that enables run of all RUnit tests during the R CMD check as well as directly from within R.

  • Enhanced function object.size that returns the size of multiple objects. There is also a handy print method that can print size of an object in "human readable" format when options(humanReadable=TRUE) or print(x, humanReadable=TRUE).

  • New function bindData that binds two data frames into a multivariate data frame in a different way than merge.

  • New function wideByFactor that reshapes given dataset by a given factor - it creates a "multivariate" data.frame.

  • New functions getYear, getMonth, getDay, getHour, getMin, and getSec for extracting the date/time parts from objects of a date/time class.

  • New function nPairs that gives the number of variable pairs in a data.frame or a matrix.

  • New function trimSum that sums trimmed values.

  • New function cbindX that can bind objects with different number of rows.

  • write.fwf gains the width argument. The value for unknown can increase or decrease the width of the columns. Additional tests and documentation fixes.

  • Enhancements and bug fixes for read.xls() and xls2csv():

    • More informative log messages when verbose=TRUE

    • File paths containing spaces or other non-traditional characters are now properly handled

    • Better error messages, particularly when perl fails to generate an output .csv file.

    • The 'shortcut' character "~" (meaning user's home directory) is now properly handled in file paths.

    • XLS files created by OpenOffice are now properly handled. Thanks to Robert Burns for pointing out the patch (

  • Update perl libraries needed by xls2csv() and read.xls() to latest available versions on CRAN.

  • Add read.xls() to exported function list

  • Correct iris.xls example file. It didn't contain the complete & properly formatted iris data set. Fixed.

  • Fix typo in win32 example for read.xls()

  • The keep() function now includes an 'all' argument to specify how objects with names starting with '.' are handled.

  • keep() now shows an informative warning message when a requested object does not exist

  • New vignette "Mapping Levels of a Factor" describing the use of mapLevels().

  • New vignette "Working with Unknown Values" describing the use of isUnknown() and unknownToNA().

  • Several enhancements to read.xls() (thanks to Gabor Grothendieck):

    • New function xls2csv(), which handles converting an xls file to a csv file and returns a connection to the temporary csv file

    • xls2csv() and read.xls() both allow a file or a url to be specified

    • read.xls() has a new 'pattern' argument which, if supplied, will ignore everything prior to the first line in th csv file that matches the pattern. This is typically used if there are a variable number of comment lines prior to the header in which case one can specify one of the column headings as the pattern. read.xls should be compatible with the old read.xls.

  • Minor fixes to drop.levels(), is.what().

  • Implementation of unit tests for most functions.

  • Arguments as well as their position of reorder.factor have been changed to conform with reorder.factor method in stats package, due to collision bug. Argument 'make.ordered' is now 'order' and old argument 'order' is now 'new.order'! Therefore, you have to implicitly specify new.order i.e.

    reorder(trt, new.order=c("PLACEBO", "300 MG", "600 MG", "1200 MG"))

  • trim() gains ... argument.

  • Added "unknown" methods for matrices.

  • Added c() method for factors based on mapLevels() functions.

  • Added write.fwf, which writes file in Fixed Width Format.

  • Added mapLevels(), which produces a map with information on levels and/or internal integer codes. Contributed by Gregor Gorjanc.

  • Extended dropLevels() to work on the factors contained in a data frame, as well as individual factors.

  • Add unknown(), which changes given unknown value to NA and vice versa. Contributed by Gregor Gorjanc.

  • Extended trim() to handle a variety of data types data.frames, lists, factors, etc. Code changes contributed by Gregor Gorjanc.

  • Added resample() command that acts like sample() except that it always samples from the arguments provided, even if only a single argument is present. This differs from sample() which behaves differently in this case.

  • Updated my email address.

  • Fixed bug in interleave.R - option to covert 1-column matrices to vector (based on Andrew Burgess's suggestion)

  • Updated Greg and Jim's email adresses

  • ll.R: Suppressed warning message in attach() call.

  • frameApply.Rd, reorder.Rd: Remove explicit loading of gtools in examples, so that failure to import functions from gtools gets properly caught by running the examples.

  • upperTriangle.R, man/upperTriangle.Rd: Add functions for extracting and modifying the upper and lower trianglular components of matrices.

  • is.what.R: Replaced the "not.using" vector with a more robust try(get(test)) to find out whether a particular is.* function returns a logical of length one.

  • DESCRIPTION: Added Suggests field

  • Updated the example in frameApply

  • Added DESCRIPTION and removed

  • Updated ll.Rd documentation

  • Fixed bug in Args.R, is.what.R, ll.R

Reference manual

It appears you don't have a PDF plugin for this browser. You can click here to download the reference manual.


2.17.0 by Gregory R. Warnes, 2 years ago

Browse source code at

Authors: Gregory R. Warnes, Ben Bolker, Gregor Gorjanc, Gabor Grothendieck, Ales Korosec, Thomas Lumley, Don MacQueen, Arni Magnusson, Jim Rogers, and others

Documentation:   PDF Manual  

GPL-2 license

Imports gtools, stats, methods, utils

Suggests RUnit

System requirements: perl (>= 5.10.0)

Imported by BioGeoBEARS, DAMisc, Dominance, ENiRG, Ecfun, LOST, MIPHENO, NetComp, Plasmidprofiler, QTLRel, RKEEL, RbioRXN, TSmisc, adapr, comato, drLumi, elementR, enaR, fheatmap, gmodels, gplots, hddtools, likeLTD, lmms, mglR, minval, polySegratio, powerbydesign, qat, qdap, rAvis, rioja, rtematres, simSummary, sisus, sp500SlidingWindow, vardpoor, wux.

Depended on by COMBIA, ImportExport, LeLogicielR, MixedDataImpute, NanoStringNorm, REREFACT, TRSbook, cccrm, chromoR, compareGroups, genetics, iCluster, mixexp, mpMap, nCal, nonparaeff, pSI, weights.

Suggested by HistogramTools, MetaQC, MonetDBLite, ProjectTemplate, SII, SMIR,, bst, data.table, ipdw, kernscr, nmfgpu4R, plotMCMC, poliscidata, popReconstruct, primer, rattle, rfigshare, scape, solarius, sorvi, taRifx, umx.

See at CRAN