Stack and Reshape Datasets After Splitting Concatenated Values

Online data collection tools like Google Forms often export multiple-response questions with the responses concatenated in a single cell. The concat.split (cSplit) family of functions splits such data into separate cells. The package also includes functions to stack groups of columns and to reshape wide data, even when the data are "unbalanced", something that reshape (from base R) does not handle and that melt and dcast from reshape2 do not handle easily.


R functions to split concatenated data, stack columns of your datasets, and convert your data into different shapes.

  • cSplit: A core function that collects the functionality of several of the concat.split family of functions.
  • cSplit_f: A fast way to split columns of data where you know each row would result in the same number of values after being split.
  • concat.split: A set of functions to split strings where data have been concatenated into a single value, as is common with data collected using tools like Google Forms. The family includes cSplit_l, which returns a list, and cSplit_e, which returns an "expanded" view of the input data.
  • Stacked: A function to create a list of stacked sets of variables. Similar to melt from "reshape2", but doesn't put everything into one very long data.frame.
  • Reshape: A function to allow base R's reshape function to work with "unbalanced" datasets.
  • stratified: A function to take random row samples by groups.
  • getanID: A function for creating a secondary ID when duplicated "id" variables are present.
  • expandRows: "Expands" the rows of a dataset.
  • listCol_l and listCol_w: Unlists (long) or flattens (wide) a list column in a data.frame or a data.table.
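
As a rough illustration of the splitting functions listed above, the sketch below builds a small made-up data.frame and splits its comma-separated column with cSplit, first into extra columns and then into extra rows. The data and the column names (id, likes) are assumptions for the example only.

```r
library(splitstackshape)

# Hypothetical survey export: one multiple-response column, comma-separated.
survey <- data.frame(
  id = 1:3,
  likes = c("apples,bananas", "bananas", "apples,bananas,cherries"),
  stringsAsFactors = FALSE
)

# Wide: one new column per split value; rows with fewer values are padded with NA.
wide <- cSplit(survey, "likes", sep = ",")

# Long: one row per split value, with the id repeated.
long <- cSplit(survey, "likes", sep = ",", direction = "long")

print(wide)
print(long)
```

The wide form keeps one row per respondent, while the long form is usually easier to tabulate or aggregate.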

The package is on CRAN. You can install it using:

install.packages("splitstackshape")

To install the development version, use:

library(devtools)
install_github("mrdwab/splitstackshape", ref = "devel")

Current version: 1.4.2

News

splitstackshape NEWS


Author/Maintainer: Ananda Mahto
Email: ananda@mahto.info
URL: http://github.com/mrdwab/splitstackshape
BugReports: http://github.com/mrdwab/splitstackshape/issues


23 October 2014

  • listCol_l and listCol_w added as utilities for unlisting or flattening columns stored as lists in data.frames and data.tables.
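
A minimal sketch of the two list-column utilities, assuming a made-up data.frame with a list column named vals (both the data and the names are hypothetical):

```r
library(splitstackshape)

# Hypothetical input: each row of "vals" holds a vector of a different length.
df <- data.frame(id = 1:3)
df$vals <- list(1:2, 3L, 4:6)

# Long: one output row per list element, with the other columns repeated.
long <- listCol_l(df, "vals")

# Wide: one output row per input row, with list elements spread across columns.
wide <- listCol_w(df, "vals")

print(long)
print(wide)
```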

18 October 2014

  • Fixed a bug in :::.stripWhite that occurred when "|" was used as a delimiter.

13 October 2014

See 1.3.0 -- 1.3.8 for details of changes.

cSplit now replaces concat.split.compact and concat.split.multiple in concat.split; cSplit_f has been introduced as a related function. Other new functions are stratified and expandRows.

12 October 2014

  • cSplit_f

    The "_f" stands both for fread, which this function uses to split the concatenated cells, and for "fixed", which indicates that this function works only if the number of resulting columns is the same for each row in the input.

  • expandRows

    "Expand" the rows of a data.frame or a data.table either by values specified in a column of the input dataset or by a vector specifying the number of times to repeat each row.

  • Reshape, Stacked, and merged.stack now try to guess the "id.vars" values based on the values in "var.stubs". The values can still be specified manually.
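
The two ways of driving expandRows described above can be sketched as follows. The data.frame freq and its columns are invented for the example.

```r
library(splitstackshape)

# Hypothetical frequency table: "n" says how often each row should appear.
freq <- data.frame(item = c("x", "y", "z"), n = c(2, 1, 3))

# Repeat each row according to the values in the "n" column.
by_col <- expandRows(freq, "n")

# Repeat each row according to an external vector instead of a column.
by_vec <- expandRows(freq, count = c(1, 2, 1), count.is.col = FALSE)

print(by_col)
print(by_vec)
```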

08/10 October 2014

Incremental cleanups and additions to get ready for V1.4.0

  • concat.split.compact and concat.split.multiple are now simply wrappers for cSplit and no longer use :::read.concat to split up the values.
  • concat.split.expanded and concat.split.list are now data.table compatible.
  • concat.split.list and concat.split.expanded given short name forms (cSplit_l and cSplit_e).

Added functions:

  • cSplit

    Before the release of 1.4.0, the basic concat.split* functions will become simple wrappers for cSplit, which is much more efficient than the previous implementations. The earlier functions will remain for compatibility purposes. Since cSplit is already in use, it will be an exported function.

  • stratified

    A function to take fixed or proportional samples by group from a data.frame or data.table.
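
A short sketch of fixed versus proportional sampling with stratified, using made-up data. Here a size of 1 or more is treated as a fixed number of rows per group and a size below 1 as a proportion of each group.

```r
library(splitstackshape)

set.seed(42)  # sampling is random; fix the seed for repeatability

# Hypothetical data: group "a" has 10 rows, group "b" has 20.
dat <- data.frame(grp = rep(c("a", "b"), times = c(10, 20)), val = 1:30)

# Fixed: 3 rows from each group.
fixed <- stratified(dat, "grp", size = 3)

# Proportional: half of each group.
prop <- stratified(dat, "grp", size = 0.5)

print(table(fixed$grp))
print(table(prop$grp))
```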

Non-exported additions:

  • :::.collapseMe
  • :::.stripWhite
  • :::Names
  • :::trim
  • :::vGrep

27 October 2013

  • Due to changes resulting from the introduction of numMat and charMat, concat.split.expanded and concat.split now have an additional argument, type, which takes a value of either "numeric" or "character". It is set to a default of type = "numeric" in the case of concat.split.expanded and type = NULL in the case of concat.split.
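
To make the type argument concrete, here is a small sketch that calls cSplit_e (the short name for concat.split.expanded) once on numeric codes and once on string labels. The two input data.frames are invented for the example.

```r
library(splitstackshape)

# Hypothetical inputs: numeric codes in one, string labels in the other.
num <- data.frame(id = 1:3, picks = c("1,3", "2", "1,2,3"),
                  stringsAsFactors = FALSE)
chr <- data.frame(id = 1:3, picks = c("a,c", "b", "a,b,c"),
                  stringsAsFactors = FALSE)

# type = "numeric": the split values are treated as column positions.
num_wide <- cSplit_e(num, "picks", sep = ",", type = "numeric")

# type = "character": one indicator column per distinct string.
chr_wide <- cSplit_e(chr, "picks", sep = ",", type = "character")

print(num_wide)
print(chr_wide)
```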

Added functions:

  • :::numMat

    numMat replaces binaryMat and valueMat for numeric data.

  • :::charMat

    charMat replaces charBinaryMat for string data.

Dropped functions:

Following changes recommended by @flodel, the following functions have been rewritten as numMat and charMat:

  • :::binaryMat
  • :::valueMat
  • :::charBinaryMat

20 October 2013

New function added:

  • :::charBinaryMat

    concat.split.expanded did not previously support expanding "character" data. Due to prompting by @juba, charBinaryMat has been included to handle such cases.

27 August 2013

  • Further refinement of Stacked and merge.stack. merge.stack is now faster than Reshape, at least for large datasets.

18 August 2013

  • Stacked and merge.stack are now much faster, using an almost pure data.table solution.

17 August 2013

  • When Stacked results in a list of length 1, it is "unlisted" before being returned.
  • Reshape (and as a result, concat.split.multiple(..., direction = "long")) has been enhanced by the addition of a feature to automatically add an ID variable if the present "IDs" are not unique.

New functions added:

  • getanID
  • :::Names
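
A minimal sketch of getanID on made-up data whose id column contains duplicates. The helper adds a secondary counter column named .id within each duplicated id.

```r
library(splitstackshape)

# Hypothetical data with repeated ids.
dat <- data.frame(id = c("a", "a", "b", "c", "c", "c"), val = 1:6)

# Add a within-id sequence so that id plus .id identifies each row uniquely.
out <- getanID(dat, id.vars = "id")

print(out)
```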

16 August 2013

  • read.concat updated to use count.fields to determine the correct number of columns that the resulting data.frame should have.
  • Reshape now has an option to remove the rownames from the output, set to TRUE by default.

12 August 2013

Initial commit of splitstackshape with the following main functions:

  • concat.split (plus: concat.split.compact, concat.split.expanded, concat.split.list, and concat.split.multiple) -- To split concatenated data into more manageable data formats.
  • Reshape -- To help base R's reshape function handle unbalanced data and simplify the reshape syntax (wide to long only).
  • Stacked -- To selectively stack columns of a data.frame.
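
As a sketch of what "unbalanced" means here, the made-up data.frame below has three score columns but only two rt columns. Stacked returns one stacked table per stub, and merged.stack combines them into a single long table. All column names are assumptions for the example.

```r
library(splitstackshape)

# Hypothetical wide data: three "score" measurements but only two "rt".
wide <- data.frame(
  id = 1:2,
  score_1 = c(10, 20), score_2 = c(11, 21), score_3 = c(12, 22),
  rt_1 = c(0.5, 0.7), rt_2 = c(0.6, 0.8)
)

# A list with one long table per stub; the tables may differ in length.
stacks <- Stacked(wide, id.vars = "id",
                  var.stubs = c("score", "rt"), sep = "_")

# A single long table; the missing third "rt" measurement shows up as NA.
merged <- merged.stack(wide, id.vars = "id",
                       var.stubs = c("score", "rt"), sep = "_")

print(stacks)
print(merged)
```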

Non-exported functions are indicated with ::: before their names.

  • concat.split.compact
  • concat.split.expanded
  • concat.split.list
  • concat.split.multiple
  • concat.split
  • merged.stack
  • Reshape
  • Stacked
  • :::binaryMat
  • :::FacsToChars
  • :::NoSep
  • :::othernames
  • :::read.concat
  • :::valueMat


1.4.2 by Ananda Mahto, 3 years ago


http://github.com/mrdwab/splitstackshape


Report a bug at http://github.com/mrdwab/splitstackshape/issues


Browse source code at https://github.com/cran/splitstackshape


Authors: Ananda Mahto



GPL-3 license


Depends on data.table


Imported by rodham.

