Time Aware Tibbles

Built on top of the 'tibble' package, 'tibbletime' is an extension that allows for the creation of time aware tibbles. Some immediate advantages of this include: the ability to perform time-based subsetting on tibbles, quickly summarising and aggregating results by time periods, and creating columns that can be used as 'dplyr' time-based groups.



A time aware tibble

Built on top of the tidyverse, tibbletime is an extension that allows for the creation of time aware tibbles through the setting of a time index.

Some immediate advantages of this include:

  1. Performing compact time-based subsetting on tibbles.

  2. Partitioning an index column by time (like yearly, monthly, every 2 weeks, etc.) so that you can use dplyr's grouped functionality to summarise and aggregate by time period.

  3. Changing the periodicity of a time-based tibble. This allows easily changing from a daily dataset to a monthly or yearly dataset.

  4. Easily working with the pipe and packages like dplyr and tidyr to make for a seamless experience with time series and the tidyverse. Each function has also been designed to work with dplyr::group_by() allowing for powerful data manipulation.

  5. Modifying functions for rolling analysis.

  6. Creating tbl_time time series objects quickly.

  7. Using fully supported Date and POSIXct index columns, along with experimental support for yearmon, yearqtr and hms which should become more stable as some issues in dplyr are worked out.

Installation

Development Version:

devtools::install_github("business-science/tibbletime")

CRAN Version:

install.packages("tibbletime")

Major update warning

If you have been using 0.0.2, note that the update to 0.1.0 introduced major breaking changes. This was necessary for the long-term stability of the package, and no attempt to support backward compatibility was made at this early stage in development. We apologize for any issues this causes. See NEWS for complete details.

Getting started

The first thing to do is to turn your tibble into a tbl_time object. Notice the specification of the index as the date column of FB.

library(tibbletime)
library(dplyr)
 
# Facebook stock prices. Comes with the package
data(FB)
 
# Convert FB to tbl_time
FB <- as_tbl_time(FB, index = date)
 
FB
#> # A time tibble: 1,008 x 8
#> # Index: date
#>    symbol date        open  high   low close    volume adjusted
#>    <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
#>  1 FB     2013-01-02  27.4  28.2  27.4  28.0  69846400     28.0
#>  2 FB     2013-01-03  27.9  28.5  27.6  27.8  63140600     27.8
#>  3 FB     2013-01-04  28.0  28.9  27.8  28.8  72715400     28.8
#>  4 FB     2013-01-07  28.7  29.8  28.6  29.4  83781800     29.4
#>  5 FB     2013-01-08  29.5  29.6  28.9  29.1  45871300     29.1
#>  6 FB     2013-01-09  29.7  30.6  29.5  30.6 104787700     30.6
#>  7 FB     2013-01-10  30.6  31.5  30.3  31.3  95316400     31.3
#>  8 FB     2013-01-11  31.3  32.0  31.1  31.7  89598000     31.7
#>  9 FB     2013-01-14  32.1  32.2  30.6  31.0  98892800     31.0
#> 10 FB     2013-01-15  30.6  31.7  29.9  30.1 173242600     30.1
#> # ... with 998 more rows
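
The same conversion applies to a tibble you build yourself. The following is a minimal sketch, assuming a hypothetical data set called sales with a Date column named day; the names and values are made up for illustration and are not part of the package.

```r
library(tibbletime)

# Hypothetical data: ten consecutive days of made-up values
sales <- tibble::tibble(
  day   = as.Date("2017-01-01") + 0:9,
  units = c(3, 5, 2, 8, 6, 4, 7, 1, 9, 5)
)

# Declare `day` as the time index
sales_time <- as_tbl_time(sales, index = day)

# The result is still a tibble, with the tbl_time class layered on top
class(sales_time)
```

Any Date or POSIXct column can serve as the index in the same way.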

There are a number of functions that were designed specifically for tbl_time objects. Some of them are:

  1. filter_time() - Succinctly filter a tbl_time object by date.

  2. as_period() - Convert a tbl_time object from daily to monthly, from minute data to hourly, and more. This allows the user to easily aggregate data to a less granular level.

  3. collapse_by() - Take a tbl_time object, and collapse the index so that all observations in an interval share the same date. The most common use of this is to then group on this column with dplyr::group_by() and perform time-based calculations with summarise(), mutate() or any other dplyr function.

  4. collapse_index() - A lower level version of collapse_by() that directly modifies the index column and not the entire tbl_time object. It allows the user more flexibility when collapsing, like the ability to assign the resulting collapsed index to a new column.

  5. rollify() - Modify a function so that it calculates a value (or a set of values) at specific time intervals. This can be used for rolling averages and other rolling calculations inside the tidyverse framework.

  6. create_series() - Use shorthand notation to quickly initialize a tbl_time object containing a regularly spaced index column of class Date, POSIXct, yearmon, yearqtr or hms.

To look at just a few:

# Filter for dates from March 2013 to December 2015
FB %>% 
  filter_time('2013-03' ~ '2015')
#> # A time tibble: 716 x 8
#> # Index: date
#>    symbol date        open  high   low close   volume adjusted
#>    <chr>  <date>     <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
#>  1 FB     2013-03-01  27.0  28.1  26.8  27.8 54064800     27.8
#>  2 FB     2013-03-04  27.8  28.1  27.4  27.7 32400700     27.7
#>  3 FB     2013-03-05  27.9  28.2  27.2  27.5 40622200     27.5
#>  4 FB     2013-03-06  28.1  28.1  27.4  27.5 33532600     27.5
#>  5 FB     2013-03-07  27.6  28.7  27.5  28.6 74540200     28.6
#>  6 FB     2013-03-08  28.4  28.5  27.7  28.0 44198900     28.0
#>  7 FB     2013-03-11  28.0  28.6  27.8  28.1 35642100     28.1
#>  8 FB     2013-03-12  28.1  28.3  27.6  27.8 27569600     27.8
#>  9 FB     2013-03-13  27.6  27.6  26.9  27.1 39619500     27.1
#> 10 FB     2013-03-14  27.1  27.4  26.8  27.0 27646400     27.0
#> # ... with 706 more rows
 
# Change from daily to monthly periodicity
# This just reduces the tibble to the last row in each month
FB %>% 
  as_period("monthly", side = "end")
#> # A time tibble: 48 x 8
#> # Index: date
#>    symbol date        open  high   low close    volume adjusted
#>    <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
#>  1 FB     2013-01-31  29.2  31.5  28.7  31.0 190744900     31.0
#>  2 FB     2013-02-28  26.8  27.3  26.3  27.2  83027800     27.2
#>  3 FB     2013-03-28  26.1  26.2  25.5  25.6  28585700     25.6
#>  4 FB     2013-04-30  27.1  27.8  27.0  27.8  36245700     27.8
#>  5 FB     2013-05-31  24.6  25.0  24.3  24.4  35925000     24.4
#>  6 FB     2013-06-28  24.7  25.0  24.4  24.9  96778900     24.9
#>  7 FB     2013-07-31  38.0  38.3  36.3  36.8 154828700     36.8
#>  8 FB     2013-08-30  42.0  42.3  41.1  41.3  67735100     41.3
#>  9 FB     2013-09-30  50.1  51.6  49.8  50.2 100095000     50.2
#> 10 FB     2013-10-31  47.2  52.0  46.5  50.2 248809000     50.2
#> # ... with 38 more rows
 
# Maybe you don't want to lose the rest of the month's information,
# and instead you'd like to take the average of every column for each month
FB %>%
  select(-symbol) %>%
  collapse_by("monthly") %>%
  group_by(date) %>%
  summarise_all(mean)
#> # A time tibble: 48 x 7
#> # Index: date
#>    date        open  high   low close   volume adjusted
#>    <date>     <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
#>  1 2013-01-31  30.2  30.8  29.8  30.3 79802462     30.3
#>  2 2013-02-28  28.3  28.6  27.7  28.1 50402095     28.1
#>  3 2013-03-28  26.9  27.2  26.5  26.8 36359025     26.8
#>  4 2013-04-30  26.6  27.0  26.2  26.6 33568600     26.6
#>  5 2013-05-31  26.4  26.6  25.9  26.1 44640673     26.1
#>  6 2013-06-28  24.0  24.3  23.7  23.9 39416575     23.9
#>  7 2013-07-31  27.7  28.2  27.4  27.9 65364414     27.9
#>  8 2013-08-30  38.7  39.3  38.2  38.7 61136095     38.7
#>  9 2013-09-30  45.5  46.3  44.9  45.8 79154190     45.8
#> 10 2013-10-31  50.7  51.5  49.7  50.5 88375435     50.5
#> # ... with 38 more rows
 
# Perform a 5 period rolling average
mean_5 <- rollify(mean, window = 5)
mutate(FB, roll_mean = mean_5(adjusted))
#> # A time tibble: 1,008 x 9
#> # Index: date
#>    symbol date        open  high   low close    volume adjusted roll_mean
#>    <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>     <dbl>
#>  1 FB     2013-01-02  27.4  28.2  27.4  28.0  69846400     28.0      NA  
#>  2 FB     2013-01-03  27.9  28.5  27.6  27.8  63140600     27.8      NA  
#>  3 FB     2013-01-04  28.0  28.9  27.8  28.8  72715400     28.8      NA  
#>  4 FB     2013-01-07  28.7  29.8  28.6  29.4  83781800     29.4      NA  
#>  5 FB     2013-01-08  29.5  29.6  28.9  29.1  45871300     29.1      28.6
#>  6 FB     2013-01-09  29.7  30.6  29.5  30.6 104787700     30.6      29.1
#>  7 FB     2013-01-10  30.6  31.5  30.3  31.3  95316400     31.3      29.8
#>  8 FB     2013-01-11  31.3  32.0  31.1  31.7  89598000     31.7      30.4
#>  9 FB     2013-01-14  32.1  32.2  30.6  31.0  98892800     31.0      30.7
#> 10 FB     2013-01-15  30.6  31.7  29.9  30.1 173242600     30.1      30.9
#> # ... with 998 more rows
 
# Create a time series
# Every other day in 2013
create_series(~'2013', '2 day')
#> # A time tibble: 183 x 1
#> # Index: date
#>    date               
#>    <dttm>             
#>  1 2013-01-01 00:00:00
#>  2 2013-01-03 00:00:00
#>  3 2013-01-05 00:00:00
#>  4 2013-01-07 00:00:00
#>  5 2013-01-09 00:00:00
#>  6 2013-01-11 00:00:00
#>  7 2013-01-13 00:00:00
#>  8 2013-01-15 00:00:00
#>  9 2013-01-17 00:00:00
#> 10 2013-01-19 00:00:00
#> # ... with 173 more rows
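
collapse_index() is not shown above, so here is a small sketch of the lower-level route: it works on the index vector itself, which lets you keep the daily dates and store the collapsed dates in a separate column. The column names month_end and mean_adjusted are hypothetical names chosen for this example.

```r
library(tibbletime)
library(dplyr)

# Facebook stock prices as a plain tibble; no tbl_time conversion is needed
# because collapse_index() operates on the date vector directly
data(FB)

# Keep the daily `date` column and add the collapsed dates alongside it
FB_collapsed <- FB %>%
  mutate(month_end = collapse_index(date, "monthly"))

# Group on the new column to get one summary row per month
monthly_means <- FB_collapsed %>%
  group_by(month_end) %>%
  summarise(mean_adjusted = mean(adjusted))

monthly_means
```

Compared with collapse_by(), this leaves the original index untouched, at the cost of one extra column.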

Grouping

Groups created through dplyr::group_by() are supported throughout the package. Because collapse_index() simply produces a column that you can group on, all dplyr functions are supported.

# Facebook, Amazon, Netflix and Google stocks
data(FANG)
 
# Summarise by period and by group
FANG %>% 
  as_tbl_time(date) %>%
  group_by(symbol) %>%
  
  # Collapse to yearly
  collapse_by("year") %>%
  
  # Additionally group by date (yearly)
  group_by(date, add = TRUE) %>%
  
  # Perform a yearly summary for each symbol
  summarise(
    adj_min   = min(adjusted),
    adj_max   = max(adjusted),
    adj_range = adj_max - adj_min
  )
#> # A time tibble: 16 x 5
#> # Index:  date
#> # Groups: symbol [?]
#>    symbol date       adj_min adj_max adj_range
#>    <chr>  <date>       <dbl>   <dbl>     <dbl>
#>  1 AMZN   2013-12-31   248     404       156  
#>  2 AMZN   2014-12-31   287     407       120  
#>  3 AMZN   2015-12-31   287     694       407  
#>  4 AMZN   2016-12-30   482     844       362  
#>  5 FB     2013-12-31    22.9    58.0      35.1
#>  6 FB     2014-12-31    53.5    81.4      27.9
#>  7 FB     2015-12-31    74.1   109        35.0
#>  8 FB     2016-12-30    94.2   133        39.1
#>  9 GOOG   2013-12-31   351     560       209  
#> 10 GOOG   2014-12-31   495     609       114  
#> 11 GOOG   2015-12-31   493     777       284  
#> 12 GOOG   2016-12-30   668     813       145  
#> 13 NFLX   2013-12-31    13.1    54.4      41.2
#> 14 NFLX   2014-12-31    44.9    69.2      24.3
#> 15 NFLX   2015-12-31    45.5   131        85.4
#> 16 NFLX   2016-12-30    82.8   128        45.6
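
Grouping also carries over to the other tbl_time functions. As a sketch, the following applies as_period() to the grouped FANG data, so that the periodicity change happens separately within each symbol. The object name FANG_yearly is hypothetical.

```r
library(tibbletime)
library(dplyr)

# Facebook, Amazon, Netflix and Google stocks
data(FANG)

# Keep the first available row of each year, separately for each symbol
FANG_yearly <- FANG %>%
  as_tbl_time(index = date) %>%
  group_by(symbol) %>%
  as_period("yearly", side = "start")

FANG_yearly
```

The result should contain one row per symbol per year rather than one row per year overall.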

Index order

tibbletime assumes that your dates are in ascending order. A warning will be generated if they are not when you use a function where order is relevant. We do this for speed purposes and to not force a change on the user's dataset by sorting for them.
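
If your data may not be sorted, one option is to sort it yourself with dplyr::arrange() before setting the index. The sketch below reverses FB to simulate an unsorted data set; FB_reversed and FB_sorted are hypothetical names.

```r
library(tibbletime)
library(dplyr)

data(FB)

# Simulate an out-of-order data set by reversing the rows
FB_reversed <- FB %>% arrange(desc(date))

# Sort explicitly, then declare the index
FB_sorted <- FB_reversed %>%
  arrange(date) %>%
  as_tbl_time(index = date)

# Order-sensitive functions such as as_period() can now be used as usual
as_period(FB_sorted, "monthly")
```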

Vignettes

  1. Time-based filtering

  2. Changing periodicity

  3. Rolling calculations

  4. Use with dplyr

News

tibbletime 0.1.1

  • General

    • collapse_index() and collapse_by() support a clean argument. This will round your index up/down to the next period boundary, allowing for prettier dates that can be used in summaries.

    • A new helper function, collapse_by(), wraps the common idiom of .tbl_time %>% mutate(date = collapse_index(date, "yearly")) and is the easiest way to use tibbletime with the rest of the tidyverse.

    • You can now pass an index vector (easily created with create_series()) to the period argument of functions like as_period() or collapse_by() to specify custom periods to collapse at.

    • Added support for millisecond and microsecond grouping. See ?create_series for examples and ?filter_time Details for more information.

    • partition_index() (and therefore higher level functions like collapse_by()) now round using the entire period argument to figure out the default start date. Meaning if 2 years is passed, it will round down the start of the series to the lower 2 year boundary, rather than just year. This is a small change, but is technically breaking.

    • More efficient parsing of periods. Only noticeably faster with a large number of groups.

  • Bug Fixes

    • tidyr::gather() and tidyr::spread() now work appropriately.
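
To illustrate two of the 0.1.1 items above, here is a hedged sketch of the clean argument and of a custom period vector. The object names monthly_clean, custom_breaks and FB_custom are hypothetical, and the 6 month spacing is an arbitrary choice for the example; see the function documentation for the exact rounding rules.

```r
library(tibbletime)
library(dplyr)

data(FB)
FB <- as_tbl_time(FB, index = date)

# clean = TRUE rounds each collapsed date to a period boundary
# instead of keeping an observed date from the data
monthly_clean <- FB %>%
  select(-symbol) %>%
  collapse_by("monthly", clean = TRUE) %>%
  group_by(date) %>%
  summarise_all(mean)

monthly_clean

# Custom periods: build an index with create_series() and pass it as `period`.
# Extracting the `date` column gives a plain Date vector.
custom_breaks <- create_series('2013' ~ '2017', "6 month", class = "Date")$date

FB_custom <- FB %>%
  as_period(period = custom_breaks)

FB_custom
```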

tibbletime 0.1.0

This is a major update. It introduces a huge number of breaking changes as we heavily reworked the internals of the package. This should ensure the longevity of the package and provide maximum flexibility for its use with dplyr. As this was still early in package development with minimal usage, and because we had issued a Warning in the README of the last update that we may change things, we have not made any attempt to support backward compatibility. From this point forward, however, we will support backward compatibility as we feel that we have reached a more stable implementation.

With that out of the way, here is a complete list of changes.

  • General

    • The period argument no longer supports the 'period formula' (e.g. 1~year). It added unnecessary complication with little benefit. Rather, a character should be used like '1 year'. See the documentation of partition_index() for full details.

    • time_formula arguments still support the from ~ to style syntax, but the left and right hand sides must now be characters, rather than bare date specifications. In English, rather than 2013 ~ 2014, you must use '2013' ~ '2014'. This is easier to program with and also allows you to pass in variables to the time formula, which previously did not work well.

    • time_filter() has become filter_time(). This naming is easier to remember now that a suite of time_*() functions is not being developed and is easier to find with autocompletion.

    • time_group() and time_collapse() have become partition_index() and collapse_index(). Both functions accept index vectors and are commonly used inside dplyr::mutate().

    • partition_index() splits an index by period and returns an integer vector corresponding to the groups.

    • collapse_index() collapses an index by period so that all observations falling in that interval share the same date. This is most useful when used to then group on the index column.

    • There is full support for Date and POSIXct classes as the index, and there is experimental support for yearmon, yearqtr, and hms classes.

    • ceiling_index() and floor_index() are thin wrappers around lubridate functions of similar names, but they also work for yearmon, yearqtr and hms.

    • create_series() now has an explicit class argument.

    • as_period() gains an include_endpoints argument for including the last data point if side = "start" is specified or the first data point if side = "end" is used.

    • There are a number of new "getter" functions for accessing the index and time zone of tbl_time objects. These are useful for package development.

    • filter_time(), as_period() and other "getter" functions now use .tbl_time as a consistent first argument rather than x. collapse_index() and partition_index() use index as their first arguments.

    • Exported parse_period() for general use in other related packages.

    • Warnings are now generated if the user is not using a sorted index.

  • Bug Fixes

    • All dplyr functions should now retain the tbl_time class and relevant attributes.

    • Ensure that tidyr::spread() passes the fill argument through.

    • Default time zone is now UTC rather than Sys.timezone() to handle a daylight savings issue.
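
Several of the 0.1.0 changes above can be seen together in a short sketch. The variable names from and to are hypothetical, and the getter names used here (get_index_char() and get_index_time_zone()) are assumed to be among the "getter" functions mentioned in the list; check the package reference if they differ in your version.

```r
library(tibbletime)
library(dplyr)

data(FB)
FB <- as_tbl_time(FB, index = date)

# Both sides of the time formula are character strings,
# so they can be stored in variables and reused
from <- '2013-03'
to   <- '2015'
filtered <- filter_time(FB, from ~ to)
filtered

# Periods are plain character strings such as "2 year"
parse_period("2 year")

# Round an index down or up to a period boundary
floor_index(FB$date[1:3], "month")
ceiling_index(FB$date[1:3], "month")

# Getter helpers report the index column name and its time zone
index_name <- get_index_char(FB)
index_tz   <- get_index_time_zone(FB)
```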

tibbletime 0.0.2

  • New functionality

    • time_floor() and time_ceiling() are convenient wrappers to lubridate functions for altering dates to period boundaries.

    • time_unnest() is used to specifically unnest a tibble object with a list-column of tbl_time objects.

    • create_series() allows the user to create a tbl_time object with a regularly spaced sequence of dates.

    • time_group() has become the workhorse function for creating time based groups used in changing periodicity and other grouped time based calculations.

    • time_summarise() and tmap() now also accept a formula-based period.

    • as_period() now accepts a formula-based period that provides an incredible amount of flexibility in creating groups. (#9, #14, #15)

    • rollify() creates a rolling version of any function for use in dplyr::mutate(). (#7)

  • General

    • You now have to explicitly load dplyr or tidyr to use any functions from those packages. Previously they were reexported, but this seems unnecessary.

    • Added vignettes on intro, filtering, and as_period().

    • Added more extensive dplyr support.

    • Speed increases for as_period() and create_series().

    • Internal global utilities moved to utils.R.

    • Added test coverage. (#2)

    • Added package documentation page. (#3)

    • Added versions to all imported packages.

  • Bug Fixes

    • Fixed an issue with [ in combination with tibble::add_column(). Use tibble (>= 1.3.4.9001) for correct behavior.

    • Fixed a bug where using tidyr::nest() would cause the nested tibbles to lose their time attributes.

    • Fixed a bug where filter_time(data, ~yyyy-mm-dd) would be parsed as yyyy-mm-dd 00:00:00 ~ yyyy-mm-dd 00:00:00 instead of yyyy-mm-dd 00:00:00 ~ yyyy-mm-dd 23:59:59.

    • Fixed a bug with as.Date / as.POSIXct operator collision in filter_time().

tibbletime 0.0.1

  • Initial release of tibbletime, a package for time aware tibbles.


0.1.1 by Davis Vaughan, 5 months ago


https://github.com/business-science/tibbletime


Report a bug at https://github.com/business-science/tibbletime/issues


Browse source code at https://github.com/cran/tibbletime


Authors: Davis Vaughan [aut, cre], Matt Dancho [aut]




Task views: Time Series Analysis


MIT + file LICENSE license


Imports assertthat, dplyr, glue, hms, lubridate, purrr, rlang, tibble, Rcpp, zoo

Suggests broom, covr, gapminder, knitr, testthat, tidyr

Linking to Rcpp


Imported by anomalize.

Suggested by tidyquant, tsbox.

