data.table — by Matt Dowle, 6 months ago

Extension of `data.frame`

Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, a fast friendly file reader and parallel file writer. Offers a natural and flexible syntax, for faster development.

bikeshare14 — by Arunkumar Srinivasan, a year ago

Bay Area Bike Share Trips in 2014

Anonymised Bay Area bike share trip data for the year 2014. Also contains additional metadata on stations and weather.

PGRdup — by J. Aravind, 4 months ago

Discover Probable Duplicates in Plant Genetic Resources Collections

Provides functions to aid the identification of probable/possible duplicates in Plant Genetic Resources (PGR) collections using 'passport databases' comprising the information records of each constituent sample. These include methods for cleaning the data, creating a searchable Key Word in Context (KWIC) index of keywords associated with sample records, and identifying nearly identical records with similar information by fuzzy, phonetic and semantic matching of keywords.

nose — by Sumathi R, 5 years ago

nose Package for R

The nose package provides three functions for classifying sparseness in typical 2 x 2 data sets in which at least one cell has a zero count. These functions are based on three widely used summary measures for 2 x 2 categorical data, viz. Risk Difference (RD), Relative Risk (RR) and Odds Ratio (OR). The package helps to identify a suitable continuity correction for zero cells when a multi-centre analysis or a meta-analysis is carried out. It can also serve as a tool for sensitivity analysis of the added continuity correction and for identifying the presence of Simpson's paradox.
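
To make the three summary measures concrete, here is a minimal sketch in Python of how RD, RR and OR can be computed for a single 2 x 2 table with a zero cell. It does not use or reproduce the nose package's own functions. The function name `two_by_two_measures`, the cell labels `a`, `b`, `c`, `d`, and the choice of adding 0.5 to every cell are assumptions made for illustration; the package may choose the correction differently.

```python
def two_by_two_measures(a, b, c, d, correction=0.5):
    """Summary measures for a 2 x 2 table (hypothetical helper).

    a, b: events and non-events in group 1
    c, d: events and non-events in group 2
    """
    # Assumption: when any cell is zero, add the same constant to all four
    # cells so that the ratios below stay defined.
    if min(a, b, c, d) == 0:
        a, b, c, d = (x + correction for x in (a, b, c, d))

    p1 = a / (a + b)  # risk in group 1
    p2 = c / (c + d)  # risk in group 2

    return {
        "RD": p1 - p2,             # Risk Difference
        "RR": p1 / p2,             # Relative Risk
        "OR": (a * d) / (b * c),   # Odds Ratio
    }


# Example table with a zero cell: 0 events out of 10 versus 5 out of 10.
result = two_by_two_measures(0, 10, 5, 5)
```

Rerunning the same table with a different `correction` value shows how sensitive each measure is to the chosen constant, which is the kind of sensitivity analysis the description refers to.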