Found 8 packages in 0.01 seconds
Extension of `data.frame`
Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax, for faster development.
Bay Area Bike Share Trips in 2014
Anonymised Bay Area bike share trip data for the year 2014. Also contains additional metadata on stations and weather.
PHATE - Potential of Heat-Diffusion for Affinity-Based Transition Embedding
PHATE is a tool for visualizing high-dimensional single-cell data with natural progressions or trajectories. PHATE uses a novel conceptual framework for learning and visualizing the manifold inherent to biological systems in which smooth transitions mark the progressions of cells from one state to another. To see how PHATE can be applied to single-cell RNA-seq datasets from hematopoietic stem cells, human embryonic stem cells, and bone marrow samples, check out our preprint on bioRxiv at <http://biorxiv.org/content/early/2017/03/24/120378>.
nose Package for R
The nose package consists of a collection of three functions for classifying sparseness in typical 2 x 2 data sets in which at least one cell has a zero count. These functions are based on the three widely applied summary measures for 2 x 2 categorical data, viz. Risk Difference (RD), Relative Risk (RR) and Odds Ratio (OR). This package helps to identify a suitable continuity correction for zero cells when a multi-centre analysis or a meta-analysis is carried out. Further, it can be used as a tool for sensitivity analysis when adding a continuity correction, and to identify the presence of Simpson's paradox.
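As a rough illustration of the three summary measures named above, the following Python sketch computes RD, RR and OR for a single 2 x 2 table and adds a constant to every cell when any cell is zero. The function name `summary_measures`, the cell labels `a`, `b`, `c`, `d` and the default correction of 0.5 are assumptions made for this example; it is not the nose package's interface, and the package may choose corrections differently.

```python
def summary_measures(a, b, c, d, correction=0.5):
    """Compute RD, RR and OR for one 2 x 2 table.

    a, b: events and non-events in group 1 (hypothetical labels)
    c, d: events and non-events in group 2 (hypothetical labels)
    correction: constant added to every cell if any cell is zero
                (0.5 is a common choice, assumed here)
    """
    corrected = 0 in (a, b, c, d)
    if corrected:
        # Without a correction, RR or OR would divide by zero.
        a, b, c, d = (x + correction for x in (a, b, c, d))

    risk1 = a / (a + b)  # event risk in group 1
    risk2 = c / (c + d)  # event risk in group 2

    return {
        "RD": risk1 - risk2,      # Risk Difference
        "RR": risk1 / risk2,      # Relative Risk
        "OR": (a * d) / (b * c),  # Odds Ratio
        "corrected": corrected,
    }


if __name__ == "__main__":
    print(summary_measures(10, 90, 5, 95))  # no zero cells
    print(summary_measures(0, 10, 5, 5))    # one zero cell, correction applied
```

Re-running the second call with different `correction` values gives a simple sensitivity check on how strongly the chosen constant drives the estimates.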
Compose Interoperable Analysis Pipelines & Put Them in Production
Enables data scientists to compose pipelines of analysis which consist of data manipulation, exploratory analysis & reporting, as well as modeling steps. Data scientists can use tools of their choice through an R interface, and compose interoperable pipelines between R, Spark, and Python. Credits to Mu Sigma for supporting the development of the package. Note - To enable pipelines involving Spark tasks, the package uses the 'SparkR' package. The SparkR package needs to be installed to use Spark as an engine within a pipeline. SparkR is distributed natively with Apache Spark and is not distributed on CRAN. The SparkR version needs to directly map to the Spark version (hence the native distribution), and care needs to be taken to ensure that this is configured properly. To install SparkR from Github, run the following command if you know the Spark version: 'devtools::install_github('apache/spark@v2.x.x', subdir='R/pkg')'. The other option is to install SparkR by running the following terminal commands if Spark has already been installed: '$ export SPARK_HOME=/path/to/spark/directory && cd $SPARK_HOME/R/lib/SparkR/ && R -e "devtools::install('.')"'.
A Rules Engine Based on 'Drools'
An interface for using the popular Java-based Drools, which is a Business Rule Management System (see <https://www.drools.org> for more information). This package provides data scientists with an intuitive interface to execute business rules on datasets for the purpose of analysis or designing intelligent systems, while leveraging the Drools rule engine. Rules written in the DRL format accepted natively by Drools can also be executed through an R interface. Credits to Mu Sigma for their continued support throughout the development of the package.
Seed Germination Indices and Curve Fitting
Provides functions to compute various germination indices such as germinability, median germination time, mean germination time, mean germination rate, speed of germination, Timson's index, germination value, coefficient of uniformity of germination, uncertainty of germination process, synchrony of germination, etc. from germination count data. Includes functions for fitting cumulative seed germination curves using a four-parameter Hill function and for computing the associated parameters. See the vignette for more, including a full list of citations for the methods implemented.
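To make a few of the quantities listed above concrete, the following Python sketch computes germinability, mean germination time and mean germination rate from per-interval counts, and evaluates one common parameterisation of a four-parameter Hill function. The function names, argument names and sample data are hypothetical, and the formulas are the textbook forms assumed for this example; the package's own functions and exact definitions may differ, so the vignette remains the reference.

```python
def germinability(counts, total_seeds):
    """Percentage of seeds that germinated.

    counts: seeds newly germinated in each interval (not cumulative)
    """
    return 100.0 * sum(counts) / total_seeds


def mean_germination_time(counts, times):
    """Count-weighted mean of the observation times."""
    germinated = sum(counts)
    if germinated == 0:
        raise ValueError("no seeds germinated")
    return sum(n * t for n, t in zip(counts, times)) / germinated


def mean_germination_rate(counts, times):
    """Reciprocal of the mean germination time."""
    return 1.0 / mean_germination_time(counts, times)


def hill4(x, y0, a, b, c):
    """Four-parameter Hill function (assumed form).

    y0: intercept, a: asymptote above y0,
    b: shape (steepness), c: time at which half of a is reached
    """
    return y0 + a * x**b / (c**b + x**b)


if __name__ == "__main__":
    times = [1, 2, 3, 4]     # observation days (made-up data)
    counts = [0, 5, 10, 5]   # newly germinated seeds per day (made-up data)
    print(germinability(counts, total_seeds=25))
    print(mean_germination_time(counts, times))
    print(hill4(3.0, y0=0.0, a=100.0, b=5.0, c=3.0))
```

Fitting the curve to observed cumulative germination would additionally need a non-linear least-squares routine, which this sketch leaves out.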
Discover Probable Duplicates in Plant Genetic Resources Collections
Provides functions to aid the identification of probable/possible duplicates in Plant Genetic Resources (PGR) collections using 'passport databases' comprising information records of each constituent sample. These include methods for cleaning the data, creating a searchable Key Word in Context (KWIC) index of keywords associated with sample records, and identifying nearly identical records with similar information by fuzzy, phonetic and semantic matching of keywords.