Examples: visualization, C++, networks, data cleaning, html widgets, ropensci.

Found 101 packages in 0.01 seconds

RANN — by Gregory Jefferis, 3 years ago

Fast Nearest Neighbour Search (Wraps ANN Library) Using L2 Metric

Finds the k nearest neighbours for every point in a given dataset in O(N log N) time using Arya and Mount's ANN library (v1.1.3). There is support for approximate as well as exact searches, fixed radius searches and 'bd' as well as 'kd' trees. The distance is computed using the L2 (Euclidean) metric. Please see package 'RANN.L1' for the same functionality using the L1 (Manhattan, taxicab) metric.

causalDisco — by Anne Helby Petersen, 8 days ago

Tools for Causal Discovery on Observational Data

Various tools for inferring causal models from observational data. The package includes an implementation of the temporal Peter-Clark (TPC) algorithm. Petersen, Osler and Ekstrøm (2021) . It also includes general tools for evaluating differences in adjacency matrices, which can be used for evaluating performance of causal discovery procedures.

sampleSelection — by Arne Henningsen, a year ago

Sample Selection Models

Two-step and maximum likelihood estimation of Heckman-type sample selection models: standard sample selection models (Tobit-2), endogenous switching regression models (Tobit-5), sample selection models with binary dependent outcome variable, interval regression with sample selection (only ML estimation), and endogenous treatment effects models. These methods are described in the three vignettes that are included in this package and in econometric textbooks such as Greene (2011, Econometric Analysis, 7th edition, Pearson).

dataReporter — by Claus Thorn Ekstrøm, 6 months ago

Reproducible Data Screening Checks and Report of Possible Errors

Data screening is an important first step of any statistical analysis. 'dataReporter' auto generates a customizable data report with a thorough summary of the checks and the results that a human can use to identify possible errors. It provides an extendable suite of test for common potential errors in a dataset. See Petersen AH, Ekstrøm CT (2019). "dataMaid: Your Assistant for Documenting Supervised Data Quality Screening in R." _Journal of Statistical Software_, *90*(6), 1-38 for more information.

dataMaid — by Claus Thorn Ekstrøm, 7 months ago

A Suite of Checks for Identification of Potential Errors in a Data Frame as Part of the Data Screening Process

Data screening is an important first step of any statistical analysis. dataMaid auto generates a customizable data report with a thorough summary of the checks and the results that a human can use to identify possible errors. It provides an extendable suite of test for common potential errors in a dataset.

PCADSC — by Anne H. Petersen, 5 years ago

Tools for Principal Component Analysis-Based Data Structure Comparisons

A suite of non-parametric, visual tools for assessing differences in data structures for two datasets that contain different observations of the same variables. These tools are all based on Principal Component Analysis (PCA) and thus effectively address differences in the structures of the covariance matrices of the two datasets. The PCASDC tools consist of easy-to-use, intuitive plots that each focus on different aspects of the PCA decompositions. The cumulative eigenvalue (CE) plot describes differences in the variance components (eigenvalues) of the deconstructed covariance matrices. The angle plot presents the information loss when moving from the PCA decomposition of one dataset to the PCA decomposition of the other. The chroma plot describes the loading patterns of the two datasets, thereby presenting the relative weighting and importance of the variables from the original dataset.

move — by Bart Kranstauber, 5 months ago

Visualizing and Analyzing Animal Track Data

Contains functions to access movement data stored in 'movebank.org' as well as tools to visualize and statistically analyze animal movement data, among others functions to calculate dynamic Brownian Bridge Movement Models. Move helps addressing movement ecology questions.

ade4 — by Aurélie Siberchicot, a month ago

Analysis of Ecological Data: Exploratory and Euclidean Methods in Environmental Sciences

Tools for multivariate data analysis. Several methods are provided for the analysis (i.e., ordination) of one-table (e.g., principal component analysis, correspondence analysis), two-table (e.g., coinertia analysis, redundancy analysis), three-table (e.g., RLQ analysis) and K-table (e.g., STATIS, multiple coinertia analysis). The philosophy of the package is described in Dray and Dufour (2007) .

fdadensity — by Alexander Petersen, 2 years ago

Functional Data Analysis for Density Functions by Transformation to a Hilbert Space

An implementation of the methodology described in Petersen and Mueller (2016) for the functional data analysis of samples of density functions. Densities are first transformed to their corresponding log quantile densities, followed by ordinary Functional Principal Components Analysis (FPCA). Transformation modes of variation yield improved interpretation of the variability in the data as compared to FPCA on the densities themselves. The standard fraction of variance explained (FVE) criterion commonly used for functional data is adapted to the transformation setting, also allowing for an alternative quantification of variability for density data through the Wasserstein metric of optimal transport.

dbscan — by Michael Hahsler, 4 months ago

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Related Algorithms

A fast reimplementation of several density-based algorithms of the DBSCAN family. Includes the clustering algorithms DBSCAN (density-based spatial clustering of applications with noise) and HDBSCAN (hierarchical DBSCAN), the ordering algorithm OPTICS (ordering points to identify the clustering structure), shared nearest neighbor clustering, and the outlier detection algorithms LOF (local outlier factor) and GLOSH (global-local outlier score from hierarchies). The implementations use the kd-tree data structure (from library ANN) for faster k-nearest neighbor search. An R interface to fast kNN and fixed-radius NN search is also provided. Hahsler, Piekenbrock and Doran (2019) .