Found 108 packages in 0.01 seconds
Fast 1d Bin Packing
Implements the First Fit Decreasing algorithm to achieve one dimensional heuristic bin packing. Runtime is of order O(n log(n)) where n is the number of items to pack. See "The Art of Computer Programming Vol. 1" by Donald E. Knuth (1997, ISBN: 0201896834) for more details.
Measuring Association with Recursive Binning
An iterative implementation of a recursive binary partitioning algorithm to measure pairwise dependence with a modular design that allows user specification of the splitting logic and stop criteria. Helper functions provide suggested versions of both and support visualization and the computation of summary statistics on final binnings. For a thorough discussion and demonstration of the algorithm, see Salahub and Oldford (2025)
Construction of Regular and Irregular Histograms with Different Options for Automatic Choice of Bins
Automatic construction of regular and irregular histograms as described in Rozenholc/Mildenberger/Gather (2010).
Binning Variables to Use in Logistic Regression
Fast binning of multiple variables using parallel processing. A summary of all the variables binned is generated which provides the information value, entropy, an indicator of whether the variable follows a monotonic trend or not, etc. It supports rebinning of variables to force a monotonic trend as well as manual binning based on pre specified cuts. The cut points of the bins are based on conditional inference trees as implemented in the partykit package. The conditional inference framework is described by Hothorn T, Hornik K, Zeileis A (2006)
Binning and Plotting of Linearly Referenced Data
Short for 'linear binning', the linbin package provides functions for manipulating, binning, and plotting linearly referenced data. Although developed for data collected on river networks, it can be used with any interval or point data referenced to a 1-dimensional coordinate system. Flexible bin generation and batch processing makes it easy to compute and visualize variables at multiple scales, useful for identifying patterns within and between variables and investigating the influence of scale of observation on data interpretation.
Methods for Analyzing Binned Income Data
Methods for model selection, model averaging, and calculating metrics, such as the Gini, Theil, Mean Log Deviation, etc, on binned income data where the topmost bin is right-censored. We provide both a non-parametric method, termed the bounded midpoint estimator (BME), which assigns cases to their bin midpoints; except for the censored bins, where cases are assigned to an income estimated by fitting a Pareto distribution. Because the usual Pareto estimate can be inaccurate or undefined, especially in small samples, we implement a bounded Pareto estimate that yields much better results. We also provide a parametric approach, which fits distributions from the generalized beta (GB) family. Because some GB distributions can have poor fit or undefined estimates, we fit 10 GB-family distributions and use multimodel inference to obtain definite estimates from the best-fitting distributions. We also provide binned income data from all United States of America school districts, counties, and states.
Data Preprocessing, Binning for Classification and Regression
Various supervised and unsupervised binning tools including using entropy, recursive partition methods and clustering.
Optimal Binning of Continuous and Categorical Variables
Tool for easy and efficient discretization of continuous and categorical data. The package calculates the most optimal binning of a given explanatory variable with respect to a user-specified target variable. The purpose is to assign a unique Weight-of-Evidence value to each of the calculated binpoints in order to recode the original variable. The package allows users to impose certain restrictions on the functional form on the resulting binning while maximizing the overall information value in the original data. The package is well suited for logistic scoring models where input variables may be subject to restrictions such as linearity by e.g. regulatory authorities. An excellent source describing in detail the development of scorecards, and the role of Weight-of-Evidence coding in credit scoring is (Siddiqi 2006, ISBN: 978–0-471–75451–0). The package utilizes the discrete nature of decision trees and Isotonic Regression to accommodate the trade-off between flexible functional forms and maximum information value.
Multiple Summary Statistics for Binned Stats/Geometries
Provides the ggplot binning layer stat_summaries_hex(), which functions similar to its singular form, but allows the use of multiple statistics per bin. Those statistics can be mapped to multiple bin aesthetics.
Generate PDFs and CDFs from Binned Data
Provides several methods for generating density functions based on binned data. Methods include step function, recursive subdivision, and optimized spline. Data are assumed to be nonnegative, the top bin is assumed to have no upper bound, but the bin widths need be equal. All PDF smoothing methods maintain the areas specified by the binned data. (Equivalently, all CDF smoothing methods interpolate the points specified by the binned data.) In practice, an estimate for the mean of the distribution should be supplied as an optional argument. Doing so greatly improves the reliability of statistics computed from the smoothed density functions. Includes methods for estimating the Gini coefficient, the Theil index, percentiles, and random deviates from a smoothed distribution. Among the three methods, the optimized spline (splinebins) is recommended for most purposes. The percentile and random-draw methods should be regarded as experimental, and these methods only support splinebins.