
ptools — by Andrew Wheeler, 3 years ago

Tools for Poisson Data

Functions for analyzing count data, mostly crime counts. Includes a test for the difference between two Poisson counts (e-test), a check of fit to a Poisson distribution, small-sample tests for counts in bins, and the Weighted Displacement Difference test (Wheeler and Ratcliffe, 2018) for evaluating crime changes over time in treated/control areas. Also includes functions for aggregating spatial data and for spatial feature engineering.

commecometrics — by Maria A. Hurtado-Materon, 4 months ago

Ecometric Models of Trait–Environment Relationships at the Community Level

Provides a framework for modeling relationships between functional traits and both quantitative and qualitative environmental variables at the community level. It includes tools for trait binning, likelihood-based environmental estimation, model evaluation, fossil projection into modern ecometric space, and result visualization. For more details see Vermillion et al. (2018), Polly et al. (2011) and Polly and Head (2015).

esvis — by Daniel Anderson, 6 years ago

Visualization and Estimation of Effect Sizes

A variety of methods are provided to estimate and visualize distributional differences in terms of effect sizes. Particular emphasis is placed on evaluating differences between two or more distributions across the entire scale, rather than at a single point (e.g., differences in means). For example, Probability-Probability (PP) plots display the difference between two or more distributions, matched by their empirical CDFs (see Ho and Reardon, 2012), allowing for examination of where on the scale distributional differences are largest or smallest. The area under the PP curve (AUC) is an effect-size metric, corresponding to the probability that a randomly selected observation from the x-axis distribution will have a higher value than a randomly selected observation from the y-axis distribution. Binned effect size plots are also available, in which the distributions are split into bins (set by the user) and separate effect sizes (Cohen's d) are produced for each bin, again providing a means to evaluate the consistency (or lack thereof) of the difference between two or more distributions at different points on the scale. Evaluation of empirical CDFs is also provided, with built-in arguments for adding annotations that help evaluate distributional differences at specific points (e.g., semi-transparent shading). All functions take a consistent argument structure. Calculation of specific effect sizes is also possible. The following effect sizes are estimable: (a) Cohen's d, (b) Hedges' g, (c) percentage above a cut, (d) transformed (normalized) percentage above a cut, (e) area under the PP curve, and (f) the V statistic (see Ho, 2009), which essentially transforms the area under the curve to standard deviation units. By default, effect sizes are calculated for all possible pairwise comparisons, but a reference group (distribution) can be specified.
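
The probability interpretation of the area under the PP curve, and Cohen's d, can be illustrated with a short Python sketch. This is not the package's code: the function names `pp_auc` and `cohens_d` are hypothetical, the AUC is computed directly from all pairs rather than by integrating a curve, ties are counted as one half (an assumption), and Cohen's d uses the usual pooled standard deviation.

```python
from math import sqrt
from statistics import mean, variance


def pp_auc(x_group, y_group):
    """Estimate P(X > Y) over all pairs; ties count as 0.5 (assumption)."""
    wins = 0.0
    for x in x_group:
        for y in y_group:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(x_group) * len(y_group))


def cohens_d(a, b):
    """Standardized mean difference (a minus b) using the pooled SD."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


if __name__ == "__main__":
    x_axis_group = [4, 5, 6]
    y_axis_group = [1, 2, 3]
    print(pp_auc(x_axis_group, y_axis_group))
    print(cohens_d(x_axis_group, y_axis_group))
```

A value of 0.5 from `pp_auc` indicates no systematic difference between the two groups, while values near 0 or 1 indicate that one group almost always exceeds the other.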

BOLDconnectR — by Sameer Padhye, 5 months ago

Retrieve, Transform and Analyze the Barcode of Life Data Systems Data

Facilitates retrieval, transformation and analysis of data from the Barcode of Life Data Systems (BOLD) database <https://boldsystems.org/>. This package allows both public and private user data to be easily downloaded into the R environment using a variety of inputs such as IDs (processid, sampleid), BINs, dataset codes, project codes, taxonomy, and geography. It provides frictionless data conversion into formats compatible with other R packages and third-party tools, as well as functions for sequence alignment and clustering, biodiversity analysis and spatial mapping.

plotluck — by Stefan Schroedl, 7 years ago

'ggplot2' Version of "I'm Feeling Lucky!"

Examines the characteristics of a data frame and a formula to automatically choose the most suitable type of plot out of the following supported options: scatter, violin, box, bar, density, hexagon bin, spine plot, and heat map. The aim of the package is to let the user focus on what to plot, rather than on the "how" during exploratory data analysis. It also automates handling of observation weights, logarithmic axis scaling, reordering of factor levels, and overlaying smoothing curves and median lines. Plots are drawn using 'ggplot2'.

npsp — by Ruben Fernandez-Casal, 2 years ago

Nonparametric Spatial Statistics

Multidimensional nonparametric spatial (spatio-temporal) geostatistics. S3 classes and methods for multidimensional linear binning, local polynomial kernel regression (spatial trend estimation), and density and variogram estimation. Nonparametric methods for simultaneous inference on both spatial trend and variogram functions (for spatial processes). Nonparametric residual kriging (spatial prediction). For details on these methods see, for example, Fernandez-Casal and Francisco-Fernandez (2014) or Castillo-Paez et al. (2019).

PWFSLSmoke — by Jonathan Callahan, 4 years ago

Utilities for Working with Air Quality Monitoring Data

Utilities for working with air quality monitoring data with a focus on small particulates (PM2.5) generated by wildfire smoke. Functions are provided for downloading available data from the United States 'EPA' <https://www.epa.gov/outdoor-air-quality-data> and its 'AirNow' air quality site <https://www.airnow.gov>. Additional sources of PM2.5 data made accessible by the package include 'AIRSIS' (aka "Oceaneering", not public) and 'WRCC' <https://wrcc.dri.edu/cgi-bin/smoke.pl>. Data compilations are hosted by the USFS 'AirFire' research team <https://www.airfire.org>.

scorecard — by Shichen Xie, 8 days ago

Credit Risk Scorecard

The `scorecard` package makes the development of credit risk scorecards easier and more efficient by providing functions for common tasks such as data partition, variable selection, WOE binning, scorecard scaling, performance evaluation and report generation. These functions can also be used in the development of machine learning models. References include: 1. Refaat, M. (2011, ISBN: 9781447511199). Credit Risk Scorecard: Development and Implementation Using SAS. 2. Siddiqi, N. (2006, ISBN: 9780471754510). Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring.

seq2R — by Nora M. Villanueva, a year ago

Simple Method to Detect Compositional Changes in Genomic Sequences

This software is useful for loading '.fasta' or '.gbk' files, and for retrieving sequences from the 'GenBank' dataset <https://www.ncbi.nlm.nih.gov/genbank/>. The package detects differences or asymmetries in nucleotide composition using local linear kernel smoothers. It is also possible to draw inference about critical points (i.e., maxima or minima) of the derivative curves. Additionally, bootstrap methods are used for estimating confidence intervals, and computational speed-up techniques (binning) are implemented in 'seq2R'.

hilbertSimilarity — by Yann Abraham, a month ago

Hilbert Similarity Index for High Dimensional Data

Quantifying similarity between high-dimensional single cell samples is challenging, and usually requires some simplifying hypothesis to be made. By transforming the high-dimensional space into a high-dimensional grid, the number of cells in each sub-space of the grid is characteristic of a given sample. Using a Hilbert curve, each sample can be visualized as a simple density plot, and the distance between samples can be calculated from the distribution of cells using the Jensen-Shannon distance. Bins that correspond to significant differences between samples can be identified using a simple bootstrap procedure.
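
As a rough illustration of the last step only, the Jensen-Shannon distance between two samples can be computed from their per-bin cell counts. The Python sketch below is not the package's code: the function name `js_distance` is hypothetical, the bin counts are assumed to be already computed on a shared grid, and the base-2 logarithm is an assumption chosen so that the distance falls between 0 and 1.

```python
from math import log2, sqrt


def js_distance(counts_a, counts_b):
    """Jensen-Shannon distance between two vectors of per-bin counts."""
    if len(counts_a) != len(counts_b):
        raise ValueError("both samples must use the same bins")
    total_a, total_b = sum(counts_a), sum(counts_b)
    # Convert raw counts to probability distributions over the bins.
    p = [c / total_a for c in counts_a]
    q = [c / total_b for c in counts_b]
    # Mixture distribution used as the reference for both divergences.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(u, v):
        # Kullback-Leibler divergence; empty bins of u contribute zero.
        return sum(ui * log2(ui / vi) for ui, vi in zip(u, v) if ui > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    # Guard against tiny negative values from floating-point rounding.
    return sqrt(max(0.0, divergence))


if __name__ == "__main__":
    sample_1 = [12, 30, 5, 0]   # cells per grid bin (made-up counts)
    sample_2 = [10, 8, 20, 15]
    print(js_distance(sample_1, sample_2))
```

Identical bin distributions give a distance of 0, and samples that occupy entirely different bins give the maximum distance of 1 under the base-2 convention.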