Found 103 packages in 0.03 seconds

scorecard — by Shichen Xie, a year ago

Credit Risk Scorecard

The `scorecard` package makes the development of credit risk scorecards easier and more efficient by providing functions for common tasks such as data partitioning, variable selection, WOE (weight of evidence) binning, scorecard scaling, performance evaluation and report generation. These functions can also be used in the development of machine learning models. References include: 1. Refaat, M. (2011, ISBN: 9781447511199). Credit Risk Scorecards: Development and Implementation Using SAS. 2. Siddiqi, N. (2006, ISBN: 9780471754510). Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring.
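As a rough illustration of what WOE binning computes, the following Python sketch derives the weight of evidence and the information value contribution for each bin of an already-binned variable. It is not the `scorecard` API: the function name `woe_table`, the sign convention (log of the bad share over the good share) and the requirement that every bin contains at least one good and one bad case are assumptions made for this example.

```python
import math

def woe_table(bins, labels):
    """Return {bin: (woe, iv_contribution)} for labels 1 = bad, 0 = good.

    Hypothetical helper; assumes every bin holds at least one good
    and one bad case, so no logarithm of zero occurs.
    """
    counts = {}
    for b, y in zip(bins, labels):
        good, bad = counts.get(b, (0, 0))
        counts[b] = (good + (y == 0), bad + (y == 1))

    total_good = sum(g for g, _ in counts.values())
    total_bad = sum(b for _, b in counts.values())

    table = {}
    for b, (good, bad) in counts.items():
        dist_good = good / total_good  # share of all goods in this bin
        dist_bad = bad / total_bad     # share of all bads in this bin
        woe = math.log(dist_bad / dist_good)
        table[b] = (woe, (dist_bad - dist_good) * woe)
    return table

# Toy data: the "high" bin is dominated by bad cases.
bins = ["low", "low", "low", "low", "high", "high", "high", "high"]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
for name, (woe, iv) in woe_table(bins, labels).items():
    print(name, round(woe, 4), round(iv, 4))
```

Summing the second element over all bins gives the information value of the variable, which is one common criterion for variable selection.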

npsp — by Ruben Fernandez-Casal, a year ago

Nonparametric Spatial Statistics

Multidimensional nonparametric spatial (spatio-temporal) geostatistics. S3 classes and methods for multidimensional: linear binning, local polynomial kernel regression (spatial trend estimation), density and variogram estimation. Nonparametric methods for simultaneous inference on both spatial trend and variogram functions (for spatial processes). Nonparametric residual kriging (spatial prediction). For details on these methods see, for example, Fernandez-Casal and Francisco-Fernandez (2014) or Castillo-Paez et al. (2019).

plotluck — by Stefan Schroedl, 6 years ago

'ggplot2' Version of "I'm Feeling Lucky!"

Examines the characteristics of a data frame and a formula to automatically choose the most suitable type of plot out of the following supported options: scatter, violin, box, bar, density, hexagon bin, spine plot, and heat map. The aim of the package is to let the user focus on what to plot, rather than on the "how" during exploratory data analysis. It also automates handling of observation weights, logarithmic axis scaling, reordering of factor levels, and overlaying smoothing curves and median lines. Plots are drawn using 'ggplot2'.

seq2R — by Nora M. Villanueva, 9 months ago

Simple Method to Detect Compositional Changes in Genomic Sequences

This software is useful for loading '.fasta' or '.gbk' files, and for retrieving sequences from the 'GenBank' dataset <https://www.ncbi.nlm.nih.gov/genbank/>. The package detects differences or asymmetries in nucleotide composition by using local linear kernel smoothers. It is also possible to draw inference about critical points (i.e., maximum or minimum points) related to the derivative curves. Additionally, bootstrap methods are used for estimating confidence intervals, and binning techniques are implemented in 'seq2R' to speed up computation.

hilbertSimilarity — by Yann Abraham, 6 years ago

Hilbert Similarity Index for High Dimensional Data

Quantifying similarity between high-dimensional single cell samples is challenging, and usually requires some simplifying hypotheses to be made. By transforming the high dimensional space into a high dimensional grid, the number of cells in each sub-space of the grid is characteristic of a given sample. Using a Hilbert curve each sample can be visualized as a simple density plot, and the distance between samples can be calculated from the distribution of cells using the Jensen-Shannon distance. Bins that correspond to significant differences between samples can be identified using a simple bootstrap procedure.
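The grid-and-distance idea can be sketched in a few lines of Python. This is an illustration only, not the 'hilbertSimilarity' interface: the function names, the equal-width binning on an assumed common range [0, 1] for every dimension, and the choice of base-2 logarithms are all assumptions, and the Hilbert curve ordering used for visualization is left out.

```python
import math
from collections import Counter

def grid_histogram(points, n_bins, lo=0.0, hi=1.0):
    """Count points per grid cell; each coordinate is cut into n_bins equal bins.

    Assumes every coordinate lies in [lo, hi] (hypothetical setup).
    """
    width = (hi - lo) / n_bins
    cells = Counter()
    for p in points:
        # Clamp so that a coordinate equal to hi falls in the last bin.
        cell = tuple(min(int((x - lo) / width), n_bins - 1) for x in p)
        cells[cell] += 1
    return cells

def jensen_shannon_distance(counts_a, counts_b):
    """Square root of the base-2 Jensen-Shannon divergence of two cell-count maps."""
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    divergence = 0.0
    for cell in set(counts_a) | set(counts_b):
        p = counts_a.get(cell, 0) / n_a
        q = counts_b.get(cell, 0) / n_b
        m = 0.5 * (p + q)
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return math.sqrt(max(divergence, 0.0))

sample_a = grid_histogram([(0.10, 0.10), (0.15, 0.20), (0.60, 0.70)], n_bins=4)
sample_b = grid_histogram([(0.12, 0.05), (0.90, 0.90)], n_bins=4)
print(jensen_shannon_distance(sample_a, sample_b))
```

With base-2 logarithms the distance lies between 0 (identical cell distributions) and 1 (samples that share no occupied cell).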

GLDEX — by Steve Su, 2 years ago

Fitting Single and Mixture of Generalised Lambda Distributions

The fitting algorithms considered in this package have two major objectives. One is to provide a smoothing device to fit distributions to data using the weighted and unweighted discretised approach based on the bin width of the histogram. The other is to provide a definitive fit to the data set using maximum likelihood and quantile matching estimation. Other methods such as moment matching, the starship method and L-moment matching are also provided. Diagnostics on goodness of fit can be done via Q-Q plots, KS-resample tests and by comparing the mean, variance, skewness and kurtosis of the data with those of the fitted distribution. References include the following: Karvanen and Nuutinen (2008) "Characterizing the generalized lambda distribution by L-moments", King and MacGillivray (1999) "A starship method for fitting the generalised lambda distributions", Su (2005) "A Discretized Approach to Flexibly Fit Generalized Lambda Distributions to Data", Su (2007) "Numerical Maximum Log Likelihood Estimation for Generalized Lambda Distributions", Su (2007) "Fitting Single and Mixture of Generalized Lambda Distributions to Data via Discretized and Maximum Likelihood Methods: GLDEX in R", Su (2009) "Confidence Intervals for Quantiles Using Generalized Lambda Distributions", Su (2010) "Chapter 14: Fitting GLDs and Mixture of GLDs to Data using Quantile Matching Method", Su (2010) "Chapter 15: Fitting GLD to data using GLDEX 1.0.4 in R", Su (2015) "Flexible Parametric Quantile Regression Model", Su (2021) "Flexible parametric accelerated failure time model".

SlideCNA — by Diane Zhang, 5 months ago

Calls Copy Number Alterations from Slide-Seq Data

This package takes spatial single-cell-type RNA-seq data (it is specifically designed for Slide-seq v2), calls copy number alterations (CNAs) using pseudo-spatial binning, clusters cellular units (e.g. beads) based on CNA profile, and visualizes spatial CNA patterns. Documentation about 'SlideCNA' is included in the pre-print by Zhang et al. (2022). The package 'enrichR' (>= 3.0), conditionally used to annotate SlideCNA-determined clusters with gene ontology terms, can be installed from <https://github.com/wjawaid/enrichR> or with install_github("wjawaid/enrichR").

tempted — by Pixu Shi, a year ago

Temporal Tensor Decomposition, a Dimensionality Reduction Tool for Longitudinal Multivariate Data

TEMPoral TEnsor Decomposition (TEMPTED) is a dimension reduction method for multivariate longitudinal data with varying temporal sampling. It formats the data into a temporal tensor and decomposes it into a summation of low-dimensional components, each consisting of a subject loading vector, a feature loading vector, and a continuous temporal loading function. These loadings provide a low-dimensional representation of subjects or samples and can be used to identify features associated with clusters of subjects or samples. TEMPTED allows subjects to have different temporal sampling, so time points do not need to be binned, and missing time points do not need to be imputed.

ggsurveillance — by Alexander Bartel, 2 months ago

Tools for Outbreak Investigation/Infectious Disease Surveillance

Create epicurves or epigantt charts in 'ggplot2'. Prepare data for visualisation or other reporting for infectious disease surveillance and outbreak investigation. Includes tidy functions for date-based transformations in common reporting tasks, such as (A) seasonal date alignment for respiratory disease surveillance, (B) date-based case binning by specified time intervals such as isoweek, epiweek, month and more, (C) automated detection and marking of the new year based on the date/datetime axis of the 'ggplot2' plot. An introduction on how to use epicurves can be found on the US CDC website (2012, <https://www.cdc.gov/training/quicklearns/epimode/index.html>).
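Date-based case binning by ISO week, as in item (B), can be illustrated with a short Python sketch. The function name `cases_per_isoweek` and the input format (a list of onset dates, one per case) are assumptions for this example and do not reflect the 'ggsurveillance' interface.

```python
from collections import Counter
from datetime import date

def cases_per_isoweek(onset_dates):
    """Count cases per (ISO year, ISO week), sorted chronologically."""
    counts = Counter()
    for d in onset_dates:
        # isocalendar() yields (ISO year, ISO week number, ISO weekday).
        iso_year, iso_week, _ = d.isocalendar()
        counts[(iso_year, iso_week)] += 1
    return dict(sorted(counts.items()))

onsets = [date(2024, 12, 29), date(2024, 12, 30), date(2025, 1, 1)]
print(cases_per_isoweek(onsets))
```

The ISO year is kept next to the week number because the two can disagree with the calendar year around the new year: 30 December 2024 is a Monday and already belongs to ISO week 1 of 2025.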

forceplate — by Raphael Hartmann, 3 months ago

Processing Force-Plate Data

Process raw force-plate data (txt-files) by segmenting them into trials and, if needed, calculating (user-defined) descriptive statistics of variables for user-defined time bins (relative to trigger onsets) for each trial. When segmenting the data a baseline correction, a filter, and a data imputation can be applied if needed. Experimental data can also be processed and combined with the segmented force-plate data. This procedure is suggested by Johannsen et al. (2023) and some of the options (e.g., choice of low-pass filter) are also suggested by Winter (2009).
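The idea of descriptive statistics over time bins defined relative to a trigger onset can be sketched as follows in Python. The helper `bin_means`, the half-open bins and the millisecond time stamps are assumptions for illustration, not part of the 'forceplate' package.

```python
def bin_means(times, values, onset, bin_edges):
    """Mean of values in each half-open bin [start, end) relative to onset.

    times, onset and bin_edges share one unit (here: milliseconds).
    Returns None for a bin that contains no samples.
    """
    means = []
    for start, end in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = [v for t, v in zip(times, values) if start <= t - onset < end]
        means.append(sum(in_bin) / len(in_bin) if in_bin else None)
    return means

# One trial: samples every millisecond, trigger onset at t = 2 ms.
times = [0, 1, 2, 3, 4, 5]
force = [10, 20, 30, 40, 50, 60]
print(bin_means(times, force, onset=2, bin_edges=[0, 2, 4]))
```

Swapping the mean for another statistic (standard deviation, peak value) only changes the expression applied to `in_bin`.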