Found 108 packages in 0.07 seconds

plotluck — by Stefan Schroedl, 6 years ago

'ggplot2' Version of "I'm Feeling Lucky!"

Examines the characteristics of a data frame and a formula to automatically choose the most suitable type of plot out of the following supported options: scatter, violin, box, bar, density, hexagon bin, spine plot, and heat map. The aim of the package is to let the user focus on what to plot, rather than on the "how" during exploratory data analysis. It also automates handling of observation weights, logarithmic axis scaling, reordering of factor levels, and overlaying smoothing curves and median lines. Plots are drawn using 'ggplot2'.

seq2R — by Nora M. Villanueva, a year ago

Simple Method to Detect Compositional Changes in Genomic Sequences

This software is useful for loading '.fasta' or '.gbk' files, and for retrieving sequences from the 'GenBank' database <https://www.ncbi.nlm.nih.gov/genbank/>. The package detects differences or asymmetries in nucleotide composition using local linear kernel smoothers. It also supports inference about critical points (i.e., maximum or minimum points) of the derivative curves. Additionally, bootstrap methods are used to estimate confidence intervals, and binning techniques are implemented in 'seq2R' to speed up computation.

PWFSLSmoke — by Jonathan Callahan, 4 years ago

Utilities for Working with Air Quality Monitoring Data

Utilities for working with air quality monitoring data with a focus on small particulates (PM2.5) generated by wildfire smoke. Functions are provided for downloading available data from the United States 'EPA' <https://www.epa.gov/outdoor-air-quality-data> and its 'AirNow' air quality site <https://www.airnow.gov>. Additional sources of PM2.5 data made accessible by the package include: 'AIRSIS' (aka "Oceaneering", not public) and 'WRCC' <https://wrcc.dri.edu/cgi-bin/smoke.pl>. Data compilations are hosted by the USFS 'AirFire' research team <https://www.airfire.org>.

BOLDconnectR — by Sameer Padhye, 3 months ago

Retrieve, Transform and Analyze the Barcode of Life Data Systems Data

Facilitates retrieval, transformation and analysis of the data from the Barcode of Life Data Systems (BOLD) database <https://boldsystems.org/>. This package allows both public and private user data to be easily downloaded into the R environment using a variety of inputs such as: IDs (processid, sampleid), BINs, dataset codes, project codes, taxonomy, geography, etc. It provides frictionless data conversion into formats compatible with other R packages and third-party tools, as well as functions for sequence alignment and clustering, biodiversity analysis and spatial mapping.

npsp — by Ruben Fernandez-Casal, 2 years ago

Nonparametric Spatial Statistics

Multidimensional nonparametric spatial (spatio-temporal) geostatistics. S3 classes and methods for multidimensional: linear binning, local polynomial kernel regression (spatial trend estimation), density and variogram estimation. Nonparametric methods for simultaneous inference on both spatial trend and variogram functions (for spatial processes). Nonparametric residual kriging (spatial prediction). For details on these methods see, for example, Fernandez-Casal and Francisco-Fernandez (2014) or Castillo-Paez et al. (2019).
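
As a rough illustration of the linear binning mentioned above, here is a minimal one-dimensional sketch in Python. It is not the 'npsp' implementation: the function name `linear_binning` and its parameters are hypothetical, and it assumes an equally spaced grid with every observation inside `[lo, hi]`. Each observation splits a unit weight between its two neighbouring grid nodes in proportion to how close it lies to each.

```python
def linear_binning(xs, lo, hi, n_nodes):
    """Spread each observation's unit weight over its two nearest grid nodes.

    Hypothetical helper for illustration only; assumes n_nodes >= 2 and
    lo <= x <= hi for every x in xs.
    """
    if n_nodes < 2 or hi <= lo:
        raise ValueError("need at least two nodes and hi > lo")
    step = (hi - lo) / (n_nodes - 1)
    weights = [0.0] * n_nodes
    for x in xs:
        if not lo <= x <= hi:
            raise ValueError(f"observation {x} lies outside the grid")
        pos = (x - lo) / step               # position in grid units
        left = min(int(pos), n_nodes - 2)   # index of the node to the left
        frac = pos - left                   # share given to the right node
        weights[left] += 1.0 - frac
        weights[left + 1] += frac
    return weights


if __name__ == "__main__":
    print(linear_binning([0.25, 1.0], lo=0.0, hi=1.0, n_nodes=3))
```

The node weights always sum to the number of observations, which is why binned data can stand in for the raw sample in later kernel computations.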

scorecard — by Shichen Xie, 4 months ago

Credit Risk Scorecard

The `scorecard` package makes developing credit risk scorecards easier and more efficient by providing functions for common tasks such as data partition, variable selection, WOE binning, scorecard scaling, performance evaluation and report generation. These functions can also be used in the development of machine learning models. References include: 1. Refaat, M. (2011, ISBN: 9781447511199). Credit Risk Scorecard: Development and Implementation Using SAS. 2. Siddiqi, N. (2006, ISBN: 9780471754510). Credit risk scorecards. Developing and Implementing Intelligent Credit Scoring.
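
To make the WOE (weight of evidence) step concrete, here is a minimal Python sketch that computes WOE per bin and the resulting information value for already-binned data. It does not reproduce the `scorecard` API: the function `woe_iv` is hypothetical, and the sign convention used here (log of the good share over the bad share) is an assumption, since references differ on which class goes in the numerator.

```python
import math
from collections import defaultdict


def woe_iv(bin_labels, is_bad):
    """Return ({bin: woe}, information_value) for pre-binned observations.

    Hypothetical helper for illustration. Assumed convention:
    woe = ln(share of goods in bin / share of bads in bin).
    """
    good = defaultdict(int)
    bad = defaultdict(int)
    for label, flag in zip(bin_labels, is_bad):
        if flag:
            bad[label] += 1
        else:
            good[label] += 1
    total_good = sum(good.values())
    total_bad = sum(bad.values())
    woe = {}
    iv = 0.0
    for label in sorted(set(good) | set(bad)):
        if good[label] == 0 or bad[label] == 0:
            # WOE is undefined for a bin missing one class; merge bins first.
            raise ValueError(f"bin {label!r} has no goods or no bads")
        share_good = good[label] / total_good
        share_bad = bad[label] / total_bad
        woe[label] = math.log(share_good / share_bad)
        iv += (share_good - share_bad) * woe[label]
    return woe, iv


if __name__ == "__main__":
    bins = ["low"] * 4 + ["high"] * 4
    flags = [0, 0, 0, 1, 0, 1, 1, 1]
    print(woe_iv(bins, flags))
```

Bins with a WOE far from zero separate the two classes well, and the information value summarises that separation across all bins of a variable.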

hilbertSimilarity — by Yann Abraham, 6 years ago

Hilbert Similarity Index for High Dimensional Data

Quantifying similarity between high-dimensional single cell samples is challenging, and usually requires some simplifying hypothesis to be made. By transforming the high dimensional space into a high dimensional grid, the number of cells in each sub-space of the grid is characteristic of a given sample. Using a Hilbert curve, each sample can be visualized as a simple density plot, and the distance between samples can be calculated from the distribution of cells using the Jensen-Shannon distance. Bins that correspond to significant differences between samples can be identified using a simple bootstrap procedure.
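
The grid-and-distance idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the 'hilbertSimilarity' code: the helpers `grid_counts` and `js_distance` are hypothetical, the grid uses equal-width bins over shared per-dimension ranges, and the Hilbert curve ordering (needed only for the density plot) is omitted. The Jensen-Shannon distance is taken here as the square root of the base-2 Jensen-Shannon divergence.

```python
import math
from collections import Counter


def grid_counts(points, lo, hi, bins):
    """Count points per grid cell, cutting each dimension into `bins` bins."""
    counts = Counter()
    for p in points:
        cell = tuple(
            min(bins - 1, max(0, int((x - l) / (h - l) * bins)))
            for x, l, h in zip(p, lo, hi)
        )
        counts[cell] += 1
    return counts


def js_distance(c1, c2):
    """Jensen-Shannon distance between two cell-count distributions."""
    n1, n2 = sum(c1.values()), sum(c2.values())
    divergence = 0.0
    for cell in set(c1) | set(c2):
        p, q = c1[cell] / n1, c2[cell] / n2   # Counter gives 0 if absent
        m = 0.5 * (p + q)
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return math.sqrt(max(divergence, 0.0))


if __name__ == "__main__":
    lo, hi = (0.0, 0.0), (1.0, 1.0)
    a = grid_counts([(0.1, 0.1), (0.2, 0.15), (0.9, 0.8)], lo, hi, bins=2)
    b = grid_counts([(0.8, 0.9), (0.7, 0.95), (0.1, 0.2)], lo, hi, bins=2)
    print(js_distance(a, b))
```

With this definition the distance is 0 for identical cell distributions and 1 for samples that share no occupied grid cell.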

Ruido — by Arthur Igor da Fonseca-Freire, 5 days ago

Soundscape Background Noise, Power, and Saturation

Accessible and flexible implementation of three ecoacoustic indices that are less commonly available in existing R frameworks: Background Noise, Soundscape Power and Soundscape Saturation. The functions were designed to accommodate a variety of sampling designs. Users can tailor calculations by specifying spectrogram time bin size, amplitude thresholds and normality tests. By simplifying computation and standardizing reproducible methods, the package aims to support ecoacoustics studies. For more details about the indices, see Towsey (2014) and Burivalova (2017).

GLDEX — by Steve Su, 5 months ago

Fitting Single and Mixture of Generalised Lambda Distributions

The fitting algorithms considered in this package have two major objectives. One is to provide a smoothing device to fit distributions to data using the weighted and unweighted discretised approach based on the bin width of the histogram. The other is to provide a definitive fit to the data set using maximum likelihood and quantile matching estimation. Other methods such as moment matching, the starship method and L-moment matching are also provided. Diagnostics on goodness of fit can be done via qqplots, KS-resample tests and comparing mean, variance, skewness and kurtosis of the data with the fitted distribution. References include the following: Karvanen and Nuutinen (2008) "Characterizing the generalized lambda distribution by L-moments", King and MacGillivray (1999) "A starship method for fitting the generalised lambda distributions", Su (2005) "A Discretized Approach to Flexibly Fit Generalized Lambda Distributions to Data", Su (2007) "Numerical Maximum Log Likelihood Estimation for Generalized Lambda Distributions", Su (2007) "Fitting Single and Mixture of Generalized Lambda Distributions to Data via Discretized and Maximum Likelihood Methods: GLDEX in R", Su (2009) "Confidence Intervals for Quantiles Using Generalized Lambda Distributions", Su (2010) "Chapter 14: Fitting GLDs and Mixture of GLDs to Data using Quantile Matching Method", Su (2010) "Chapter 15: Fitting GLD to data using GLDEX 1.0.4 in R", Su (2015) "Flexible Parametric Quantile Regression Model", Su (2021) "Flexible parametric accelerated failure time model".

SlideCNA — by Diane Zhang, a year ago

Calls Copy Number Alterations from Slide-Seq Data

Takes spatial single-cell-type RNA-seq data (specifically designed for Slide-seq v2), calls copy number alterations (CNAs) using pseudo-spatial binning, clusters cellular units (e.g. beads) based on CNA profile, and visualizes spatial CNA patterns. Documentation about 'SlideCNA' is included in the pre-print by Zhang et al. (2022). The package 'enrichR' (>= 3.0), conditionally used to annotate SlideCNA-determined clusters with gene ontology terms, can be installed from <https://github.com/wjawaid/enrichR> or with install_github("wjawaid/enrichR").