Fast Fuzzy String Joins for Data Frames
Perform fuzzy joins on data frames using approximate string matching. Implements inner, left, right, full, semi, and anti joins with string distance metrics from the 'stringdist' package, including Optimal String Alignment, Levenshtein, Damerau-Levenshtein, Jaro-Winkler, q-gram, cosine, Jaccard, and Soundex. Uses a 'data.table' backend plus compiled 'C++' result assembly to reduce overhead in large joins, while adaptive candidate planning avoids unnecessary distance evaluations in single-column string joins. Suitable for reconciling misspellings, inconsistent labels, and other near-match identifiers while optionally returning the computed distance for each match.
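The idea of a fuzzy inner join that returns the distance for each match can be illustrated with a minimal sketch. This is not the package's implementation: it uses a naive all-pairs comparison with plain Levenshtein distance, and the names `levenshtein`, `fuzzy_inner_join`, `max_dist`, and the `_x`/`_y`/`dist` column names are assumptions made for illustration.

```python
def levenshtein(a, b):
    """Classic dynamic-programming Levenshtein distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def fuzzy_inner_join(left, right, key, max_dist):
    """Join two lists of dict rows where the key strings are within max_dist.

    Hypothetical helper: compares every left row with every right row,
    so it does O(len(left) * len(right)) distance evaluations.
    """
    out = []
    for l_row in left:
        for r_row in right:
            d = levenshtein(l_row[key], r_row[key])
            if d <= max_dist:
                row = {f"{k}_x": v for k, v in l_row.items()}
                row.update({f"{k}_y": v for k, v in r_row.items()})
                row["dist"] = d  # computed distance for this match
                out.append(row)
    return out


left = [{"name": "apple", "id": 1}, {"name": "banana", "id": 2}]
right = [{"name": "appel", "price": 3}, {"name": "bananna", "price": 4}]
matches = fuzzy_inner_join(left, right, key="name", max_dist=2)
```

The all-pairs loop is the overhead that candidate planning is meant to avoid; a real implementation would prune pairs before computing any distance.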
String Distance Calculation with Tidy Data Principles
Calculation of string distance following the tidy data principles. Built on top of the 'stringdist' package.
Human-Friendly Text from Unknown Strings
Simple functions for joining strings. Construct human-friendly messages whose elements aren't known in advance, like in stop, warning, or message, from clean code.
Representing Graphs as 'graph6', 'digraph6' or 'sparse6' Strings
Encode network data as strings of printable ASCII characters. Implemented functions include encoding and decoding adjacency matrices, edgelists, igraph, and network objects to/from formats 'graph6', 'sparse6', and 'digraph6'. The formats and methods are described in McKay, B.D. and Piperno, A (2014)
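As a rough illustration of what 'graph6' encoding involves, the sketch below encodes a small undirected simple graph. It is not the package's code: the function name `encode_graph6` is hypothetical, and the sketch assumes at most 62 vertices so that the vertex count fits in a single character.

```python
def encode_graph6(n, edges):
    """Encode an undirected simple graph on vertices 0..n-1 as graph6.

    Sketch for small graphs only (n <= 62).
    """
    if not 0 <= n <= 62:
        raise ValueError("this sketch only handles 0 <= n <= 62")
    edge_set = {frozenset(e) for e in edges}
    # Upper triangle of the adjacency matrix, column by column:
    # (0,1), (0,2), (1,2), (0,3), (1,3), (2,3), ...
    bits = []
    for j in range(1, n):
        for i in range(j):
            bits.append(1 if frozenset((i, j)) in edge_set else 0)
    # Pad with zeros to a multiple of 6 bits.
    bits.extend([0] * (-len(bits) % 6))
    chars = [chr(n + 63)]  # vertex count as one printable character
    for k in range(0, len(bits), 6):
        value = 0
        for b in bits[k:k + 6]:
            value = (value << 1) | b
        chars.append(chr(value + 63))  # shift each 6-bit group into printable ASCII
    return "".join(chars)


triangle = encode_graph6(3, [(0, 1), (0, 2), (1, 2)])
```

Adding 63 to every 6-bit group keeps all output characters in the printable ASCII range, which is what makes the format safe to store one graph per line in a text file.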
Replacements for Base String Functions Powered by 'stringi'
English is the native language for only 5% of the world's population. Also, only 17% of us can understand this text. Moreover, the Latin alphabet is the main one for merely 36% of the total. The early computer era, now a very long time ago, was dominated by the US. Due to the proliferation of the internet, smartphones, social media, and other technologies and communication platforms, this is no longer the case. This package replaces base R string functions (such as grep(), tolower(), sprintf(), and strptime()) with ones that fully support the Unicode standards related to natural language and date-time processing. It also fixes some long-standing inconsistencies, and introduces some new, useful features. Thanks to 'ICU' (International Components for Unicode) and 'stringi', they are fast, reliable, and portable across different platforms.
Radix Tree and Trie-Based String Distances
A collection of Radix Tree and Trie algorithms for finding similar sequences and calculating sequence distances (Levenshtein and other distance metrics). This work was inspired by a trie implementation in Python: "Fast and Easy Levenshtein distance using a Trie." Hanov (2011) <https://stevehanov.ca/blog/index.php?id=114>.
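The trie technique referenced above can be sketched as follows: words that share a prefix share the corresponding rows of the Levenshtein table, and a branch is abandoned once every entry in its current row exceeds the threshold. This is a minimal sketch and not the package's code; `TrieNode`, `build_trie`, `search`, and `max_dist` are names chosen for illustration.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # next character -> TrieNode
        self.word = None    # set when a stored word ends at this node


def build_trie(words):
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.word = w
    return root


def search(root, query, max_dist):
    """Return (word, distance) pairs for stored words within max_dist of query."""
    results = []
    first_row = list(range(len(query) + 1))  # distances from the empty prefix
    if root.word is not None and first_row[-1] <= max_dist:
        results.append((root.word, first_row[-1]))
    for ch, child in root.children.items():
        _walk(child, ch, query, first_row, max_dist, results)
    return results


def _walk(node, ch, query, prev_row, max_dist, results):
    # Compute one new row of the Levenshtein table for the prefix ending in ch.
    row = [prev_row[0] + 1]
    for col in range(1, len(query) + 1):
        row.append(min(row[col - 1] + 1,                             # insertion
                       prev_row[col] + 1,                            # deletion
                       prev_row[col - 1] + (query[col - 1] != ch)))  # substitution
    if node.word is not None and row[-1] <= max_dist:
        results.append((node.word, row[-1]))
    # Prune: if every entry exceeds the limit, no extension can match.
    if min(row) <= max_dist:
        for next_ch, child in node.children.items():
            _walk(child, next_ch, query, row, max_dist, results)


trie = build_trie(["cat", "cart", "cut", "dog"])
hits = sorted(search(trie, "cat", max_dist=1))
```

The pruning step is what makes the trie worthwhile: with a small threshold, most branches are cut after a few characters instead of being compared in full.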
String Similarity Joins for Hamming and Levenshtein Distances
This project is a tool for edit similarity joins of words (a.k.a. all-pairs similarity search) under small (< 3) edit distance constraints. It works for Levenshtein/Hamming distances and words from any alphabet. The software was originally developed for joining amino-acid/nucleotide sequences from Adaptive Immune Repertoires, where the number of words is relatively large (10^5-10^6) and the average length of words is relatively small (10-100).
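To make the notion of an all-pairs similarity join concrete, here is a small sketch restricted to Hamming distance at most 1. The description does not say which algorithm the tool uses; the bucketing approach below is a common technique swapped in for illustration, and `hamming1_join` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations


def hamming1_join(words):
    """Return index pairs (i, j), i < j, whose words are within Hamming distance 1.

    Each word is filed under one key per position, with that position
    masked out. Two equal-length words share a key exactly when they
    agree everywhere except possibly at the masked position.
    """
    buckets = defaultdict(list)
    for idx, w in enumerate(words):
        for pos in range(len(w)):
            key = (len(w), pos, w[:pos], w[pos + 1:])
            buckets[key].append(idx)
    pairs = set()
    for members in buckets.values():
        for i, j in combinations(members, 2):
            pairs.add((i, j))  # members are appended in index order, so i < j
    return pairs


seqs = ["CASS", "CAST", "CATS", "GASS"]
close_pairs = sorted(hamming1_join(seqs))
```

Bucketing avoids comparing every pair directly: the work grows with the total number of characters plus the number of matching pairs, which matters when there are 10^5-10^6 short words.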
Filtering and Randomly Sampling Real User-Agent Strings
Based on data of real user-agent strings, we can set filtering conditions and randomly sample user-agent strings from the user-agent string pool.
String Manipulation Package for Those Familiar with 'Microsoft Excel'
The goal of 'forstringr' is to enable complex string manipulation in R, especially for those more familiar with the LEFT(), RIGHT(), and MID() functions in Microsoft Excel. The package combines the power of 'stringr' with other manipulation packages such as 'dplyr' and 'tidyr'.
ASCII Formatting for Values and Tables
We provide a framework for rendering complex tables to ASCII, and a set of formatters for transforming values or sets of values into ASCII-ready display strings.