String Similarity Joins for Hamming and Levenshtein Distances
This project is a tool for edit similarity joins of words (a.k.a. all-pairs similarity search) under small (< 3) edit distance constraints. It works for Levenshtein and Hamming distances and for words from any alphabet. The software was originally developed for joining amino-acid and nucleotide sequences from Adaptive Immune Repertoires, where the number of words is relatively large (10^5-10^6) and the average word length is relatively small (10-100).
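To make the idea of a similarity join concrete, here is a minimal Python sketch of an all-pairs join under Hamming distance at most 1. It is not the package's algorithm or API: the function name `hamming1_join` and the bucketing strategy (masking one position at a time) are assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def hamming1_join(words):
    """Return sorted index pairs (i, j), i < j, whose words have the same
    length and differ in at most one position (illustrative sketch only)."""
    buckets = defaultdict(list)
    for idx, word in enumerate(words):
        for pos in range(len(word)):
            # Two words share this key exactly when they agree everywhere
            # except possibly at position `pos`.
            key = (pos, word[:pos] + word[pos + 1:])
            buckets[key].append(idx)

    pairs = set()
    for members in buckets.values():
        # Indices were appended in increasing order, so i < j holds.
        pairs.update(combinations(members, 2))
    return sorted(pairs)


words = ["CASSL", "CASSF", "CATSL", "CASS", "CASSL"]
print(hamming1_join(words))
```

Bucketing avoids comparing every pair directly, which matters when the number of words is large and the words are short. Empty strings produce no keys and are therefore never paired in this sketch.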
Radix Tree and Trie-Based String Distances
A collection of Radix Tree and Trie algorithms for finding similar sequences and calculating sequence distances (Levenshtein and other distance metrics). This work was inspired by a trie implementation in Python: "Fast and Easy Levenshtein distance using a Trie." Hanov (2011) <https://stevehanov.ca/blog/index.php?id=114>.
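The trie-based approach can be sketched in Python as follows. This follows the general idea of walking a trie while carrying one row of the Levenshtein table per node; the names `TrieNode`, `build_trie`, and `search` are hypothetical and do not reflect the package's interface.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.word = None  # set when a stored word ends at this node


def build_trie(words):
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.word = word
    return root


def search(root, query, max_dist):
    """Return sorted (word, distance) pairs with Levenshtein distance <= max_dist."""
    first_row = list(range(len(query) + 1))
    results = []
    if root.word is not None and first_row[-1] <= max_dist:
        results.append((root.word, first_row[-1]))  # empty word stored in the trie

    stack = [(child, ch, first_row) for ch, child in root.children.items()]
    while stack:
        node, ch, prev_row = stack.pop()
        # Build the next row of the edit-distance table for this trie edge.
        row = [prev_row[0] + 1]
        for col in range(1, len(query) + 1):
            insert_cost = row[col - 1] + 1
            delete_cost = prev_row[col] + 1
            replace_cost = prev_row[col - 1] + (query[col - 1] != ch)
            row.append(min(insert_cost, delete_cost, replace_cost))
        if node.word is not None and row[-1] <= max_dist:
            results.append((node.word, row[-1]))
        # Prune: no descendant can match if every entry exceeds max_dist.
        if min(row) <= max_dist:
            stack.extend((child, c, row) for c, child in node.children.items())
    return sorted(results)


trie = build_trie(["cat", "cart", "car", "dog", "cats"])
print(search(trie, "cat", 1))
```

Words sharing a prefix share the table rows computed for that prefix, and whole subtrees are skipped once the minimum of a row exceeds the threshold.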
Join Tables Together on Inexact Matching
Join tables together based not on whether columns match exactly, but on whether they are similar by some comparison. Implementations include string distance and regular expression matching.
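As an illustration of joining on an inexact comparison, the Python sketch below joins two small tables (lists of dicts) with a caller-supplied match predicate, here a regular expression test. The function `inexact_join` and the column names are assumptions for the example, not part of the package.

```python
import re


def inexact_join(left, right, left_key, right_key, match):
    """Inner join: keep every (left, right) row pair for which
    match(left_value, right_value) is true. Column names are assumed distinct."""
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in right
        if match(l_row[left_key], r_row[right_key])
    ]


foods = [{"name": "apple pie"}, {"name": "banana split"}, {"name": "cherry tart"}]
rules = [
    {"pattern": r"^a", "group": "starts with a"},
    {"pattern": r"an", "group": "contains an"},
]

joined = inexact_join(
    foods, rules, "name", "pattern",
    match=lambda text, pattern: re.search(pattern, text) is not None,
)
for row in joined:
    print(row)
```

Swapping the predicate for a string distance threshold gives a distance-based join with the same structure.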
Filtering and Randomly Sampling Real User-Agent Strings
Based on data of real user-agent strings, we can set filtering conditions and randomly sample user-agent strings from the user-agent string pool.
Patterns and Statistical Differences Between Two Groups of Strings
Methods include converting series of event names to strings, finding common patterns in a group of strings, discovering featured patterns when comparing two groups of strings (along with the number and starting position of each pattern in each string), obtaining the transition matrix, computing transition entropy, statistically comparing two groups of strings, and clustering string groups. Event names can be any action names or labels, such as events in log files or areas of interest (AOIs) in eye tracking research.
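A transition matrix and a transition entropy can be sketched in Python as below. Each character is assumed to stand for one event, and the entropy shown (row entropies weighted by how often each source event occurs) is one common definition chosen for illustration; the package may define it differently. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def transition_counts(strings):
    """Count transitions between consecutive events (characters)."""
    counts = defaultdict(Counter)
    for s in strings:
        for src, dst in zip(s, s[1:]):
            counts[src][dst] += 1
    return counts


def transition_matrix(counts):
    """Row-normalize the counts into transition probabilities."""
    return {
        src: {dst: n / sum(row.values()) for dst, n in row.items()}
        for src, row in counts.items()
    }


def transition_entropy(counts):
    """Weighted average of per-row Shannon entropies, in bits (assumed definition)."""
    total = sum(sum(row.values()) for row in counts.values())
    entropy = 0.0
    for row in counts.values():
        row_total = sum(row.values())
        row_entropy = -sum((n / row_total) * log2(n / row_total) for n in row.values())
        entropy += (row_total / total) * row_entropy
    return entropy


counts = transition_counts(["ABAB", "ABAC"])
print(transition_matrix(counts))
print(transition_entropy(counts))
```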
String Manipulation Package for Those Familiar with 'Microsoft Excel'
The goal of 'forstringr' is to enable complex string manipulation in R, especially for those more familiar with the LEFT(), RIGHT(), and MID() functions in Microsoft Excel. The package combines the power of 'stringr' with other manipulation packages such as 'dplyr' and 'tidyr'.
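For readers unfamiliar with the Excel functions mentioned above, their behavior can be sketched in Python. These helpers are illustrative stand-ins, not the package's functions; they assume a non-negative character count and, as in Excel, a 1-based start position for `mid`.

```python
def left(text, n=1):
    """First n characters, like Excel's LEFT()."""
    return text[:n]


def right(text, n=1):
    """Last n characters, like Excel's RIGHT()."""
    return text[max(len(text) - n, 0):]


def mid(text, start, n):
    """n characters beginning at 1-based position start, like Excel's MID()."""
    return text[start - 1:start - 1 + n]


print(left("forstringr", 3))
print(right("forstringr", 7))
print(mid("forstringr", 4, 6))
```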
Debug R Packages
Specify debug messages as special string constants, and control debugging of packages via environment variables.
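The general pattern of switching debug output on through an environment variable can be sketched in Python. The variable name `DEMO_DEBUG` and the helper `make_debugger` are hypothetical and are not the package's actual mechanism.

```python
import os
import sys


def make_debugger(package, env_var="DEMO_DEBUG", stream=None):
    """Return a debug(message) function that prints only when `package`
    is listed in the comma-separated environment variable `env_var`."""
    def debug(message):
        enabled = {name.strip() for name in os.environ.get(env_var, "").split(",")}
        if package in enabled:
            print(f"{package} {message}", file=stream or sys.stderr)
            return True
        return False
    return debug


debug = make_debugger("mypkg")
os.environ["DEMO_DEBUG"] = "mypkg,otherpkg"
debug("loading configuration")   # printed to stderr
os.environ["DEMO_DEBUG"] = ""
debug("this message is skipped")  # not printed
```

Reading the variable at call time, rather than once at import, lets debugging be toggled without restarting the process.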
Employs String Distance Tools to Help Clean Categorical Data
Matching with string distance has never been easier! 'messy.cats' contains various functions that employ string distance tools to make data management easier for users working with categorical data. Categorical data, especially user-entered categorical data, is often plagued by typos and can be difficult to work with. 'messy.cats' aims to provide functions that make cleaning categorical data simple and easy.
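The cleaning idea can be sketched in Python with the standard library's `difflib.get_close_matches`, which ranks candidates by a similarity ratio rather than by an edit distance. The function `clean_categories` and the cutoff value are assumptions for illustration, not the package's interface.

```python
from difflib import get_close_matches


def clean_categories(values, canonical, cutoff=0.6):
    """Map each value to its closest canonical category, or leave it
    unchanged when nothing is similar enough (illustrative sketch)."""
    cleaned = []
    for value in values:
        best = get_close_matches(value, canonical, n=1, cutoff=cutoff)
        cleaned.append(best[0] if best else value)
    return cleaned


canonical = ["apple", "banana", "cherry"]
messy = ["aple", "bananna", "chery", "kiwi"]
print(clean_categories(messy, canonical))
```

Values with no sufficiently similar category, such as "kiwi" here, pass through unchanged so they can be reviewed by hand.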
Match and Replace Strings Based on Named Groups in Regular Expressions
An R6 class "Replacer" provided by the package simplifies working with regex patterns containing named groups. It allows easy retrieval of matched portions and targeted replacements by group name, improving both code clarity and maintainability.
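Targeted replacement by group name can be sketched in Python, whose `re` module supports named groups via `(?P<name>...)`. The helper `replace_named_groups` is hypothetical and is not the "Replacer" class; it handles only the first match and assumes the named groups do not nest.

```python
import re


def replace_named_groups(pattern, text, replacements):
    """Replace the spans of the given named groups in the first match of
    `pattern`; return `text` unchanged when there is no match."""
    match = re.search(pattern, text)
    if match is None:
        return text
    spans = sorted(
        ((match.span(name), value) for name, value in replacements.items()),
        reverse=True,
    )
    # Work right to left so earlier spans keep their offsets.
    for (start, end), value in spans:
        if start == -1:
            continue  # the group did not take part in the match
        text = text[:start] + value + text[end:]
    return text


pattern = r"(?P<year>\d{4})-(?P<month>\d{2})"
print(replace_named_groups(pattern, "released 2024-05 build", {"month": "06"}))
print(replace_named_groups(pattern, "released 2024-05 build", {"year": "2025", "month": "01"}))
```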
Serializable Representations
String and binary representations of objects for several formats / mime types.