Found 211 packages in 0.01 seconds
Replacements for Base String Functions Powered by 'stringi'
English is the native language for only 5% of the World population. Also, only 17% of us can understand this text. Moreover, the Latin alphabet is the main one for merely 36% of the total. The early computer era, now a very long time ago, was dominated by the US. Due to the proliferation of the internet, smartphones, social media, and other technologies and communication platforms, this is no longer the case. This package replaces base R string functions (such as grep(), tolower(), sprintf(), and strptime()) with ones that fully support the Unicode standards related to natural language and date-time processing. It also fixes some long-standing inconsistencies, and introduces some new, useful features. Thanks to 'ICU' (International Components for Unicode) and 'stringi', they are fast, reliable, and portable across different platforms.
Radix Tree and Trie-Based String Distances
A collection of Radix Tree and Trie algorithms for finding similar sequences and calculating sequence distances (Levenshtein and other distance metrics). This work was inspired by a trie implementation in Python: "Fast and Easy Levenshtein distance using a Trie." Hanov (2011) < https://stevehanov.ca/blog/index.php?id=114>.
String Similarity Joins for Hamming and Levenshtein Distances
This project is a tool for words edit similarity joins (a.k.a. all-pairs similarity search) under small (< 3) edit distance constraints. It works for Levenshtein/Hamming distances and words from any alphabet. The software was originally developed for joining amino-acid/nucleotide sequences from Adaptive Immune Repertoires, where the number of words is relatively large (10^5-10^6) and the average length of words is relatively small (10-100).
Filtering and Randomly Sampling Real User-Agent Strings
Based on data of real user-agent strings, we can set filtering conditions and randomly sample user-agent strings from the user-agent string pool.
Patterns and Statistical Differences Between Two Groups of Strings
Methods include converting series of event names to strings, finding common patterns in a group of strings, discovering featured patterns when comparing two groups of strings as well as the number and starting position of each pattern in each string, obtaining transition matrix, computing transition entropy, statistically comparing the difference between two groups of strings, and clustering string groups. Event names can be any action names or labels such as events in log files or areas of interest (AOIs) in eye tracking research.
String Manipulation Package for Those Familiar with 'Microsoft Excel'
The goal of 'forstringr' is to enable complex string manipulation in R especially to those more familiar with LEFT(), RIGHT(), and MID() functions in Microsoft Excel. The package combines the power of 'stringr' with other manipulation packages such as 'dplyr' and 'tidyr'.
Join Tables Together on Inexact Matching
Join tables together based not on whether columns match exactly, but whether they are similar by some comparison. Implementations include string distance and regular expression matching.
Debug R Packages
Specify debug messages as special string constants, and control debugging of packages via environment variables.
Employs String Distance Tools to Help Clean Categorical Data
Matching with string distance has never been easier! 'messy.cats' contains various functions that employ string distance tools in order to make data management easier for users working with categorical data. Categorical data, especially user inputted categorical data that often tends to be plagued by typos, can be difficult to work with. 'messy.cats' aims to provide functions that make cleaning categorical data simple and easy.
Match and Replace Strings Based on Named Groups in Regular Expressions
An R6 class "Replacer" provided by the package simplifies working with regex patterns containing named groups. It allows easy retrieval of matched portions and targeted replacements by group name, improving both code clarity and maintainability.