METACRAN search results

RPatternJoin — by Daniil Matveev, 8 months ago

String Similarity Joins for Hamming and Levenshtein Distances

This project is a tool for edit similarity joins of words (a.k.a. all-pairs similarity search) under small (< 3) edit distance constraints. It supports Levenshtein and Hamming distances and words over any alphabet. The software was originally developed for joining amino-acid/nucleotide sequences from Adaptive Immune Repertoires, where the number of words is relatively large (10^5-10^6) and the average word length is relatively small (10-100).
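
To make the idea of a similarity join concrete, here is a minimal Python sketch of an all-pairs join at Hamming distance 1. It is not the package's implementation or API: the function name `hamming_join_1` and the bucketing scheme are illustrative assumptions. Each word is indexed once per position with that position masked out, so two words land in the same bucket exactly when they differ only at that position.

```python
from collections import defaultdict
from itertools import combinations


def hamming_join_1(words):
    """Return all unordered pairs of distinct words at Hamming distance 1.

    Hypothetical helper for illustration; not part of RPatternJoin.
    """
    buckets = defaultdict(set)
    for w in set(words):
        for i in range(len(w)):
            # Mask position i: words that differ only at position i
            # share the key (i, prefix, suffix).
            buckets[(i, w[:i], w[i + 1:])].add(w)

    pairs = set()
    for group in buckets.values():
        for a, b in combinations(sorted(group), 2):
            pairs.add((a, b))
    return sorted(pairs)


pairs = hamming_join_1(["CASSL", "CASSF", "CATSL", "CASS", "GGGGG"])
print(pairs)
```

The cost grows with the total number of characters rather than with the square of the number of words, which is why this kind of bucketing suits large sets of short words.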

seqtrie — by Travers Ching, 4 months ago

Radix Tree and Trie-Based String Distances

A collection of Radix Tree and Trie algorithms for finding similar sequences and calculating sequence distances (Levenshtein and other distance metrics). This work was inspired by a trie implementation in Python: "Fast and Easy Levenshtein distance using a Trie." Hanov (2011) <https://stevehanov.ca/blog/index.php?id=114>.

https://github.com/traversc/seqtrie
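
The trie technique mentioned in the description can be sketched in a few lines of Python. This is an illustrative reconstruction of the general approach (one dynamic-programming row per trie node, with pruning once every entry in the row exceeds the threshold), not the seqtrie API; the names `TrieNode`, `build_trie`, and `search` are assumptions made for this example.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.word = None  # set when a stored word ends at this node


def build_trie(words):
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.word = word
    return root


def search(root, query, max_dist):
    """Return sorted (word, distance) pairs with Levenshtein distance <= max_dist."""
    results = []
    first_row = list(range(len(query) + 1))

    def walk(node, ch, prev_row):
        # Compute the next row of the edit-distance table for this trie edge.
        row = [prev_row[0] + 1]
        for col in range(1, len(query) + 1):
            insert_cost = row[col - 1] + 1
            delete_cost = prev_row[col] + 1
            replace_cost = prev_row[col - 1] + (query[col - 1] != ch)
            row.append(min(insert_cost, delete_cost, replace_cost))
        if node.word is not None and row[-1] <= max_dist:
            results.append((node.word, row[-1]))
        # Prune: descend only if some prefix alignment is still within budget.
        if min(row) <= max_dist:
            for next_ch, child in node.children.items():
                walk(child, next_ch, row)

    # The empty string, if stored, is len(query) edits away from the query.
    if root.word is not None and len(query) <= max_dist:
        results.append((root.word, len(query)))
    for ch, child in root.children.items():
        walk(child, ch, first_row)
    return sorted(results)


trie = build_trie(["CASSL", "CASSF", "CAT", "GATTACA"])
hits = search(trie, "CASSL", 1)
print(hits)
```

Because words sharing a prefix share the rows computed for that prefix, the search avoids recomputing the full distance table for every stored word.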

fuzzyjoin — by David Robinson, 5 years ago

Join Tables Together on Inexact Matching

Join tables together based not on whether columns match exactly, but whether they are similar by some comparison. Implementations include string distance and regular expression matching.

https://github.com/dgrtwo/fuzzyjoin

Randomuseragent — by Fangzhou Xie, 4 years ago

Filtering and Randomly Sampling Real User-Agent Strings

Based on data of real user-agent strings, we can set filtering conditions and randomly sample user-agent strings from the user-agent string pool.

https://github.com/fangzhou-xie/Randomuseragent, https://fangzhou-xie.github.io/Randomuseragent/index.html

GrpString — by Hui (Tom) Tang, 8 years ago

Patterns and Statistical Differences Between Two Groups of Strings

Methods include converting series of event names to strings, finding common patterns in a group of strings, discovering featured patterns when comparing two groups of strings (along with the number and starting position of each pattern in each string), obtaining the transition matrix, computing transition entropy, statistically comparing two groups of strings, and clustering string groups. Event names can be any action names or labels, such as events in log files or areas of interest (AOIs) in eye tracking research.

forstringr — by Ezekiel Ogundepo, 2 years ago

String Manipulation Package for Those Familiar with 'Microsoft Excel'

The goal of 'forstringr' is to enable complex string manipulation in R, especially for those more familiar with the LEFT(), RIGHT(), and MID() functions in Microsoft Excel. The package combines the power of 'stringr' with other manipulation packages such as 'dplyr' and 'tidyr'.

https://github.com/gbganalyst/forstringr

debugme — by Gábor Csárdi, a year ago

Debug R Packages

Specify debug messages as special string constants, and control debugging of packages via environment variables.

https://github.com/r-lib/debugme#readme, https://r-lib.github.io/debugme/

messy.cats — by Harrison Karp, 3 years ago

Employs String Distance Tools to Help Clean Categorical Data

Matching with string distance has never been easier! 'messy.cats' contains various functions that employ string distance tools to make data management easier for users working with categorical data. Categorical data, especially user-inputted categorical data, is often plagued by typos and can be difficult to work with. 'messy.cats' aims to provide functions that make cleaning categorical data simple and easy.

regreplaceR — by Gwang-Jin Kim, 10 months ago

Match and Replace Strings Based on Named Groups in Regular Expressions

An R6 class "Replacer" provided by the package simplifies working with regex patterns containing named groups. It allows easy retrieval of matched portions and targeted replacements by group name, improving both code clarity and maintainability.

https://github.com/gwangjinkim/regreplaceR
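
As a rough illustration of retrieval and targeted replacement by group name, the following Python sketch uses the standard `re` module. It does not reproduce the package's "Replacer" class or its methods; the helper `replace_group` and the date pattern are assumptions chosen for this example.

```python
import re

# Named groups let matched portions be addressed by name instead of index.
pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")


def replace_group(pattern, text, group, replacement):
    """Replace only the span of one named group in the first match.

    Hypothetical helper for illustration; returns text unchanged when
    there is no match or the group did not participate in the match.
    """
    m = pattern.search(text)
    if m is None:
        return text
    start, end = m.span(group)
    if start == -1:
        return text
    return text[:start] + replacement + text[end:]


text = "released on 2024-03-15"
year = pattern.search(text).group("year")
updated = replace_group(pattern, text, "month", "04")
print(year, updated)
```

Addressing groups by name keeps the replacement code valid when the pattern is later extended with additional groups.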

repr — by Philipp Angerer, a year ago

Serializable Representations

String and binary representations of objects for several formats / mime types.

https://github.com/IRkernel/repr/
