METACRAN search results

fuzzystring — by Paul E. Santos Andrade, a month ago

Fast Fuzzy String Joins for Data Frames

Perform fuzzy joins on data frames using approximate string matching. Implements inner, left, right, full, semi, and anti joins with string distance metrics from the 'stringdist' package, including Optimal String Alignment, Levenshtein, Damerau-Levenshtein, Jaro-Winkler, q-gram, cosine, Jaccard, and Soundex. Uses a 'data.table' backend plus compiled 'C++' result assembly to reduce overhead in large joins, while adaptive candidate planning avoids unnecessary distance evaluations in single-column string joins. Suitable for reconciling misspellings, inconsistent labels, and other near-match identifiers while optionally returning the computed distance for each match.

https://github.com/PaulESantos/fuzzystring, https://paulesantos.github.io/fuzzystring/
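
The idea behind a fuzzy string join, as described in the entry above, can be sketched in a few lines. This is a minimal Python illustration and not the package's R API: the names `levenshtein` and `fuzzy_inner_join`, the list-of-dicts row format, and the `max_dist` threshold are all assumptions made for the example. It compares every pair of rows, so it has none of the candidate planning or compiled assembly the package advertises.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming Levenshtein distance (two-row variant)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def fuzzy_inner_join(left: List[Dict], right: List[Dict],
                     key: str, max_dist: int = 2) -> List[Dict]:
    """Hypothetical helper: pair rows whose `key` values are within max_dist.

    Brute force over all pairs; the computed distance is returned per match.
    """
    matches = []
    for lrow in left:
        for rrow in right:
            dist = levenshtein(lrow[key], rrow[key])
            if dist <= max_dist:
                matches.append({"left": lrow, "right": rrow, "dist": dist})
    return matches


if __name__ == "__main__":
    left = [{"name": "Quercus robur"}, {"name": "Pinus sylvestris"}]
    right = [{"name": "Quercus robor"}, {"name": "Picea abies"}]
    for match in fuzzy_inner_join(left, right, key="name"):
        print(match)
```

A left, semi, or anti join would follow from the same pairwise comparison by changing which unmatched rows are kept.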

tidystringdist — by Colin Fay, 4 days ago

String Distance Calculation with Tidy Data Principles

Calculation of string distance following the tidy data principles. Built on top of the 'stringdist' package.

concatenate — by James Dunham, 10 years ago

Human-Friendly Text from Unknown Strings

Simple functions for joining strings. Construct human-friendly messages whose elements aren't known in advance, such as those passed to stop(), warning(), or message(), while keeping the calling code clean.

https://github.com/jamesdunham/concatenate

rgraph6 — by Michal Bojanowski, a year ago

Representing Graphs as 'graph6', 'digraph6' or 'sparse6' Strings

Encode network data as strings of printable ASCII characters. Implemented functions include encoding and decoding adjacency matrices, edgelists, igraph, and network objects to/from formats 'graph6', 'sparse6', and 'digraph6'. The formats and methods are described in McKay, B.D. and Piperno, A. (2014).

https://mbojan.github.io/rgraph6/
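
For readers unfamiliar with the 'graph6' format mentioned above, the encoding of a small undirected graph can be sketched as follows. This is a Python illustration under stated assumptions, not the package's R interface: the function name `encode_graph6` is hypothetical, the input is assumed to be a symmetric 0/1 adjacency matrix given as nested lists, and only the short form of the format (at most 62 vertices) is handled.

```python
from typing import List


def encode_graph6(adj: List[List[int]]) -> str:
    """Encode a symmetric 0/1 adjacency matrix as a graph6 string.

    Sketch limited to graphs with at most 62 vertices, where the vertex
    count fits in a single leading character.
    """
    n = len(adj)
    if n > 62:
        raise ValueError("this sketch only handles graphs with n <= 62")

    # Upper triangle read column by column: (0,1), (0,2), (1,2), (0,3), ...
    bits = []
    for j in range(1, n):
        for i in range(j):
            bits.append(1 if adj[i][j] else 0)

    # Pad with zeros on the right to a multiple of 6 bits.
    bits += [0] * (-len(bits) % 6)

    # The vertex count and each 6-bit group are offset by 63 so that
    # every character is printable ASCII.
    chars = [chr(n + 63)]
    for k in range(0, len(bits), 6):
        value = 0
        for bit in bits[k:k + 6]:
            value = (value << 1) | bit
        chars.append(chr(value + 63))
    return "".join(chars)


if __name__ == "__main__":
    triangle = [[0, 1, 1],
                [1, 0, 1],
                [1, 1, 0]]
    print(encode_graph6(triangle))
```

Decoding reverses the steps: subtract 63 from each character, unpack the 6-bit groups, and refill the upper triangle in the same column order.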

stringx — by Marek Gagolewski, a year ago

Replacements for Base String Functions Powered by 'stringi'

English is the native language for only 5% of the World population. Also, only 17% of us can understand this text. Moreover, the Latin alphabet is the main one for merely 36% of the total. The early computer era, now a very long time ago, was dominated by the US. Due to the proliferation of the internet, smartphones, social media, and other technologies and communication platforms, this is no longer the case. This package replaces base R string functions (such as grep(), tolower(), sprintf(), and strptime()) with ones that fully support the Unicode standards related to natural language and date-time processing. It also fixes some long-standing inconsistencies, and introduces some new, useful features. Thanks to 'ICU' (International Components for Unicode) and 'stringi', they are fast, reliable, and portable across different platforms.

https://stringx.gagolewski.com/, https://github.com/gagolews/stringx

seqtrie — by Travers Ching, 7 months ago

Radix Tree and Trie-Based String Distances

A collection of Radix Tree and Trie algorithms for finding similar sequences and calculating sequence distances (Levenshtein and other distance metrics). This work was inspired by a trie implementation in Python: "Fast and Easy Levenshtein distance using a Trie." Hanov (2011) <https://stevehanov.ca/blog/index.php?id=114>.

https://github.com/traversc/seqtrie
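
The trie-based approach to Levenshtein search that the entry above cites can be sketched roughly as follows. This is a simplified Python illustration in the spirit of that idea, not the package's implementation: `TrieNode`, `build_trie`, and `search` are hypothetical names, and the sketch ignores radix-tree compression and the empty string as a stored word.

```python
from typing import List, Optional, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children = {}                # character -> TrieNode
        self.word: Optional[str] = None   # set when a stored word ends here


def build_trie(words: List[str]) -> TrieNode:
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.word = word
    return root


def search(root: TrieNode, query: str, max_dist: int) -> List[Tuple[str, int]]:
    """Return (word, distance) pairs within max_dist of query, sorted."""
    results = []

    def walk(node: TrieNode, ch: str, prev_row: List[int]) -> None:
        # Compute one row of the edit-distance table for the prefix
        # that ends at this node; words sharing the prefix share the row.
        row = [prev_row[0] + 1]
        for j in range(1, len(query) + 1):
            insert_cost = row[j - 1] + 1
            delete_cost = prev_row[j] + 1
            replace_cost = prev_row[j - 1] + (query[j - 1] != ch)
            row.append(min(insert_cost, delete_cost, replace_cost))
        if node.word is not None and row[-1] <= max_dist:
            results.append((node.word, row[-1]))
        # Prune: if every entry exceeds max_dist, no descendant can match.
        if min(row) <= max_dist:
            for next_ch, child in node.children.items():
                walk(child, next_ch, row)

    first_row = list(range(len(query) + 1))
    for ch, child in root.children.items():
        walk(child, ch, first_row)
    return sorted(results)


if __name__ == "__main__":
    trie = build_trie(["cat", "cart", "car", "dog"])
    print(search(trie, "cat", max_dist=1))
```

The saving over pairwise comparison comes from two places: shared prefixes are scored once, and whole subtrees are skipped as soon as the current row rules them out.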

RPatternJoin — by Daniil Matveev, 2 years ago

String Similarity Joins for Hamming and Levenshtein Distances

This project is a tool for edit similarity joins of words (a.k.a. all-pairs similarity search) under small (< 3) edit distance constraints. It works with Levenshtein and Hamming distances and with words from any alphabet. The software was originally developed for joining amino-acid/nucleotide sequences from Adaptive Immune Repertoires, where the number of words is relatively large (10^5-10^6) and the average length of words is relatively small (10-100).

Randomuseragent — by Fangzhou Xie, 5 years ago

Filtering and Randomly Sampling Real User-Agent Strings

Based on data of real user-agent strings, we can set filtering conditions and randomly sample user-agent strings from the user-agent string pool.

https://github.com/fangzhou-xie/Randomuseragent, https://fangzhou-xie.github.io/Randomuseragent/index.html

forstringr — by Ezekiel Ogundepo, 3 years ago

String Manipulation Package for Those Familiar with 'Microsoft Excel'

The goal of 'forstringr' is to enable complex string manipulation in R, especially for those more familiar with the LEFT(), RIGHT(), and MID() functions in Microsoft Excel. The package combines the power of 'stringr' with other manipulation packages such as 'dplyr' and 'tidyr'.

https://github.com/gbganalyst/forstringr

formatters — by Joe Zhu, 5 months ago

ASCII Formatting for Values and Tables

We provide a framework for rendering complex tables to ASCII, and a set of formatters for transforming values or sets of values into ASCII-ready display strings.

https://insightsengineering.github.io/formatters/, https://github.com/insightsengineering/formatters/
