Build regular expressions piece by piece using human-readable code. This package is designed for interactive use. For package development, use the rebus.* dependencies.
Regular expressions are a very powerful tool, but the syntax is terse enough to be difficult to read. This makes bugs easy to introduce and hard to find. This package contains functions to make building regular expressions easier.
The package contains constants for character classes (R-specific ones like GRAPH, generic ones like WORD, and compound ones like ISO_DATE), special characters (such as BACKSLASH), and anchors (such as START and END).
There are functions for creating character classes, repetition, groups, and captures, and for the other basic regex functionality (for example char_class, repeated, group, and optional).
Each of the class constants has a corresponding function that groups the class and allows repetition (for example, hex_digit(6) matches six hexadecimal digits).
There are operators for concatenation (%c%, or %R% as used in the examples here) and alternation (%|%).
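To illustrate the idea behind such operators, here is a minimal base R sketch. The operators `%then%` and `%either%` are hypothetical stand-ins defined only for this example. They are not part of the package and are not meant to reflect how it is implemented.

```r
# Hypothetical stand-ins, not the package's own operators:
# concatenation simply pastes two pattern fragments together...
`%then%` <- function(x, y) paste0(x, y)
# ...and alternation joins them with the regex "or" symbol.
`%either%` <- function(x, y) paste0(x, "|", y)

# User-defined %op% operators share one precedence level and are
# evaluated left to right, so this builds "cat" first, then appends "|dog".
pattern <- "c" %then% "at" %either% "dog"
matches <- grepl(pattern, c("cat", "dog", "bird"))
```

Because every such operator just returns a string, the result can be passed to any function that accepts a regular expression.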
To match a hex colour, use the following expression. It reads "Match a hash, followed by six hexadecimal values."
"#" %R% hex_digit(6)
To match only a hex colour and nothing else, you can add anchors to the start and end of the expression.
START %R% "#" %R% hex_digit(6) %R% END
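The effect of the anchors can be checked in base R. The two patterns below are hand-written approximations of the unanchored and anchored expressions above. They are an assumption about what an equivalent regex looks like, not the exact strings the package generates.

```r
# Assumed hand-written equivalents of the two expressions above.
unanchored <- "#[0-9a-fA-F]{6}"
anchored <- "^#[0-9a-fA-F]{6}$"

candidates <- c("#1a2b3c", "#12345", "colour: #1a2b3c", "#1a2b3cff")

# Without anchors, any string that contains a hex colour matches.
found_unanchored <- grepl(unanchored, candidates)
# With anchors, the whole string must be exactly one hex colour.
found_anchored <- grepl(anchored, candidates)
```

Only the first candidate passes the anchored test. The last two contain a hex colour but also carry extra characters, so they match only the unanchored pattern.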
The next expression matches something that looks like an email address. It reads "Match one or more letters, numbers, dots, underscores, percent signs, plus signs or hyphens. Then match an 'at' symbol. Then match one or more letters, numbers, dots, or hyphens. Then match a dot. Then match two to four letters."
one_or_more(char_class(ASCII_ALNUM %R% "._%+-")) %R% "@" %R% one_or_more(char_class(ASCII_ALNUM %R% ".-")) %R% DOT %R% ascii_alpha(2, 4)
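As a quick sanity check, the same description can be written by hand and tried in base R. The pattern below is an assumed equivalent of the expression above, built from the plain-English reading rather than copied from the package's output.

```r
# Assumed hand-written equivalent of the expression above.
email <- "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,4}"

candidates <- c(
  "user.name+tag@example.com", # well formed
  "no-at-sign.example.com",    # missing the 'at' symbol
  "user@host"                  # missing the dot and trailing letters
)

is_email <- grepl(email, candidates)
```

The pattern is not anchored, so it reports whether a string contains something email-like rather than whether the whole string is an email address.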
To match an IP address, first we need an expression to match numbers between 0 and 255. Both of the following syntaxes read "Match two then five then a number between zero and five. Or match two then a number between zero and four then a digit. Or match an optional zero or one followed by an optional digit followed by a compulsory digit. Make this a single token, but don't capture it."
# Using the %|% operator
ip_element <- group(
  "25" %R% char_range(0, 5) %|%
  "2" %R% char_range(0, 4) %R% ascii_digit() %|%
  optional(char_class("01")) %R% optional(ascii_digit()) %R% ascii_digit()
)

# The same again, this time using the or function
ip_element <- or(
  "25" %R% char_range(0, 5),
  "2" %R% char_range(0, 4) %R% ascii_digit(),
  optional(char_class("01")) %R% optional(ascii_digit()) %R% ascii_digit()
)

# It's easier to write using number_range, though it isn't quite as optimal
# as handcrafted regexes.
number_range(0, 255, allow_leading_zeroes = TRUE)
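The three alternatives can be verified exhaustively in base R. The pattern below is a hand-written regex that follows the description above. It is an assumed equivalent for illustration, and the anchors are added only so that each test string has to match in full.

```r
# Assumed hand-written equivalent: a non-capturing group of three alternatives.
ip_element <- "(?:25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])"

# Anchor the pattern so the whole string must be a valid element.
whole <- paste0("^", ip_element, "$")

# perl = TRUE is used because (?:...) is PCRE syntax.
in_range <- grepl(whole, as.character(0:255), perl = TRUE)
out_of_range <- grepl(whole, c("256", "300", "1000", "-1"), perl = TRUE)
leading_zeroes <- grepl(whole, c("007", "07"), perl = TRUE)
```

Every value from 0 to 255 matches, the out-of-range strings do not, and leading zeroes are accepted because the first two positions are optional.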
Now an IP address consists of four of these numbers separated by dots. This reads "Match a word boundary. Then create a token from an ip_element followed by a dot, and repeat it three times. Then match another ip_element followed by a word boundary."
BOUNDARY %R% repeated(group(ip_element %R% DOT), 3) %R% ip_element %R% BOUNDARY
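To see the full pattern at work, here is a base R sketch that assembles an assumed hand-written equivalent with paste0 and applies it to a few strings. The exact regex the package produces may differ in detail, for example in how the groups are nested.

```r
# Assumed hand-written equivalent of ip_element (0 to 255, not captured).
ip_element <- "(?:25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])"

# Word boundary, three "element then dot" tokens, a final element, word boundary.
ip_address <- paste0("\\b(?:", ip_element, "\\.){3}", ip_element, "\\b")

inputs <- c("server at 127.0.0.1 is up", "192.168.0.1", "256.1.1.1", "1.2.3")

# perl = TRUE is used because (?:...) and \b are PCRE syntax.
has_ip <- grepl(ip_address, inputs, perl = TRUE)

# Pull the matched address out of the first string.
first_ip <- regmatches(inputs[1], regexpr(ip_address, inputs[1], perl = TRUE))
```

The word boundaries stop the pattern from matching inside a longer run of digits, which is why "256.1.1.1" is rejected rather than matched from its second character.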
debuggex.com is a visual regex debugging and testing site.
More high-level regexes for complex data types (phone numbers, post codes, car licenses, whatever).
0.1-3 Added content from rebus.base.
0.1-2 Added content from rebus.base, rebus.unicode.
0.1-1 Split into rebus.* dependencies.
0.0-5 Fixed number_range.
0.0-4 First CRAN version.
0.0-2 Initial public release.