Syntax
Supported Syntax
Motoko regex supports a variety of syntax features for defining patterns. These include:
- Character matching (
a
,b
,c
, etc.) - Alternation (
|
) - Grouping (
()
) - Character classes (
[]
with support for ranges like[a-z]
) - Quantifiers (
*
,+
,?
,{n}
,{n,m}
) - Anchors (
^
,$
)
Quantifiers
Quantifiers specify how many times a preceding element must occur for a match.
Supported Quantifiers
Quantifier | Meaning | Example |
---|---|---|
* | Match 0 or more times | a* matches "", "a", "aaa" |
+ | Match 1 or more times | a+ matches "a", "aaa" |
? | Match 0 or 1 time | a? matches "", "a" |
{n} | Match exactly n times | a{2} matches "aa" |
{n,} | Match at least n times | a{2,} matches "aa", "aaa" |
{n,m} | Match between n and m times | a{2,4} matches "aa", "aaa", "aaaa" |
Quantifier Modes
Quantifiers can operate in different modes:
- Greedy: Matches as many occurrences as possible.
- Lazy (
?
after quantifier): Matches as few as possible. E.g.,a+?
matches fewer occurrences of "a".
Invalid Quantifiers
Certain quantifier patterns are not allowed:
- Redundant modifiers, such as
a{2}+
ora{2}*
. - Empty quantifiers, e.g.,
{}
or{,}
. - Multiple commas in ranges, e.g.,
{2,,4}
.
Metacharacters
Metacharacters represent special patterns or symbols.
Metacharacter | Meaning | Example |
---|---|---|
. | Match any character except \n | a.b matches "acb" |
\w | Match word characters (alphanumeric + _ ) | \w+ matches "abc123" |
\W | Match non-word characters | \W matches "@" |
\d | Match digits (0-9 ) | \d+ matches "123" |
\D | Match non-digits | \D matches "a" |
\s | Match whitespace | \s+ matches " " |
\S | Match non-whitespace | \S matches "a" |
Character Classes
Character classes allow matching sets of characters.
[abc]
: Matches any charactera
,b
, orc
.[^abc]
: Matches any character excepta
,b
, orc
.[a-z]
: Matches any character in the rangea
toz
.
Nested Quantifiers
Quantifiers inside character classes must be explicitly defined. Nested or redundant quantifiers, like [a-z]{2}+
, are not allowed.
Anchors
Anchors specify positions in the text.
Anchor | Meaning | Example |
---|---|---|
^ | Start of the string | ^abc matches "abc" at the beginning |
$ | End of the string | abc$ matches "abc" at the end |
\b | Word boundary | \bword\b matches "word" |
\B | Non-word boundary | \Bword matches "word" not at a boundary |
Groups and Group Modifiers
Groups are enclosed in parentheses ()
and can be modified for specific behaviors.
Supported Group Modifiers
Modifier | Syntax | Meaning |
---|---|---|
Non-capturing | (?:...) | Groups without capturing |
Positive Lookahead | (?=...) | Asserts that what follows matches |
Negative Lookahead | (?!...) | Asserts that what follows does not match |
Positive Lookbehind | (?<=...) | Asserts that what precedes matches |
Negative Lookbehind | (?<!...) | Asserts that what precedes does not match |
Escaped Characters
Escape sequences represent special characters.
Escape Sequence | Meaning |
---|---|
\\ | Literal backslash |
\n | Newline |
\t | Tab |
\w , \W | Word/Non-word characters |
\d , \D | Digit/Non-digit |
\s , \S | Whitespace/Non-whitespace |
Invalid escape sequences throw an error.
Prohibited Patterns
- Invalid group modifiers: e.g.,
(?)
. - Empty groups:
()
is not allowed. - Empty character classes:
[]
results in an error. - Redundant or conflicting quantifiers:
a{2}+
.
Error Handling
The Motoko regex engine provides detailed error feedback to help developers identify and fix issues in their regular expressions. Below is a list of all possible errors, their meanings, and typical scenarios where they might occur.
Error Types
Error | Description | Cause |
---|---|---|
#UnexpectedCharacter | An invalid character was encountered during parsing. | Using a character that is not allowed in regex syntax, such as unescaped special characters. |
#UnexpectedEndOfInput | The regex input ended unexpectedly, leaving constructs incomplete. | Omitting closing brackets, parentheses, or quantifier ranges. |
#GenericError | A generic error message providing additional context. | Various syntax or logic errors not covered by specific error types. |
#InvalidQuantifierRange | A malformed or invalid quantifier range was used. | Using invalid quantifier syntax, e.g., {,} , {,3} , {a,b} . |
#InvalidEscapeSequence | An invalid escape sequence was encountered. | Using unrecognized escape sequences like \q or \x without proper syntax. |
#UnmatchedParenthesis | A closing parenthesis ) does not match any preceding opening parenthesis ( . | Missing or extra closing parentheses in the regex pattern. |
#MismatchedParenthesis | Parentheses do not form a valid pairing. | Nested parentheses are incorrectly matched or unbalanced, e.g., ((a)b]) . |
#UnexpectedToken | An unexpected token was encountered during parsing. | Using misplaced or unrecognized tokens in the regex pattern. |
#UnclosedGroup | A group construct is not properly closed with a closing parenthesis ) . | Missing a closing parenthesis in a group definition. |
#InvalidQuantifier | A quantifier is malformed or applied in an invalid context. | Using redundant or conflicting quantifiers, e.g., a{2}+ . |
#EmptyExpression | The regex input is empty or contains no valid expressions. | Providing an empty string or expression with no meaningful content. |
#NotCompiled | The regex has not been compiled before attempting to use it. | There was an error during compilation of the reject object, this may be due to any of the previous errors. That error will be specified in the #NotCompiled variant. |