Syntax


Supported Syntax

Motoko regex supports a variety of syntax features for defining patterns. These include:

  • Character matching (a, b, c, etc.)
  • Alternation (|)
  • Grouping (())
  • Character classes ([] with support for ranges like [a-z])
  • Quantifiers (*, +, ?, {n}, {n,m})
  • Anchors (^, $)

Quantifiers

Quantifiers specify how many times a preceding element must occur for a match.

Supported Quantifiers

QuantifierMeaningExample
*Match 0 or more timesa* matches "", "a", "aaa"
+Match 1 or more timesa+ matches "a", "aaa"
?Match 0 or 1 timea? matches "", "a"
{n}Match exactly n timesa{2} matches "aa"
{n,}Match at least n timesa{2,} matches "aa", "aaa"
{n,m}Match between n and m timesa{2,4} matches "aa", "aaa", "aaaa"

Quantifier Modes

Quantifiers can operate in different modes:

  • Greedy: Matches as many occurrences as possible.
  • Lazy (? after quantifier): Matches as few as possible. E.g., a+? matches fewer occurrences of "a".

Invalid Quantifiers

Certain quantifier patterns are not allowed:

  • Redundant modifiers, such as a{2}+ or a{2}*.
  • Empty quantifiers, e.g., {} or {,}.
  • Multiple commas in ranges, e.g., {2,,4}.

Metacharacters

Metacharacters represent special patterns or symbols.

MetacharacterMeaningExample
.Match any character except \na.b matches "acb"
\wMatch word characters (alphanumeric + _)\w+ matches "abc123"
\WMatch non-word characters\W matches "@"
\dMatch digits (0-9)\d+ matches "123"
\DMatch non-digits\D matches "a"
\sMatch whitespace\s+ matches " "
\SMatch non-whitespace\S matches "a"

Character Classes

Character classes allow matching sets of characters.

  • [abc]: Matches any character a, b, or c.
  • [^abc]: Matches any character except a, b, or c.
  • [a-z]: Matches any character in the range a to z.

Nested Quantifiers

Quantifiers inside character classes must be explicitly defined. Nested or redundant quantifiers, like [a-z]{2}+, are not allowed.


Anchors

Anchors specify positions in the text.

AnchorMeaningExample
^Start of the string^abc matches "abc" at the beginning
$End of the stringabc$ matches "abc" at the end
\bWord boundary\bword\b matches "word"
\BNon-word boundary\Bword matches "word" not at a boundary

Groups and Group Modifiers

Groups are enclosed in parentheses () and can be modified for specific behaviors.

Supported Group Modifiers

ModifierSyntaxMeaning
Non-capturing(?:...)Groups without capturing
Positive Lookahead(?=...)Asserts that what follows matches
Negative Lookahead(?!...)Asserts that what follows does not match
Positive Lookbehind(?<=...)Asserts that what precedes matches
Negative Lookbehind(?<!...)Asserts that what precedes does not match

Escaped Characters

Escape sequences represent special characters.

Escape SequenceMeaning
\\Literal backslash
\nNewline
\tTab
\w, \WWord/Non-word characters
\d, \DDigit/Non-digit
\s, \SWhitespace/Non-whitespace

Invalid escape sequences throw an error.


Prohibited Patterns

  • Invalid group modifiers: e.g., (?).
  • Empty groups: () is not allowed.
  • Empty character classes: [] results in an error.
  • Redundant or conflicting quantifiers: a{2}+.

Error Handling

The Motoko regex engine provides detailed error feedback to help developers identify and fix issues in their regular expressions. Below is a list of all possible errors, their meanings, and typical scenarios where they might occur.

Error Types

ErrorDescriptionCause
#UnexpectedCharacterAn invalid character was encountered during parsing.Using a character that is not allowed in regex syntax, such as unescaped special characters.
#UnexpectedEndOfInputThe regex input ended unexpectedly, leaving constructs incomplete.Omitting closing brackets, parentheses, or quantifier ranges.
#GenericErrorA generic error message providing additional context.Various syntax or logic errors not covered by specific error types.
#InvalidQuantifierRangeA malformed or invalid quantifier range was used.Using invalid quantifier syntax, e.g., {,}, {,3}, {a,b}.
#InvalidEscapeSequenceAn invalid escape sequence was encountered.Using unrecognized escape sequences like \q or \x without proper syntax.
#UnmatchedParenthesisA closing parenthesis ) does not match any preceding opening parenthesis (.Missing or extra closing parentheses in the regex pattern.
#MismatchedParenthesisParentheses do not form a valid pairing.Nested parentheses are incorrectly matched or unbalanced, e.g., ((a)b]).
#UnexpectedTokenAn unexpected token was encountered during parsing.Using misplaced or unrecognized tokens in the regex pattern.
#UnclosedGroupA group construct is not properly closed with a closing parenthesis ).Missing a closing parenthesis in a group definition.
#InvalidQuantifierA quantifier is malformed or applied in an invalid context.Using redundant or conflicting quantifiers, e.g., a{2}+.
#EmptyExpressionThe regex input is empty or contains no valid expressions.Providing an empty string or expression with no meaningful content.
#NotCompiledThe regex has not been compiled before attempting to use it.There was an error during compilation of the reject object, this may be due to any of the previous errors. That error will be specified in the #NotCompiled variant.