Unicode Support and Unicode Properties
Introduction
The Motoko Regex Engine supports Unicode property escapes, which match characters by Unicode category using the \p{Property} and \P{Property} syntax. This lets a pattern select characters by classification (letters, numbers, punctuation, and so on) rather than by listing explicit character ranges.
Syntax
Unicode properties can be matched using the following syntax:
\p{Property} // Matches a character with the specified Unicode property
\P{Property} // Matches a character that does NOT have the specified Unicode property
Example
\p{L} // Matches any letter
\p{N} // Matches any numeric character
\P{P} // Matches any character except punctuation
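The semantics of these escapes can be illustrated outside the engine. The following is a minimal Python sketch, not the engine's implementation: it assumes that a property such as L, N, or P corresponds to the Unicode General Category and that a one-letter name covers all of its subcategories. The helper names `has_property` and `matches_escape` are hypothetical.

```python
import unicodedata


def has_property(ch: str, prop: str) -> bool:
    """Return True if ch's General Category is prop or a subcategory of it.

    Assumption: a one-letter property such as "L" covers every two-letter
    category that starts with it ("Lu", "Ll", "Lo", ...).
    """
    return unicodedata.category(ch).startswith(prop)


def matches_escape(ch: str, prop: str, negated: bool = False) -> bool:
    """Model \\p{prop} (negated=False) or \\P{prop} (negated=True) for one character."""
    return has_property(ch, prop) != negated


# \p{L}: letters from any script
print(matches_escape("é", "L"))                 # Latin small letter e with acute
# \p{N}: digits from any script
print(matches_escape("٣", "N"))                 # Arabic-Indic digit three
# \P{P}: anything that is not punctuation
print(matches_escape("a", "P", negated=True))
print(matches_escape("!", "P", negated=True))
```

The first three calls print True and the last prints False, since "!" is punctuation and is therefore rejected by \P{P}.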
Supported Unicode Properties
The engine supports a subset of Unicode properties:
L (Letter)
Ll (Lowercase Letter)
Lu (Uppercase Letter)
N (Number)
P (Punctuation)
Zs (Separator, Space)
Emoji (Emoji characters)
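Because only a subset is supported, it can help to check a pattern's property names before use. The sketch below is a hypothetical Python pre-check, not part of the engine: the `SUPPORTED` set is copied from the list above, and `parse_property_escapes` is an illustrative name. How the engine itself reports an unsupported property is not specified here.

```python
import re

# Property names taken from the supported list in this section.
SUPPORTED = {"L", "Ll", "Lu", "N", "P", "Zs", "Emoji"}

# Matches \p{Name} or \P{Name}. This simple scan does not account for an
# escaped backslash in front of the "p" (e.g. \\p{L}).
ESCAPE = re.compile(r"\\([pP])\{([A-Za-z]+)\}")


def parse_property_escapes(pattern: str) -> list[tuple[str, bool]]:
    """Return (property, negated) for each property escape in pattern.

    Raises ValueError if a property is outside the supported subset.
    """
    found = []
    for m in ESCAPE.finditer(pattern):
        name = m.group(2)
        if name not in SUPPORTED:
            raise ValueError(f"unsupported Unicode property: {name}")
        found.append((name, m.group(1) == "P"))
    return found


print(parse_property_escapes(r"\p{Lu}\p{Ll}+\P{Zs}"))
```

For the pattern shown, the call returns `[("Lu", False), ("Ll", False), ("Zs", True)]`; a pattern such as `\p{Greek}` raises `ValueError` under this sketch.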