Unicode Support and Unicode Properties
Introduction
The Motoko Regex Engine supports Unicode properties, allowing users to match specific character categories using the \p{Property} and \P{Property} syntax. This enhances pattern matching by enabling character classification based on Unicode properties.
Syntax
Unicode properties can be matched using the following syntax:
\p{Property} // Matches a character with the specified Unicode property
\P{Property} // Matches a character that does NOT have the specified Unicode property
Example
\p{L} // Matches any letter
\p{N} // Matches any numeric character (for example, a digit)
\P{P} // Matches any character except punctuation
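To make the difference between \p{...} and \P{...} concrete, here is a minimal Python sketch of the matching rule. It is not the engine's implementation: the helper name matches_property is hypothetical, and it assumes the property is a Unicode General Category (such as L, N, or P) that can be looked up with Python's standard unicodedata module.

```python
import unicodedata


def matches_property(ch: str, prop: str, negated: bool = False) -> bool:
    # Hypothetical helper, for illustration only.
    # Assumption: `prop` is a General Category name such as "L", "N", or "P".
    category = unicodedata.category(ch)  # two-letter code, e.g. "Lu", "Nd", "Po"
    # A one-letter property like "L" covers every category starting with "L".
    has_prop = category.startswith(prop)
    # \P{prop} (negated) matches exactly when \p{prop} does not.
    return has_prop != negated


print(matches_property("a", "L"))                # \p{L} against "a"
print(matches_property("7", "N"))                # \p{N} against "7"
print(matches_property("!", "P", negated=True))  # \P{P} against "!"
print(matches_property("a", "P", negated=True))  # \P{P} against "a"
```

The first, second, and fourth calls report a match. The third does not, because "!" is punctuation and \P{P} excludes it.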
Supported Unicode Properties
The engine supports a subset of Unicode properties:
L (Letter)
Ll (Lowercase Letter)
Lu (Uppercase Letter)
N (Number)
P (Punctuation)
Zs (Separator, Space)
Emoji (Emoji characters)
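The properties listed above overlap: L covers Ll and Lu, as well as letters that have no case. The Python sketch below illustrates this with the standard unicodedata module. It assumes the engine follows standard Unicode General Category semantics, which the source does not confirm, and the names SUPPORTED and matching_properties are hypothetical. Emoji is left out because it is a separate Unicode property rather than a General Category, so unicodedata.category cannot detect it.

```python
import unicodedata

# General Category properties from the list above (Emoji is handled separately
# because it is not a General Category).
SUPPORTED = ["L", "Ll", "Lu", "N", "P", "Zs"]


def matching_properties(ch: str) -> list[str]:
    # Hypothetical helper: returns every listed property that `ch` satisfies.
    category = unicodedata.category(ch)
    return [prop for prop in SUPPORTED if category.startswith(prop)]


for ch in ["a", "A", "\u4e2d", "5", ",", " ", "\t"]:
    print(repr(ch), unicodedata.category(ch), matching_properties(ch))
```

Under this assumption, "a" satisfies both L and Ll, while the CJK character U+4E2D has category Lo and therefore satisfies only L. A tab has category Cc, so it matches none of the listed properties, including Zs.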