Unicode Support and Unicode Properties

Introduction

The Motoko Regex Engine supports Unicode properties, allowing users to match specific character categories using \p{Property} and \P{Property} syntax. This enhances pattern matching by enabling character classification based on Unicode properties.

Syntax

Unicode properties can be matched using the following syntax:

\p{Property}   // Matches a character with the specified Unicode property
\P{Property}   // Matches a character that does NOT have the specified Unicode property

Example

\p{L}   // Matches any letter
\p{N}   // Matches any number
\P{P}   // Matches any character except punctuation

Supported Unicode Properties

The engine supports a subset of Unicode properties:

  • L (Letter)
  • Ll (Lowercase Letter)
  • Lu (Uppercase Letter)
  • N (Number)
  • P (Punctuation)
  • Zs (Separator, Space)
  • Emoji (Emoji characters)