A character class is used to represent a set of characters. The following combinations are allowed in describing a character class:
_x_
: (where x is not one of the magic characters ^$()%.[]*+-?) represents the character x itself.
.
: (a dot) represents all characters.
%a
: represents all letters.
%c
: represents all control characters.
%d
: represents all digits.
%l
: represents all lowercase letters.
%p
: represents all punctuation characters.
%s
: represents all space characters.
%u
: represents all uppercase letters.
%w
: represents all alphanumeric characters.
%x
: represents all hexadecimal digits.
%z
: represents the character whose code is 0 (the NUL character).
%_x_
: (where x is any non-alphanumeric character) represents the character x. This is the standard way to escape the magic characters. Any punctuation character (even a non-magic one) can be preceded by a '=%=' when used to represent itself in a pattern.
[set]
: represents the class which is the union of all characters in set. A range of characters may be specified by separating the end characters of the range with a '=-='. All classes =%=_x_ described above may also be used as components in set. All other characters in set represent themselves. For example, [%w_] (or [_%w]) represents all alphanumeric characters plus the underscore, [0-7] represents the octal digits, and [0-7%l%-] represents the octal digits plus the lowercase letters plus the '=-=' character. The interaction between ranges and classes is not defined. Therefore, patterns like [%a-z] or [a-%%] have no meaning.
[^_set_]
: represents the complement of set, where set is interpreted as above.
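The classes above can be exercised with the standard string.gsub function, which returns the number of matches as its second result. In the minimal sketch below, the helper count and the sample string are hypothetical, chosen only for illustration, and the results assume the default "C" locale.

```lua
-- Hypothetical helper (not part of the standard library): the number of
-- characters of s matched by the single-character pattern p.
local function count(s, p)
  -- string.gsub returns the new string plus the number of substitutions;
  -- only the second value is needed here.
  local _, n = string.gsub(s, p, "")
  return n
end

local s = "Lua 5.1 [beta]_x-7"  -- illustrative sample, 18 characters

local patterns = {
  ".",          -- any character
  "%a",         -- letters
  "%d",         -- digits
  "%s",         -- space characters
  "%u",         -- uppercase letters
  "%p",         -- punctuation
  "%.",         -- escaped magic character: only the literal dot
  "[%w_]",      -- alphanumerics plus underscore
  "[0-7]",      -- octal digits
  "[0-7%l%-]",  -- octal digits, lowercase letters, and '-'
  "[^%w_]",     -- complement of [%w_]
}

for _, p in ipairs(patterns) do
  print(string.format("%-10s %d", p, count(s, p)))
end
```

Note the contrast between "." and "%.": the first matches all 18 characters of the sample, while the escaped form matches only the single literal dot.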
For all classes represented by single letters (%a, %c, etc.), the corresponding uppercase letter represents the complement of the class. For instance, %S represents all non-space characters.
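The uppercase complements are convenient for extracting or deleting runs of characters. The sketch below uses %S, %D and %A on an illustrative string. It also checks, over all 256 byte values, that %S behaves exactly like the set [^%s]. The sample string and variable names are hypothetical.

```lua
local s = "  key = 42  "  -- illustrative sample

local word    = s:match("%S+")    -- first run of non-space characters
local digits  = s:gsub("%D", "")  -- delete every non-digit (count result dropped)
local letters = s:gsub("%A", "")  -- delete every non-letter

-- Compare %S with the explicit complement set [^%s] on every byte value.
local agree = true
for b = 0, 255 do
  local c = string.char(b)
  -- string.find returns nil when there is no match
  if (c:find("%S") ~= nil) ~= (c:find("[^%s]") ~= nil) then
    agree = false
  end
end

print(word, digits, letters, agree)
```

Because a local declared with a single name keeps only the first value, digits and letters receive the rewritten strings and not the substitution counts.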
The definitions of letter, space, and other character groups depend on the current locale. In particular, the class [a-z] may not be equivalent to %l.
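Class membership follows the C library's current locale. One way to get predictable results is to pin the LC_CTYPE category with os.setlocale before matching. The sketch below assumes the "C" locale, which the C standard guarantees to exist, and assumes a UTF-8 subject string. Under another locale the results may differ.

```lua
-- Pin character classification to the portable "C" locale.
-- os.setlocale returns the new locale name, or nil if the request fails.
assert(os.setlocale("C", "ctype"))

-- In the "C" locale, [a-z] and %l agree on every byte value.
local same = true
for b = 0, 255 do
  local c = string.char(b)
  if (c:find("[a-z]") ~= nil) ~= (c:find("%l") ~= nil) then
    same = false
  end
end

-- Patterns operate on bytes: "é" in UTF-8 is the two bytes 195 and 169,
-- neither of which is classified as a letter in the "C" locale.
local e_acute = "\195\169"
local is_letter = e_acute:find("%a") ~= nil

print(same, is_letter)
```

Under the "C" locale this prints true and false. In a single-byte locale such as ISO-8859-1, byte 233 ("é" in that encoding) may instead be classified as a lowercase letter.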