Regular Expressions
Regular expressions are an advanced filter function of Lydia Email Agent.
A regular expression is a pattern of characters used to match against a string of text. The regular expressions available in Lydia Email Agent are based on those found in Perl.
See http://www.perldoc.com/perl5.8.0/pod/perlre.html for more information on Perl and regular expressions.
Basic Syntax
Case Sensitivity Modifier
Expressions are by default case sensitive. Adding "(?i)" to the beginning of an expression will enable case-insensitive pattern matching.
Examples:
"ba*" will match "bat" but not "Bat".
"(?i)ba*" will match both "bat" and "Bat".
Literals
A literal is a character that matches itself. All characters are literals except: ".", "|", "*", "?", "+", "(", ")", "{", "}", "[", "]", "^", "$" and "\". These characters become literals when preceded by a "\".
Wildcard
The dot character "." matches any single character.
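The difference between the wildcard dot and an escaped, literal dot can be sketched as follows. Python's re module is used here only as a stand-in for the product's engine, and the pattern names are made up for illustration.

```python
import re

# Unescaped dot: a wildcard that matches any single character.
wildcard = re.compile(r"a.b")
# Escaped dot: a literal that matches only the "." character.
literal_dot = re.compile(r"a\.b")

# fullmatch() succeeds only if the whole text matches the pattern.
print(bool(wildcard.fullmatch("axb")))     # wildcard accepts "x"
print(bool(wildcard.fullmatch("a.b")))     # wildcard also accepts "."
print(bool(literal_dot.fullmatch("a.b")))  # literal dot accepts "."
print(bool(literal_dot.fullmatch("axb")))  # literal dot rejects "x"
```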
Repeats
A repeat is an expression that is repeated an arbitrary number of times.
An expression followed by "*" can be repeated any number of times including zero.
An expression followed by "+" can be repeated any number of times, but at least once.
An expression followed by "?" may be repeated zero or one times only.
When it is necessary to specify the minimum and maximum number of repeats explicitly, the bounds operator "{}" may be used. Thus "a{2}" is the letter "a" repeated exactly twice, "a{2,4}" represents the letter "a" repeated between 2 and 4 times, and "a{2,}" represents the letter "a" repeated at least twice with no upper limit. These are all called repetition intervals. Note that there must be no white-space inside the {}, and there is no upper limit on the values of the lower and upper bounds.
All repeat expressions refer to the shortest possible previous sub-expression: a single character, a character set, or a sub-expression grouped with "()", for example.
Examples:
"ba*" will match all of "b", "ba", "baaa" etc.
"ba+" will match "ba" or "baaaa" for example but not "b".
"ba?" will match "b" or "ba".
"ba{2,4}" will match "baa", "baaa" and "baaaa".
Parentheses
Parentheses serve two purposes: to group items together into a sub-expression, and to mark what generated the match. For example, the expression "(ab)*" would match all of the string "ababab". The matching events contain information both on what the whole expression matched and on what each sub-expression matched. It is permissible for sub-expressions to match null strings. If a sub-expression takes no part in a match (for example, if it is part of an alternative that is not taken), then the string that is returned for that sub-expression is empty.
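The sketch below shows what a marked sub-expression reports, using Python's re module as a stand-in for the product's engine. One difference is an assumption worth flagging: Python reports an unused sub-expression as None rather than as an empty string, so the sketch asks for an empty-string default to mirror the behaviour described above.

```python
import re

# "(ab)*" matches the whole of "ababab".
whole = re.fullmatch(r"(ab)*", "ababab")
print(whole.group(0))  # text matched by the whole expression
print(whole.group(1))  # text matched by the last repeat of the group

# Only the second alternative takes part in this match.
alt = re.fullmatch(r"(a)|(b)", "b")
# default="" substitutes an empty string for sub-expressions that
# took no part in the match (Python would otherwise report None).
unused_as_empty = alt.groups(default="")
print(unused_as_empty)
```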
Non-Marking Parentheses
Sometimes you need to group sub-expressions with parentheses but do not want the parentheses to produce another marked sub-expression. In this case, non-marking parentheses "(?:expression)" can be used. For example, the following expression creates no marked sub-expressions:
"(?:abc)*"
Alternatives
Alternatives occur when the expression can match either one sub-expression or another.
Each alternative is separated by a "|". Each alternative is the largest possible previous sub-expression; this is the opposite behaviour from repetition operators.
Examples:
"a(b|c)" could match "ab" or "ac".
"abc|def" could match "abc" or "def".
Sets
A set is a group of characters that can match any single character that is a member of the set. Sets are delimited by "[" and "]" and can contain literals, character ranges, character classes, collating elements and equivalence classes. Set declarations that start with "^" contain the complement of the elements that follow.
Examples:
Character literals:
"[abc]" will match either of "a", "b", or "c".
"[^abc]" will match any character other than "a", "b", or "c".
Character ranges:
"[a-z]" will match any character in the range "a" to "z".
"[^A-Z]" will match any character other than those in the range "A" to "Z".
Note that character ranges are highly locale dependent: they match any character that collates between the endpoints of the range, and ranges will only behave according to ASCII rules when the default "C" locale is in effect. You can set a different locale by setting the public property Locale.
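The literal and range examples above can be sketched with Python's re module. This is an approximation: Python compares ranges by character code rather than by locale collation, so it corresponds to the ASCII behaviour described for the default "C" locale. The helper name is hypothetical.

```python
import re

def matches_whole(pattern: str, text: str) -> bool:
    """Return True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# Character literals and their complement.
print(matches_whole(r"[abc]", "b"), matches_whole(r"[abc]", "d"))
print(matches_whole(r"[^abc]", "d"), matches_whole(r"[^abc]", "a"))

# Character ranges and their complement (compared by character code).
print(matches_whole(r"[a-z]", "q"), matches_whole(r"[a-z]", "Q"))
print(matches_whole(r"[^A-Z]", "q"), matches_whole(r"[^A-Z]", "Q"))
```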
Character classes are denoted using the syntax "[:classname:]" within a set declaration, for example "[[:space:]]" is the set of all whitespace characters. Character classes are only available if the ExpressionOptions property includes CharacterClasses. The available character classes are:
alnum Any alphanumeric character.
alpha Any alphabetical character a-z and A-Z. Other characters may also be included depending upon the locale.
blank Any blank character, either a space or a tab.
cntrl Any control character.
digit Any digit 0-9.
graph Any graphical character.
lower Any lower case character a-z. Other characters may also be included depending upon the locale.
print Any printable character.
punct Any punctuation character.
space Any whitespace character.
upper Any upper case character A-Z. Other characters may also be included depending upon the locale.
xdigit Any hexadecimal digit character, 0-9, a-f and A-F.
word Any word character - all alphanumeric characters plus the underscore.
unicode Any character whose code is greater than 255; this applies to the wide character traits classes only.
There are some shortcuts that can be used in place of the character classes:
\w in place of [:word:]
\s in place of [:space:]
\d in place of [:digit:]
\l in place of [:lower:]
\u in place of [:upper:]
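A partial sketch of the shortcuts, using Python's re module as a stand-in. Python supports "\w", "\s" and "\d" but, as far as this sketch assumes, not "\l", "\u" or the "[:classname:]" syntax, so only the first three are shown. The re.ASCII flag restricts the shortcuts to ASCII characters.

```python
import re

# re.ASCII keeps \w, \d and \s limited to ASCII characters.
word = re.compile(r"\w+", re.ASCII)
digits = re.compile(r"\d+", re.ASCII)
spaces = re.compile(r"\s+", re.ASCII)

# \w covers alphanumeric characters plus the underscore.
print(bool(word.fullmatch("abc_123")), bool(word.fullmatch("abc-123")))
# \d covers the digits 0-9.
print(bool(digits.fullmatch("2024")), bool(digits.fullmatch("20a4")))
# \s covers whitespace such as spaces and tabs.
print(bool(spaces.fullmatch(" \t")), bool(spaces.fullmatch("x")))
```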
Line anchors
An anchor is something that matches the null string at the start or end of a line: "^" matches the null string at the start of a line, "$" matches the null string at the end of a line.
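Line anchors can be sketched with Python's re module, with one assumption to note: in Python, "^" and "$" match at line boundaries only when the re.MULTILINE flag is set, so the flag is used here to reproduce the line-based behaviour described above. The sample text is invented for illustration.

```python
import re

text = "first line\nsecond line"

# "^" anchors the match to the start of each line.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
# "$" anchors the match to the end of each line.
line_ends = re.findall(r"\w+$", text, re.MULTILINE)

print(line_starts)  # first word of each line
print(line_ends)    # last word of each line
```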