Understanding Regular Expression patterns
This page is meant to introduce beginners to regular expression patterns. A
lot of work still remains. In the meantime, you are urged to search
Google for examples of regular expressions and also to read up on
Microsoft's implementation at
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/script56/html/reconintroductiontoregularexpressions.asp
At first, a regular expression pattern can be confusing, even
intimidating. But with a little patience it will all fall into place,
and the power of regular expressions will be at your disposal. We first
look at the basic building blocks, and then follow up with a few
patterns that result from combining them. This is meant to be an
introductory tutorial and doesn't attempt to cover advanced pattern
matching.
A string literal
This works just like the ordinary search routines that everyone is
familiar with, such as Excel's FIND function.
The pattern "hello" matches the hello in "hello world" and in
"Charlie says hello". It does not match anything in "Charlie says hel-lo".
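You can try the literal match above in code. This is a minimal sketch in Python. Its re module is assumed here only as a stand-in for Microsoft's regular expression engine, and for a plain string like "hello" the two behave the same.

```python
import re

pattern = "hello"  # no special characters, so it is matched literally

for text in ["hello world", "Charlie says hello", "Charlie says hel-lo"]:
    # re.search scans the whole string and returns a Match object, or None.
    match = re.search(pattern, text)
    if match:
        print(f"{text!r}: found {match.group()!r} at position {match.start()}")
    else:
        print(f"{text!r}: no match")
```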
As shown below, a number of characters have a special meaning in a
regular expression. To use any of those characters as itself, it must
be 'escaped' by putting a backslash (\) in front of it.
For example, ^ is a special character indicating the start of the
string, whereas \^ means the caret character itself. Similarly, the dot
. in a pattern stands for any single character (other than a newline).
So, the correct way to specify a dot as itself in a pattern is \.
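Here is a short sketch of escaping. It again assumes Python's re module as a stand-in for Microsoft's engine, since both treat \^ and \. the same way. The raw-string prefix and re.escape are Python-specific conveniences, not part of the pattern syntax itself.

```python
import re

# The r"..." prefix (a Python raw string) passes each backslash to the
# regex engine unchanged.
caret = re.search(r"\^", "2^10")             # literal caret
escaped_dot = re.search(r"3\.14", "3x14")    # \. needs a real dot, so no match
wildcard_dot = re.search(r"3.14", "3x14")    # unescaped . accepts the x

print(caret.group())
print(escaped_dot)
print(wildcard_dot.group())

# re.escape inserts the needed backslashes for text that should be literal.
print(re.escape("3.14"))
```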
At the start of the string
^ indicates the start of the string.
The pattern "^hello" matches the hello in "hello world"
but not in "Charlie says hello".
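The following sketch shows the ^ anchor in Python's re module, used here as a stand-in for Microsoft's engine. starts_with_hello is a hypothetical helper name chosen for illustration.

```python
import re

def starts_with_hello(text):
    # ^ ties the pattern to the very start of the string.
    return re.search(r"^hello", text) is not None

print(starts_with_hello("hello world"))         # True
print(starts_with_hello("Charlie says hello"))  # False
```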
At the end of the string
$ indicates the end of the string.
The pattern "hello$" matches the hello in
"Charlie says hello" but not in "hello world".
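The same idea applies to $. This is sketched with Python's re module as a stand-in, and ends_with_hello is a hypothetical helper. One Python-specific detail to keep in mind: without flags, Python's $ also matches just before a newline that ends the string.

```python
import re

def ends_with_hello(text):
    # $ ties the pattern to the very end of the string.
    return re.search(r"hello$", text) is not None

print(ends_with_hello("Charlie says hello"))  # True
print(ends_with_hello("hello world"))         # False
```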
Any single character
. is the pattern for any one character. In other Microsoft software,
such as wildcard searches in Excel, the question mark ? has served this
purpose instead.
The pattern "he." matches hel in "hello world"
and in "Charlie says hello" but finds no match in "he".
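This is a sketch of the dot, using Python's re module as a stand-in for Microsoft's engine. The last line shows a Python detail: by default the dot does not match a newline unless the re.DOTALL flag is given.

```python
import re

for text in ["hello world", "Charlie says hello", "he"]:
    # "he." needs exactly one more character of any kind after "he".
    match = re.search(r"he.", text)
    print(f"{text!r}: {match.group() if match else 'no match'}")

# In Python's re, . does not match a newline unless re.DOTALL is set.
print(re.search(r"a.b", "a\nb"))
```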
A set of characters
Specify a set of characters by enclosing them inside square brackets
[ and ]. A set matches any one of the characters it contains.
Specify a range of characters by joining the first and last characters
of the range with a hyphen -.
The pattern [hlo] specifies any one of the
characters h, l, or o. The pattern [a-z]
specifies any lowercase letter from a to z, inclusive.
Similarly, [a-zA-Z0-9] specifies any letter from a to z, inclusive,
regardless of case, or any digit from 0 to 9, inclusive.
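Here are the sets above, sketched with Python's re module as a stand-in for Microsoft's engine. re.findall is Python-specific. It lists every match, which makes it easy to see that each set consumes exactly one character.

```python
import re

# re.findall returns every non-overlapping match; each set matches one character.
print(re.findall(r"[hlo]", "hello world"))
print(re.findall(r"[a-z]", "Hello, World 42"))   # only lowercase letters are kept
print(re.findall(r"[a-zA-Z0-9]", "Hi-5!"))      # letters of either case, or digits
```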
Special characters
Repeat a pattern
There are three variants of this capability.
Repeat a character zero or one times
? repeats the previous token zero or one time, which makes it optional.
The pattern 9,? will match
a nine by itself or a nine followed by a comma. Similarly,
9\.? will match a nine by itself or a nine
followed by a dot. Note that the dot has to be 'escaped'
because by itself it has special significance in a regular
expression. For example, 9.? would
match 9x in the string 9xyz.
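You can check the ? examples with Python's re module, assumed here as a stand-in for Microsoft's engine. first_match is a hypothetical helper that returns the matched text, or None if there is no match.

```python
import re

def first_match(pattern, text):
    m = re.search(pattern, text)
    return m.group() if m else None

# ? makes the preceding token optional (zero or one occurrence).
print(first_match(r"9,?", "9"))      # matches "9"
print(first_match(r"9,?", "9,5"))    # matches "9,"
print(first_match(r"9\.?", "95"))    # matches "9"
print(first_match(r"9\.?", "9.5"))   # matches "9."
print(first_match(r"9.?", "9xyz"))   # matches "9x": unescaped . takes any character
```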
Repeat a character zero or more times
* repeats the previous token zero or more times.
The pattern 9,* will match a nine followed by any number of
commas, including none. Effectively, it would match 9 or 9, or
9,, or 9,,, and so on.
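This is a sketch of * in Python's re module, used as a stand-in for Microsoft's engine. Note that the repetition is greedy: given 9,,, it takes all three commas rather than stopping early.

```python
import re

# * lets the comma repeat any number of times, including zero.
for text in ["9", "9,", "9,,,", "9;"]:
    print(f"{text!r}: {re.search(r'9,*', text).group()!r}")
```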
Repeat a character one or more times
+ repeats the previous token one or more times.
The pattern 9,+ will match
a nine followed by one or more commas. Effectively, it would
match 9, or 9,, or 9,,, and so on. However, it
would not match 9 by itself.
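This is a sketch of + with Python's re module as a stand-in for Microsoft's engine. nine_with_commas is a hypothetical helper name.

```python
import re

def nine_with_commas(text):
    # + requires at least one comma after the 9.
    m = re.search(r"9,+", text)
    return m.group() if m else None

print(nine_with_commas("9,"))    # matches "9,"
print(nine_with_commas("9,,,"))  # matches "9,,,"
print(nine_with_commas("9"))     # None: a lone 9 is not enough
```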