RegExp:
Regular Expressions are a powerful tool for pattern matching
and substitution. They are found in almost all
UNIX-based tools, including editors like vi, scripting languages
like Perl and PHP, and command-line programs like awk and sed.
And now they finally also exist in JavaScript.
A Regular Expression lets you build patterns using a set of special
characters. Depending on whether or not there's a match, appropriate
action can be taken, and appropriate program code executed.
For example, Form Validation is one of the most common requirements.
You don't know what exact values the user will enter, but you do know
the format they need to use. A Regular Expression is a way of
representing a pattern you are looking for in a string.
RegExp
Syntax
var myRegExp = /pattern/[switch]
Here /pattern/ is the Regular Expression itself and [switch]
(optional) indicates the mode in which the Regular Expression is to be
used:
"i" - ignore case (case-insensitive),
"g" - global search (find every match, not just the first),
"gi" - global search + ignore case.
After a Regular Expression is created, it is passed to a Method of a
String Object.
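To make the syntax above a little more concrete, here is a minimal sketch that passes a few patterns to the standard String methods search(), match() and replace(). The sample text and the variable names are made up for this illustration.

```javascript
// Sample text invented for this illustration.
var text = "Perl, perl and PERL";

// No switch: case-sensitive, and only the first match counts.
var first = text.search(/perl/);          // 6 (position of the lowercase "perl")

// "i" switch: ignore case.
var firstAnyCase = text.search(/perl/i);  // 0 (position of "Perl")

// "g" switch: collect every case-sensitive match.
var allExact = text.match(/perl/g);       // ["perl"]

// "gi" switches: collect every match, ignoring case.
var allAnyCase = text.match(/perl/gi);    // ["Perl", "perl", "PERL"]

// replace() with "gi" substitutes every occurrence.
var replaced = text.replace(/perl/gi, "PHP");  // "PHP, PHP and PHP"

console.log(first, firstAnyCase, allExact, allAnyCase, replaced);
```

Without the "g" switch, replace() would only change the first occurrence it finds.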
RegExp
Quantifiers - "meta-characters"
How about something a little more complex? Well, we can use "meta-characters",
special characters, that have a special meaning when used within a
pattern.
"+" is used to match one
or more occurrences of the preceding character. So,
/bo+/
would match the words "bore", "boom", and
"bookstore".
"*" is used to match zero
or more occurrences of the preceding character. So,
/mat*/
would match "ma", "mat" and "matter".
"?" is used to match zero
or one occurrence of the preceding character. So,
/smit?/
would match "smirk", "smile", "smith" and
"smitten", though not "smear" or "smelt".
"{x,y}" is used to match the preceding character
a given number of times, from x to y. So,
/mo{2,6}/
would match "smooth" and "smooooooth!", but not
"moth". The numbers in the curly braces represent the lower
and upper values of the range to match. NOTE: you can leave out
the upper limit for an open-ended range match.
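The quantifiers above can be tried out with the standard test() method of a RegExp, which returns true when the pattern is found anywhere in the string. This is only a small sketch; the sample words and the results object are invented for the example.

```javascript
// Each pair holds one string that matches and one that does not.
var results = {
  plus:      [/bo+/.test("boom"),       /bo+/.test("bring")],       // true, false
  star:      [/mat*/.test("ma"),        /mat*/.test("me")],         // true, false
  question:  [/smit?/.test("smile"),    /smit?/.test("smear")],     // true, false
  range:     [/mo{2,6}/.test("smooth"), /mo{2,6}/.test("moth")],    // true, false
  // Leaving out the upper limit: two or more "o" characters.
  openEnded: [/mo{2,}/.test("moooon"),  /mo{2,}/.test("mon")]       // true, false
};

console.log(results);
```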
RegExp
Special Characters
It's also possible to search for whitespace, numbers and alphabetic
characters with a Regular Expression. The following lists these special
characters:
\s = used to match a single whitespace character, including tabs
and newline characters
\S = used to match any single character that is not a whitespace
character
\d = used to match a single digit from 0 to 9
\w = used to match a single letter, digit or underscore
\W = used to match any single character that \w does not match
. = used to match any single character except the newline character
OK, the famous question -- "how do I use them?!". Well,
suppose you wanted to find all the whitespace in a document...
/\s+/
That wasn't hard, right? What if you're looking only for numbers? You
might try
/\d/
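Here is a short sketch of those two patterns at work, using the standard replace() and match() methods. The sample line is invented, and the comparison between /\d/ and /\d+/ is added for illustration.

```javascript
// Sample line invented for this illustration (contains a tab and a newline).
var line = "Room 101,\tfloor 3\nBuilding 42";

// Collapse every run of whitespace (spaces, tabs, newlines) into one space.
var tidy = line.replace(/\s+/g, " ");   // "Room 101, floor 3 Building 42"

// /\d/ matches one digit at a time...
var digits = line.match(/\d/g);         // ["1", "0", "1", "3", "4", "2"]

// ...while /\d+/ matches each whole run of digits.
var numbers = line.match(/\d+/g);       // ["101", "3", "42"]

// \w+ picks out runs of letters, digits and underscores.
var words = line.match(/\w+/g);         // ["Room", "101", "floor", "3", "Building", "42"]

console.log(tidy, digits, numbers, words);
```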
How about limiting your search to the beginning or end of a string?
Well, that's why we have "pattern anchors" -- these
simply tie your Regular Expression to either the first or last character
of the string, and come in very useful when you're looking for a way to
filter through a mass of matches.
-- Pattern Anchors (^, $) --
"^" anchor
is used to indicate that the expression should be matched only at the
beginning of the string that it is applied to. So,
/^script/
will return a match only if the string begins with
"script" -- "scripting" and
"scripts", but not "javascript".
"$" anchor
is used to match the end of a string. So,
/ar$/
would match "scar", "car" and "bar", but
not "art", "army" or "arrow".
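A quick way to check the two anchors is with test(). In this sketch the variable names are made up, and the last two lines combine both anchors so that the whole string has to match.

```javascript
var startsWithScript = /^script/;
var endsWithAr = /ar$/;

var anchorChecks = [
  startsWithScript.test("scripting"),   // true
  startsWithScript.test("javascript"),  // false
  startsWithScript.test("my scripts"),  // false: the string itself starts with "my"
  endsWithAr.test("scar"),              // true
  endsWithAr.test("art"),               // false
  /^car$/.test("car"),                  // true: nothing before or after "car"
  /^car$/.test("scar")                  // false
];

console.log(anchorChecks);
```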
There's also a way to anchor a pattern to the edge of a word --
the \b. This is used to check that the RegExp matches at a word
boundary (the position between a \w character and a \W character,
or the start or end of the string), and it can be placed either at the
beginning or end of the pattern to be matched. So,
/\bhom/
would match both "home" and "homestead", while
/man\b/
would match "human", "woman" and "man",
though not "manor" or "manners". And the converse of
this is \B, which matches everywhere but at a word boundary.
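The difference between \b and \B shows up clearly when all three patterns are run over one sentence. The sentence below is invented for this sketch, and \w* is added only so that match() returns whole words rather than just "man".

```javascript
// Sample sentence invented for this illustration.
var sentence = "A human wrote the manual for the manor";

// "man" at the start of a word.
var startOfWord = sentence.match(/\bman\w*/g);   // ["manual", "manor"]

// "man" at the end of a word.
var endOfWord = sentence.match(/\w*man\b/g);     // ["human"]

// "man" NOT at the start of a word.
var insideWord = sentence.match(/\Bman/g);       // ["man"] (the one inside "human")

console.log(startOfWord, endOfWord, insideWord);
```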
Examples of RegExp Special Characters:
- "Charles the Brit raced his moped through the park."
- "The Park Ranger watched Charles do this."
| var reg1 = /^Charles/; | // matches "Charles" on line 1 but not line 2 |
| var reg2 = /his\.$/; | // matches "his." at the end of "this." on line 2 but not "his" on line 1; the \. matches the closing period |
| var reg3 = /\bt/; | // matches the "t" that starts "the" and "through" but not the "t" in "Brit" or "watched" |
| var reg4 = /\Bt/; | // matches the "t" in "Brit" or "watched" but not the one that starts "the" or "through" |
| var reg5 = /t\s./; | // matches "t r" in "Brit raced" but nothing in "watched" |
| var reg6 = /t\S./; | // matches "tch" in "watched" but nothing in "Brit raced" |
| var reg7 = /th./; | // matches "thr" in "through", "the", and "thi" in "this" |
RegExp
"Group Matching"
Just as you can specify a range for the number of characters to be
matched, you can also specify a range of characters. For example, the
range
/[A-Z]/
would match any single upper-case alphabetic character,
while
/[a-z]/
would match any single lowercase letter, and
/[0-9]/
would match any single digit between 0 and 9.
Using these three ranges, it's pretty easy to create a Regular
Expression to match an alphanumeric sequence.
/([a-z][A-Z][0-9])+/
would match one or more repetitions of a lowercase letter, an upper-case
letter and a digit, in that order, like
"aB0" -- although not "abc". To accept any mix of letters and digits,
put all three ranges in one bracketed expression, as in /[a-zA-Z0-9]+/.
NOTE the parentheses around the patterns; they come in handy when grouping
sections of a Regular Expression together.
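The sketch below compares the grouped pattern with one possible way of checking a whole field. The anchored single-class pattern (alnumField) is an assumption added for illustration, not a pattern taken from the text above.

```javascript
// The grouped pattern from the text: lowercase, then upper-case, then digit, repeated.
var sequence = /([a-z][A-Z][0-9])+/;

// Hypothetical alternative: one bracketed class, anchored at both ends,
// so the whole string must consist of letters and digits in any order.
var alnumField = /^[a-zA-Z0-9]+$/;

var groupChecks = [
  sequence.test("aB0"),       // true
  sequence.test("aB0cD1"),    // true: the group repeats twice
  sequence.test("abc"),       // false: no upper-case letter or digit in sequence
  sequence.test("--aB0--"),   // true: unanchored, so it matches inside the string
  alnumField.test("abc"),     // true
  alnumField.test("aB0"),     // true
  alnumField.test("a b")      // false: the space is not alphanumeric
];

console.log(groupChecks);
```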
Choice is very important when building Regular Expressions --
as in most other languages, it's possible to use the pipe [|] operator
to indicate multiple options in a RegExp. For example,
/dos|two|zwei/
would match any one of the three strings "dos", "two"
and "zwei". This obviously comes in useful when building
expressions that have many possible variants.
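Here is a small sketch of the pipe operator in use. The second pattern, with its pet-themed words, is made up to show how parentheses limit how far the choice reaches.

```javascript
var two = /dos|two|zwei/;

// Hypothetical example: the parentheses keep the choice between "cat" and "dog",
// so the "^", the "s" and the "$" apply to both options.
var pets = /^(cat|dog)s$/;

var choiceChecks = [
  two.test("two apples"),  // true
  two.test("zwei"),        // true
  two.test("deux"),        // false
  pets.test("cats"),       // true
  pets.test("dogs"),       // true
  pets.test("cat")         // false: the trailing "s" is required
];

console.log(choiceChecks);
```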
You can also invert the regular sense of a Regular Expression with
the negation operator, represented by ^.
So,
/[^A-C]/
would match any single character that does not appear in the brackets --
namely, anything except the letters "A",
"B" and "C".
NOTE: when ^ is the first character of a
bracketed expression it inverts the match. When
it is used outside a bracketed expression it serves as a pattern anchor.
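Since the two meanings of ^ are easy to mix up, this sketch puts them side by side. The variable names and test strings are invented for the example, and the third pattern uses both meanings at once.

```javascript
var notABC = /[^A-C]/;        // ^ inside brackets: any character other than A, B or C
var startsABC = /^[A-C]/;     // ^ outside brackets: string must start with A, B or C
var onlyNotABC = /^[^A-C]+$/; // both: the whole string contains no A, B or C

var caretChecks = [
  notABC.test("ABC"),       // false: every character is A, B or C
  notABC.test("ABCD"),      // true: "D" is outside the set
  startsABC.test("Apple"),  // true
  startsABC.test("Zebra"),  // false
  onlyNotABC.test("xyz"),   // true
  onlyNotABC.test("xAz")    // false: contains "A"
];

console.log(caretChecks);
```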
And finally, one important thing to remember: when you add any of the
meta-characters described above to your pattern and want to match
them literally, you need to "escape" them with a backslash [\].
So, the pattern
/Th\*/
would match "Th*" but not "The" -- the \*
ensures that the asterisk is matched as a literal character, not a
meta-character.
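To see what the backslash changes, this sketch compares an escaped and an unescaped version of the same pattern. The examples with the dot are added for illustration and are not from the text above.

```javascript
var literalStar = /Th\*/;  // "T", "h", then a literal asterisk
var quantifier = /Th*/;    // "T" followed by zero or more "h" characters

var escapeChecks = [
  literalStar.test("Th*"),  // true
  literalStar.test("The"),  // false
  quantifier.test("The"),   // true
  quantifier.test("Tom"),   // true: zero "h" characters is still a match
  /3\.14/.test("3.14"),     // true: the escaped dot matches a literal "."
  /3\.14/.test("3514"),     // false
  /3.14/.test("3514")       // true: the unescaped dot matches any character
];

console.log(escapeChecks);
```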
Examples of RegExp "Group Matching":
| var reg1 = /[lmw]ink/; | // matches "link", "mink" and "wink" |
| var reg2 = /[^lmw]ink/; | // matches "dink", "fink", "pink", etc., but not "link", "mink" or "wink" |
| var reg3 = /[a-s]ink/; | // matches "link", "mink", etc., but not "wink" |
| var reg4 = /[^t-z]ink/; | // matches "link", "mink", etc., but not "wink" |
| var reg1 = /^([1-9]|1[0-2]):[0-5]\d$/; | // matches proper time values (12-hour clock, no leading zero) |
| var reg2 = /['"]\d\d\d['"]/; | // matches a three digit number in quotes |
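The last two patterns are the kind you would use for form validation. Here is a minimal sketch of how they might be applied. The helper function isValidTime and the sample values are hypothetical and only serve the illustration.

```javascript
var timePattern = /^([1-9]|1[0-2]):[0-5]\d$/;
var quotedNumber = /['"]\d\d\d['"]/;

// Hypothetical helper: returns true when the value looks like a 12-hour time.
function isValidTime(value) {
  return timePattern.test(value);
}

var validationChecks = [
  isValidTime("9:30"),               // true
  isValidTime("12:59"),              // true
  isValidTime("13:00"),              // false: hours stop at 12
  isValidTime("9:60"),               // false: minutes stop at 59
  isValidTime("09:30"),              // false: no leading zero allowed
  quotedNumber.test('code "123"'),   // true
  quotedNumber.test("code 123"),     // false: no quotes
  quotedNumber.test("'12'")          // false: only two digits
];

console.log(validationChecks);
```

Note that the second pattern does not check that the opening and closing quotes are the same kind, so a value like '123" would also match.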