JavaScript Regular Expression

RegExp:

Regular Expressions are a powerful tool for pattern matching and substitution. They are commonly associated with UNIX-based tools, including editors like vi, scripting languages like Perl and PHP, and command-line utilities like awk and sed. JavaScript supports them as well, through its built-in RegExp object and regular expression literals.

A Regular Expression lets you build patterns using a set of special characters. Depending on whether or not there's a match, appropriate action can be taken, and appropriate program code executed.

For example, Form Validation is one of the most common requirements. You don't know what exact values the user will enter, but you do know the format they need to use. A Regular Expression is a way of representing a pattern you are looking for in a string.
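Here is a minimal sketch of that validation idea. It checks whether a string looks like a five-digit US ZIP code. The function name isZipCode and the ZIP-code format are illustrative assumptions, not part of any library. RegExp.prototype.test() is the built-in method that returns true or false.

```javascript
// Hypothetical validator: accepts exactly five digits, nothing before or after.
// The ^ and $ anchors and the \d{5} quantifier are explained later in this tutorial.
function isZipCode(value) {
  return /^\d{5}$/.test(value);
}

if (isZipCode("90210")) {
  console.log("Valid ZIP code");          // this branch runs: the format matches
} else {
  console.log("Please enter five digits");
}

console.log(isZipCode("9021"));   // false: only four digits
console.log(isZipCode("90210a")); // false: extra character at the end
```

The program never needs to know which ZIP code the user will type. It only checks the shape of the input.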

RegExp Syntax

var myRegExp = /pattern/[switch]

Here, /pattern/ is the Regular Expression itself, and the optional [switch] (also called a flag) indicates the mode in which the Regular Expression is to be used:

"i" - ignore case,
"g" - global search,
"gi" - global search + ignore case (case-insensitive).

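After a Regular Expression is created, it is usually passed to a method of a String object, such as match(), replace(), search() or split(). A RegExp object also has methods of its own: test(), which returns true or false, and exec().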
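The short sketch below shows the flags working together with these methods. The sample sentence is made up for illustration. The comments show the value each call returns.

```javascript
const text = "Apple pie, apple tart, APPLE juice";

// No flags: case-sensitive, and only the first match counts.
console.log(text.search(/apple/));            // 11 (the lowercase "apple")

// "g" collects every match; "i" ignores case.
console.log(text.match(/apple/gi));           // ["Apple", "apple", "APPLE"]

// replace() with "gi" rewrites every occurrence, whatever its case.
console.log(text.replace(/apple/gi, "pear")); // "pear pie, pear tart, pear juice"

// Without "g", replace() changes only the first match.
console.log(text.replace(/apple/i, "pear"));  // "pear pie, apple tart, APPLE juice"

// split() accepts a RegExp as the separator.
console.log(text.split(/,\s*/));              // ["Apple pie", "apple tart", "APPLE juice"]

// A RegExp method: test() answers yes or no.
console.log(/tart/.test(text));               // true

// The RegExp constructor builds the same kind of object from a string.
const re = new RegExp("apple", "gi");
console.log(text.match(re).length);           // 3
```

The constructor form is useful when the pattern is only known at run time, for example when it comes from user input.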

RegExp Quantifiers - "meta-characters"

How about something a little more complex? We can use "meta-characters": special characters that have a special meaning when used within a pattern.

"+" is used to match one or more occurrences of the preceding character. So,

/bo+/

would match the words "bore", "boom", and "bookstore".

"*" is used to match zero or more occurrences of the preceding character. So,

/mat*/

would match "ma", "mat" and "matter".

"?" is used to match zero or one occurrence of the preceding character. So,

/smit?/

would match "smirk", "smile", "smith" and "smitten", though not "smear" or "smelt".

"{x,y}" is used to match a range. So,

/mo{2,6}/

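would match "smooth" and "smooooooth!", but not "moth". The numbers in the curly braces represent the lower and upper values of the range to match. Write them without a space after the comma; otherwise the braces are not treated as a quantifier. NOTE: you can leave out the upper limit for an open-ended range match. For example, /mo{2,}/ matches "m" followed by two or more "o"s.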
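The quantifiers above can be checked with test(). This sketch reuses the patterns from the text and pairs them with a few sample words, some of which (such as "bat") are added here for illustration.

```javascript
// Each quantifier applies to the single character just before it.
const examples = [
  [/bo+/, "bookstore"],        // true: "b" then one or more "o"
  [/bo+/, "bat"],              // false: no "o" after "b"
  [/mat*/, "ma"],              // true: zero "t"s is allowed
  [/smit?/, "smile"],          // true: "smi" with the optional "t" absent
  [/smit?/, "smear"],          // false: "sme" never matches "smi"
  [/mo{2,6}/, "smooth"],       // true: two "o"s after "m"
  [/mo{2,6}/, "moth"],         // false: only one "o"
  [/mo{2,}/, "smoooooooooth"], // true: open-ended, two or more "o"s
];

for (const [re, word] of examples) {
  console.log(`${re} on "${word}": ${re.test(word)}`);
}
```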

RegExp Special Characters

It's also possible to search for whitespace, digits and alphabetic characters with a Regular Expression. The following list shows these special characters:

\s = used to match a single whitespace character, including spaces, tabs and newline characters
\S = used to match any single character that is not whitespace
\d = used to match a single digit from 0 to 9
\w = used to match a single letter (A-Z, a-z), digit or underscore
\W = used to match any single character that does not match \w
. = used to match any single character except a newline (line terminator)

OK, now the famous question: "How do I use them?!" Well, suppose you wanted to find all the whitespace in a document...

/\s+/

That wasn't hard, right? If you're looking only for digits, you might try

/\d/
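Here is how these classes behave with real String methods. The sample strings are invented for illustration. Note that /\d/ matches one digit at a time, while /\d+/ keeps a run of digits together.

```javascript
const sentence = "Order 66 shipped\ton  3 May";

// \s+ treats any run of spaces or tabs as a single separator.
console.log(sentence.split(/\s+/));   // ["Order", "66", "shipped", "on", "3", "May"]

// \d matches a single digit; the "g" flag collects them all.
console.log(sentence.match(/\d/g));   // ["6", "6", "3"]

// \d+ matches a whole run of digits, so "66" stays together.
console.log(sentence.match(/\d+/g));  // ["66", "3"]

// \w+ picks out runs of letters, digits and underscores.
console.log("snake_case and 42".match(/\w+/g)); // ["snake_case", "and", "42"]

// . matches any character except a newline.
console.log(/a.b/.test("a-b"), /a.b/.test("a\nb")); // true false
```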

How about limiting your search to the beginning or end of a string? Well, that's why we have "pattern anchors" -- these simply tie your Regular Expression to either the first or last character of the string, and come in very useful when you're looking for a way to filter through a mass of matches.

-- Pattern Anchors (^, $) --

The "^" (caret) anchor indicates that the expression should be matched only at the beginning of the string it is applied to. So,

/^script/

will return a match only if the string begins with "script", as in "scripting" and "scripts", but not "javascript".

The "$" anchor is used to match the end of a string. So,

/ar$/

would match "scar", "car" and "bar", but not "art", "army" or "arrow".
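A quick check of both anchors, using words from the text above plus a few extra test strings chosen for illustration:

```javascript
const startsWithScript = /^script/;
const endsWithAr = /ar$/;

console.log(startsWithScript.test("scripting"));  // true
console.log(startsWithScript.test("javascript")); // false: "script" is not at the start
console.log(endsWithAr.test("scar"));             // true
console.log(endsWithAr.test("army"));             // false: "ar" is not at the end

// Using both anchors forces the whole string to match the pattern.
console.log(/^car$/.test("car"));   // true
console.log(/^car$/.test("scar"));  // false
```

Anchoring at both ends, as in the last two lines, is the usual approach for form validation, where the entire input must fit the format.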

There's also a related anchor, \b, which matches a word boundary: the position between a word character (\w) and a non-word character, or the start or end of the string. It can be placed at the beginning or end of the pattern to be matched. So,

/\bhom/

would match both "home" and "homestead", while

/man\b/

would match "human", "woman" and "man", though not "manor" or "manners". The converse of this is \B, which matches at any position that is not a word boundary.
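The sketch below shows \b and \B on a few words. Words such as "whom" and "stop" are extra examples added here. The /\bcat\b/ line shows that spaces and punctuation count as boundaries too, not only the ends of the string.

```javascript
const startsHom = /\bhom/;
const endsMan = /man\b/;
const innerT = /\Bt/; // a "t" that is NOT at the start of a word

console.log(startsHom.test("homestead"));      // true
console.log(startsHom.test("whom"));           // false: "hom" is inside the word
console.log(endsMan.test("woman"));            // true
console.log(endsMan.test("manor"));            // false: a letter follows "man"
console.log(/\bcat\b/.test("the cat, sat"));   // true: space before, comma after
console.log(innerT.test("top"));               // false: the only "t" starts the word
console.log(innerT.test("stop"));              // true: "t" follows "s"
```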

Examples of RegExp Special Characters:

  1. "Charles the Brit raced his moped through the park."
  2. "The Park Ranger watched Charles do this."
var reg1 = /^Charles/;    // "Charles" on line 1 but not line 2
var reg2 = /his\.$/;    // "this." at the end of line 2 (\. matches the final period), but not "his" on line 1
var reg3 = /\bt/;    // "the", "through" and "this", but not "Brit" or "watched"
var reg4 = /\Bt/;    // "Brit" or "watched" but not "the" or "through"
var reg5 = /t\s./;    // "Brit raced" but not "watched"
var reg6 = /t\S./;    // "watched" (and "the") but not "Brit raced"
var reg7 = /th./;    // "through", "the", and "this", but not "The" (matching is case-sensitive)
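Running the patterns against the two sentences is a quick way to check claims like these. This sketch uses a small helper, firstMatch, which is a name made up for this example. Note that both sentences end in a period, so a pattern anchored with $ has to account for it. /his$/ finds nothing on line 2, while /his\.$/ (with an escaped dot) matches its ending "this."

```javascript
// The two sample sentences used in the examples above.
const line1 = "Charles the Brit raced his moped through the park.";
const line2 = "The Park Ranger watched Charles do this.";

// Hypothetical helper: returns the text of the first match, or null.
function firstMatch(re, text) {
  const m = text.match(re);
  return m ? m[0] : null;
}

console.log(firstMatch(/^Charles/, line1)); // "Charles"
console.log(firstMatch(/^Charles/, line2)); // null: "Charles" is not at the start
console.log(firstMatch(/his$/, line2));     // null: the line ends in ".", not "his"
console.log(firstMatch(/his\.$/, line2));   // "his." from "this."
console.log(firstMatch(/t\s./, line1));     // "t r" from "Brit raced"
console.log(firstMatch(/t\s./, line2));     // null
console.log(firstMatch(/th./, line2));      // "thi": "The" is capitalized
```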

RegExp "Group Matching"

Just as you can specify a range for the number of characters to be matched, you can also specify a range of characters. For example, the range

/[A-Z]/

would match any single upper-case letter, while

/[a-z]/

would match any single lowercase letter, and

/[0-9]/

would match any single digit from 0 to 9.

These ranges can be combined into larger patterns. For example,

/([a-z][A-Z][0-9])+/

would match a lowercase letter followed by an upper-case letter and a digit, repeated one or more times, like "aB0" or "aB0cD1", although not "abc". NOTE the parentheses around the patterns; they come in handy when grouping sections of a Regular Expression together. To match a field that is purely alphanumeric, put all three ranges inside one pair of brackets and anchor both ends: /^[a-zA-Z0-9]+$/.
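The difference between a sequence of ranges and a single combined range can be easy to miss. The sketch below compares the triplet pattern /([a-z][A-Z][0-9])+/ with the combined, anchored pattern /^[a-zA-Z0-9]+$/. The test strings are illustrative.

```javascript
// Repeated triplets: lowercase, then upper-case, then digit, in that order.
const triplets = /([a-z][A-Z][0-9])+/;
console.log(triplets.test("aB0"));      // true
console.log(triplets.test("aB0cD1"));   // true
console.log(triplets.test("abc"));      // false: no upper-case letter or digit

// Any mix of letters and digits, and nothing else, from start to end.
const alphanumeric = /^[a-zA-Z0-9]+$/;
console.log(alphanumeric.test("abc"));      // true
console.log(alphanumeric.test("User42"));   // true
console.log(alphanumeric.test("user_42"));  // false: "_" is outside the ranges
console.log(alphanumeric.test(""));         // false: "+" needs at least one character
```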

Choice is very important when building Regular Expressions. As in most other regular expression languages, you can use the pipe ("|") operator to indicate multiple options in a RegExp. For example,

/dos|two|zwei/

would match any one of the three strings "dos", "two" and "zwei". This comes in handy when building expressions that have many possible variants.
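Alternation matches anywhere in the string unless you anchor it. In the sketch below, the anchored, parenthesized variant /^(dos|two|zwei)$/ is added to show that difference. The sample strings are illustrative.

```javascript
const anyTwo = /dos|two|zwei/;
console.log(anyTwo.test("uno, dos, tres")); // true
console.log(anyTwo.test("one, two"));       // true
console.log(anyTwo.test("deux"));           // false
console.log(anyTwo.test("network"));        // true: "network" contains "two"

// Alternation has low precedence, so wrap it in parentheses before anchoring.
const exactlyTwo = /^(dos|two|zwei)$/;
console.log(exactlyTwo.test("two"));        // true
console.log(exactlyTwo.test("network"));    // false
```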

You can also invert the sense of a bracketed expression with the negation operator, represented by ^ placed right after the opening bracket. So,

/[^A-C]/

would match everything but that which appears in the expression -- namely, everything except the letters "A", "B" and "C".

NOTE: when ^ is used at the start of a bracketed expression, it inverts the match. When ^ is used outside a bracketed expression, it serves as a pattern anchor.
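The two roles of ^ side by side, with made-up test strings:

```javascript
// Inside brackets, ^ negates: any single character except A, B or C.
const notAtoC = /[^A-C]/;
console.log(notAtoC.test("ABC"));        // false: every character is A, B or C
console.log(notAtoC.test("ABD"));        // true: "D" is outside the class
console.log("CAB-D".match(/[^A-C]/g));   // ["-", "D"]

// Outside brackets, ^ is an anchor again.
console.log(/^A/.test("ABC"));           // true
console.log(/^A/.test("BAC"));           // false
```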

And finally, one important thing to remember: to match any of the meta-characters described above literally, you need to "escape" them with a backslash (\). So, the pattern

/Th\*/

would match "Th*" but not "The" -- the \* ensures that the asterisk is matched as a literal character, not a meta-character.
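Here is a small comparison of escaped and unescaped meta-characters. The /3\.14/ and /1\+1/ patterns are extra examples added for illustration.

```javascript
console.log(/Th\*/.test("Th*"));    // true: \* is a literal asterisk
console.log(/Th\*/.test("The"));    // false
console.log(/Th*/.test("The"));     // true: unescaped, * means "zero or more h"

// Other meta-characters need the same treatment, e.g. a literal dot or plus sign.
console.log(/3\.14/.test("3.14"));  // true
console.log(/3\.14/.test("3x14"));  // false: \. only matches "."
console.log(/3.14/.test("3x14"));   // true: an unescaped . matches any character
console.log(/1\+1/.test("1+1=2"));  // true
```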

Examples of RegExp "Group Matching":

var reg1 = /[lmw]ink/;    // matches "link", "mink" and "wink"
var reg2 = /[^lmw]ink/;    // matches "dink", "fink", "pink", etc..., but not "link", "mink" or "wink"
var reg3 = /[a-s]ink/;    // matches "link", "mink", etc ..., but not "wink"
var reg4 = /[^t-z]ink/;    // matches "link", "mink", etc ..., but not "wink"

var reg5 = /^([1-9]|1[0-2]):[0-5]\d$/;    // matches 12-hour clock times from "1:00" to "12:59"
var reg6 = /['"]\d\d\d['"]/;    // matches a three-digit number in quotes, e.g. "404"
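The last two examples (the time pattern and the quoted-number pattern) can be exercised as follows. The sample values are illustrative. One caveat: the two quote classes are independent, so mismatched quotes such as '123" also match.

```javascript
// Time pattern: hours 1-12 (no leading zero), then minutes 00-59.
const time = /^([1-9]|1[0-2]):[0-5]\d$/;
for (const t of ["9:45", "12:00", "13:00", "09:30", "7:60"]) {
  console.log(t, time.test(t)); // true, true, false, false, false
}

// Quoted three-digit number; each quote is matched independently.
const quoted = /['"]\d\d\d['"]/;
console.log(quoted.test('code "404" returned')); // true
console.log(quoted.test("code 404 returned"));   // false: no quotes
console.log(quoted.test("'123\""));              // true: mismatched quotes still match
```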