Character classes

 

Character classes are an important component of regular expressions. A character class specifies which characters are acceptable at a particular position in the input and which are not. With character classes, you can list characters individually or give a range of allowable characters. Moreover, you can negate a class to exclude characters that are not acceptable. Some of the common forms of character classes are described below:

1. Simple Classes ( [  ] ):~

The most basic form of a character class is to place a set of characters side-by-side within square brackets. For example, the regular expression “[bcr]at” will match the words "bat", "cat", or "rat" because it defines a character class (accepting either "b", "c", or "r") as its first character. Here “[bcr]” is a simple character class.

 

Enter your regex: [bcr]at

Enter input string to search: bat

I found the text "bat" starting at index 0 and ending at index 3.

 

Enter your regex: [bcr]at

Enter input string to search: cat

I found the text "cat" starting at index 0 and ending at index 3.

 

Enter your regex: [bcr]at

Enter input string to search: rat

I found the text "rat" starting at index 0 and ending at index 3.

 

Enter your regex: [bcr]at

Enter input string to search: hat

No match found.

In the above examples, the overall match succeeds only when the first letter matches one of the characters defined by the character class.
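The transcripts above can be reproduced with a short script. The following is a minimal sketch that assumes Python's standard `re` module rather than the test harness used for the transcripts; the helper name `describe_match` is hypothetical and only imitates the harness's output wording.

```python
import re

def describe_match(pattern, text):
    """Return a transcript-style message for the first match of pattern in text.

    Hypothetical helper: it only imitates the wording of the transcripts above.
    """
    m = re.search(pattern, text)
    if m is None:
        return "No match found."
    return (f'I found the text "{m.group()}" starting at index '
            f'{m.start()} and ending at index {m.end()}.')

# [bcr] accepts exactly one character: "b", "c", or "r".
for word in ["bat", "cat", "rat", "hat"]:
    print(word, "->", describe_match(r"[bcr]at", word))
```

The end index is exclusive, which is why a three-character match starting at index 0 ends at index 3.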

 

2. Negation ( ^ ):~

Another widely used form is the negated character class. To match any character except those listed, insert the "^" metacharacter (the leading caret) at the beginning of the character class. This technique is known as negation. In the regular expression below, “[^bcr]” is a negated character class.

 

Enter your regex: [^bcr]at

Enter input string to search: bat

No match found.

 

Enter your regex: [^bcr]at

Enter input string to search: cat

No match found.

 

Enter your regex: [^bcr]at

Enter input string to search: rat

No match found.

 

Enter your regex: [^bcr]at

Enter input string to search: hat

I found the text "hat" starting at index 0 and ending at index 3.

In the examples above, negation is applied to “b”, “c” and “r”. The regular expression engine matches any character followed by "at", as long as that first character is not one of those three. When the search string is “bat”, “cat” or “rat”, the engine finds no match; when it is “hat”, the match succeeds.
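The same negation checks can be sketched in code. This example assumes Python's standard `re` module (not the harness that produced the transcripts), and the variable names are illustrative only. It also shows one point that is easy to miss: a negated class still has to consume one character, so the bare string "at" does not match.

```python
import re

# [^bcr] matches any single character except "b", "c", or "r".
negated = re.compile(r"[^bcr]at")

results = {word: negated.search(word) is not None
           for word in ["bat", "cat", "rat", "hat"]}
print(results)

# A negated class is not "optional": it must still match one character,
# so there is nothing in "at" for [^bcr] to consume before the literal "at".
print(negated.search("at"))
```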

 

3.  Ranges ( - ):~

Sometimes you'll want to define a character class that includes a range of values, such as the letters "a through h" or the numbers "1 through 5". To specify a range, simply insert the "-" metacharacter between the first and last character to be matched, such as [1-5] or [a-h]. You can also place different ranges beside each other within the class to further expand the match possibilities. For example, [a-zA-Z] will match any letter of the alphabet: a to z (lowercase) or A to Z (uppercase).

Here are some examples of ranges and negation:

Enter your regex: [a-c]

Enter input string to search: a

I found the text "a" starting at index 0 and ending at index 1.

 

Enter your regex: [a-c]

Enter input string to search: b

I found the text "b" starting at index 0 and ending at index 1.

 

Enter your regex: [a-c]

Enter input string to search: c

I found the text "c" starting at index 0 and ending at index 1.

 

Enter your regex: [a-c]

Enter input string to search: d

No match found.

 

Enter your regex: foo[1-5]

Enter input string to search: foo1

I found the text "foo1" starting at index 0 and ending at index 4.

 

Enter your regex: foo[1-5]

Enter input string to search: foo5

I found the text "foo5" starting at index 0 and ending at index 4.

 

Enter your regex: foo[1-5]

Enter input string to search: foo6

No match found.

 

Enter your regex: foo[^1-5]

Enter input string to search: foo1

No match found.

 

Enter your regex: foo[^1-5]

Enter input string to search: foo6

I found the text "foo6" starting at index 0 and ending at index 4.
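The range and negated-range transcripts above can be checked with a small script. This is a sketch that assumes Python's standard `re` module; the helper name `found` is hypothetical.

```python
import re

def found(pattern, text):
    """Hypothetical helper: True if pattern matches anywhere in text."""
    return re.search(pattern, text) is not None

# (pattern, input) pairs taken from the transcripts above.
cases = [
    (r"[a-c]", "a"), (r"[a-c]", "b"), (r"[a-c]", "c"), (r"[a-c]", "d"),
    (r"foo[1-5]", "foo1"), (r"foo[1-5]", "foo5"), (r"foo[1-5]", "foo6"),
    (r"foo[^1-5]", "foo1"), (r"foo[^1-5]", "foo6"),
]
outcomes = [found(pattern, text) for pattern, text in cases]
for (pattern, text), ok in zip(cases, outcomes):
    print(f"{pattern!r} on {text!r}: {'match' if ok else 'no match'}")

# Two ranges side by side inside one class: any ASCII letter.
letters_only = re.fullmatch(r"[a-zA-Z]+", "Regex") is not None
print(letters_only)
```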

 

4.  Union ( [  ][  ] ):~

You can also combine character classes by placing them one after another in a pattern; each class then matches exactly one character at its own position. For instance, consider the following regular expression:

Letter from 19[89][2-5]

With this pattern, any year whose third digit is an 8 or 9 and whose final digit is between 2 and 5, inclusive, will be matched. Thus, these are the potential matches for the previous regular expression pattern:

Letter from 1982

Letter from 1983

Letter from 1984

Letter from 1985

Letter from 1992

Letter from 1993

Letter from 1994

Letter from 1995

As the list shows, the third digit of each year is either 8 or 9, and the final digit lies between 2 and 5.
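To see exactly which years the two adjacent classes accept, one can test every candidate year mechanically. The sketch below assumes Python's standard `re` module and the phrase "Letter from" used in the list of matches; the variable names are illustrative.

```python
import re

# Two classes in sequence: [89] constrains the third digit,
# [2-5] constrains the fourth digit.
pattern = re.compile(r"Letter from 19[89][2-5]")

matching_years = [
    year for year in range(1900, 2000)
    if pattern.fullmatch(f"Letter from {year}")
]
print(matching_years)
```

Because each class contributes its own independent choice, the number of accepted years is the product of the class sizes: 2 choices for the third digit times 4 for the fourth gives 8 years.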

Some shorthand notations for commonly used character classes are given below:

 

Sr. No.   Symbol   Function
1.        \d       Any digit: [0-9]
2.        \D       Any non-digit: [^0-9]
3.        \w       Any word character (letter, digit, or underscore): [a-zA-Z0-9_]
4.        \W       Any non-word character: [^a-zA-Z0-9_]
5.        \s       Any whitespace character: [ \t\n\r\f]
6.        \S       Any non-whitespace character: [^ \t\n\r\f]

 

For example, a shorthand class can be used like this:

Enter your regex: [\d]

Enter input string to search: 1

I found the text "1" starting at index 0 and ending at index 1.

 

And so on.
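The equivalence between each shorthand and its bracketed form can be checked directly. This is a sketch assuming Python's standard `re` module; note that in Python the shorthands are Unicode-aware by default for text patterns, so the `re.ASCII` flag is used here to get the ASCII-only behaviour the table describes. Python's `\s` additionally matches the vertical tab, which the sample string deliberately avoids. The names `sample`, `shorthand`, and `agreement` are illustrative.

```python
import re

# Sample containing a letter, a digit, an underscore, a space, a tab, a hyphen.
sample = "a1_ \t-"

# Shorthand -> explicit bracket form, as listed in the table above.
shorthand = {
    r"\d": r"[0-9]",
    r"\D": r"[^0-9]",
    r"\w": r"[a-zA-Z0-9_]",
    r"\W": r"[^a-zA-Z0-9_]",
    r"\s": r"[ \t\n\r\f]",
    r"\S": r"[^ \t\n\r\f]",
}

agreement = {}
for short, explicit in shorthand.items():
    # re.ASCII restricts the shorthand to ASCII characters (an assumption
    # made so the comparison with the bracket forms is fair).
    via_short = re.findall(short, sample, flags=re.ASCII)
    via_explicit = re.findall(explicit, sample)
    agreement[short] = via_short == via_explicit
    print(short, via_short, agreement[short])
```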

 

