8/21/2007

Perl::Regular Expression::Matching Expression Variables



Match Variables
If a =~ match expression is true, the special variables $1, $2, ... hold the substrings that matched the parts of the pattern enclosed in parentheses -- $1 corresponds to the group opened by the first left parenthesis, $2 to the group opened by the second left parenthesis, and so on. The following pattern picks out three words separated by whitespace...

if ("this and that" =~ /(\w+)\s+(\w+)\s+(\w+)/) {
    ## the match succeeded, so $1 eq "this", $2 eq "and", $3 eq "that"
    print "$1 $2 $3\n";
}

This is a nice way to parse a string -- write a regular expression for the pattern you expect, putting parentheses around the parts you want to pull out. Only use $1, $2, etc. when the if =~ returns true; after a failed match they still hold whatever the last successful match left in them. Other regular-expression systems use \1 and \2 instead of $1 and $2; Perl accepts \1 inside the pattern itself as a backreference, but code that runs after the match should use $1. There are three other special variables: $& (dollar-ampersand) = the matched string, $` (dollar-back-quote) = the string before what was matched, and $' (dollar-quote) = the string following what was matched.
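As a small sketch of the three extra variables and of the \1 versus $1 distinction, the snippet below uses two made-up sample strings (they are illustrative assumptions, not part of the original tutorial). It copies $`, $& and $' into ordinary variables right after the match, since the next successful match overwrites them.

```perl
use strict;
use warnings;

## Hypothetical sample input, not from the original text.
my $line = "say hello world";
my ($before, $matched, $after);

if ($line =~ /hello/) {
    ## copy the special variables right away; the next successful match overwrites them
    ($before, $matched, $after) = ($`, $&, $');
    print "before:[$before] matched:[$matched] after:[$after]\n";
}

## \1 inside the pattern refers back to group 1 while matching;
## $1 is how the code after the match reads that same group.
my $doubled;
if ("it was was here" =~ /\b(\w+)\s+\1\b/) {
    $doubled = $1;
    print "repeated word: $doubled\n";
}
```

This should print before:[say ] matched:[hello] after:[ world] followed by repeated word: was.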

The following loop rips through a string and pulls out all the email addresses. It demonstrates using a character class, using $1 etc. to pull out parts of the match string, and using $' after the match.

$str = 'blah blah nick@cs.stanford.edu, blah blah balh billg@microsoft.com blah blah';

while ($str =~ /(([\w._-]+)\@([\w._-]+))/) { ## look for an email addr
    print "user:$2 host:$3 all:$1\n"; ## parts of the addr
    $str = $'; ## set the str to be the "rest" of the string
}

output:
user:nick host:cs.stanford.edu all:nick@cs.stanford.edu
user:billg host:microsoft.com all:billg@microsoft.com
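The loop above advances by reassigning $str to $'. A possible alternative, sketched here rather than taken from the original tutorial, is the /g modifier: in a while condition each pass resumes where the previous match ended, so the string itself is left untouched. The @found array is a name introduced only for this illustration.

```perl
use strict;
use warnings;

my $str = 'blah blah nick@cs.stanford.edu, blah blah balh billg@microsoft.com blah blah';
my @found;   ## hypothetical helper array, for illustration only

## with /g in a while condition, each pass resumes after the previous match
while ($str =~ /(([\w._-]+)\@([\w._-]+))/g) {
    push @found, "user:$2 host:$3 all:$1";
}

print "$_\n" for @found;
```

This sketch should print the same two lines as the loop above, and $str still holds the full original text afterwards.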


Thanks to: http://cslibrary.stanford.edu/108/EssentialPerl.html
