Regex and PHP question, need non-greedy search!

Developer https://www.devze.com 2023-02-18 15:55 Source: Internet

I am having trouble trying to write a non-greedy regex statement.

Here is my string:

<strong>name</strong><strong>address</strong>mailto:blabla@email.com

Here is my regex query:

<strong>(.*?)</strong>.*?([A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4})

The problem is that I need the address, not the name, from the string. So I need the regex query to be non-greedy and take the closest <strong> instead of the farthest one.

There are also multiple instances of this in my search string, so it has to match all of them, which rules out simply putting a greedy .* in front of the pattern.

So it would have to match all the instances of this, and pull the addresses, not names:

   <strong>name</strong><strong>address1</strong>mailto:blabla@email.com
   <strong>name</strong><strong>address2</strong>mailto:blabla@email.com
   <strong>name</strong><strong>address3</strong>mailto:blabla@email.com
   <strong>name</strong><strong>address4</strong>mailto:blabla@email.com

Thanks in advance!

First, regular expressions are a suboptimal tool for matching HTML (this question being a good example of why). You'll be happier with a parser if you know how to use one (maybe one of the PHP gurus can recommend one).

Having said that, a better way with regexes would probably be to match (and discard) the first <strong> tag explicitly:

<strong>.*?</strong><strong>(.*?)</strong>.*?([A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4})

This is by no means a good, reliable, bulletproof solution, but at least it works for your sample data.
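
To see why the lazy quantifier alone does not help, here is a minimal sketch comparing the original pattern with the one above. The engine always tries the leftmost starting position first, so the original match begins at the first <strong> and the later .*? simply stretches across the second tag. The $text sample, the ~ delimiters and the i modifier (needed because the character classes only list uppercase letters) are my assumptions, not something stated in the question.

```php
<?php
// Sample input assumed from the question: one record per line.
$text = <<<HTML
<strong>name</strong><strong>address1</strong>mailto:blabla@email.com
<strong>name</strong><strong>address2</strong>mailto:blabla@email.com
HTML;

$email = '([A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4})';

// Original pattern: the match starts at the FIRST <strong>, so group 1 is the name.
$original = '~<strong>(.*?)</strong>.*?' . $email . '~i';
// Fixed pattern: consume the first <strong> element explicitly, capture the second.
$fixed = '~<strong>.*?</strong><strong>(.*?)</strong>.*?' . $email . '~i';

preg_match_all($original, $text, $m1);
preg_match_all($fixed, $text, $m2);

print_r($m1[1]); // the names: "name", "name"
print_r($m2[1]); // the addresses: "address1", "address2"
```

In both cases the second capture group holds blabla@email.com for every record; only the first group changes.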

Or, if you can be more specific about what's allowed between/after the relevant tag, how about this:

<strong>([^<>]*)</strong>(?:mailto:)?([A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4})
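
A quick sketch of how that second pattern behaves, again on sample data I assumed from the question and with an i modifier added by me. Because [^<>]* cannot cross a tag and the email has to follow the closing </strong> directly, an attempt at the first <strong> fails and the engine moves on to the second one by itself.

```php
<?php
// Hypothetical sample input modelled on the question.
$text = <<<HTML
<strong>name</strong><strong>address3</strong>mailto:blabla@email.com
<strong>name</strong><strong>address4</strong>mailto:blabla@email.com
HTML;

$pattern = '~<strong>([^<>]*)</strong>(?:mailto:)?'
         . '([A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4})~i';

$count = preg_match_all($pattern, $text, $matches);

for ($i = 0; $i < $count; ++$i) {
    // $matches[1] holds the addresses, $matches[2] the emails without "mailto:".
    printf("%s => %s\n", $matches[1][$i], $matches[2][$i]);
}
```

No explicit "skip the name" part is needed here, which makes this version a bit less dependent on how many <strong> elements come before the address.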

Looking at your test data, here are the rules I infer: If...

Name and Address are both wrapped in STRONG elements and the email follows immediately, AND
The STRONG elements' attributes, the name and the addresses all have no angle brackets, AND
The email address component always begins with mailto:, AND
There are no other HTML elements within the two STRONG elements,

Then this tested code should do the trick:

$re = '%
    # Capture name and address in <strong> element then email.
    <strong[^>]*>\s*([^<>]+)</strong\s*>\s*  # $1: Name.
    <strong[^>]*>\s*([^<>]+)</strong\s*>\s*  # $2: Address.
    (mailto:\S+)                             # $3: Email.
    %ix';
$count = preg_match_all($re, $text, $matches);
if ($count) {
    printf("%d matches found:\n", $count);
    print_r($matches);
    for ($i = 0; $i < $count; ++$i) {
        printf("Match %d: Name: \"%s\", Address: \"%s\", Email: \"%s\"\n",
            $i + 1, $matches[1][$i], $matches[2][$i], $matches[3][$i]);
    }
} else {
    printf("No matches found.\n");
}
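
If the column-oriented $matches array is awkward to work with, a possible variation (my own, not part of the tested code above) is to use named groups together with PREG_SET_ORDER, so each match comes back as one record. The $text sample and the group names name, address and email are assumptions for illustration.

```php
<?php
// Sample input assumed from the question.
$text = <<<HTML
<strong>name</strong><strong>address1</strong>mailto:blabla@email.com
<strong>name</strong><strong>address2</strong>mailto:blabla@email.com
HTML;

$re = '%
    <strong[^>]*>\s*(?<name>[^<>]+)</strong\s*>\s*     # Name.
    <strong[^>]*>\s*(?<address>[^<>]+)</strong\s*>\s*  # Address.
    (?<email>mailto:\S+)                               # Email.
    %ix';

$records = [];
// PREG_SET_ORDER returns one array per match instead of one array per group.
if (preg_match_all($re, $text, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $records[] = [
            'name'    => $m['name'],
            'address' => $m['address'],
            'email'   => $m['email'],
        ];
    }
}
print_r($records);
```

Note that \S+ runs up to the next whitespace, so this sketch still assumes each email is followed by a space or a line break.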

Don't use regular expressions for parsing HTML.

See http://htmlparsing.com/php.html
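
As a rough sketch of the parser route, here is one way it could look with PHP's bundled DOM extension (DOMDocument and DOMXPath). That choice of parser is mine, and the wrapping <div>, the $html sample and the XPath expression are assumptions based on the sample data in the question, so treat it as a starting point rather than a finished solution.

```php
<?php
// Sample input assumed from the question; wrapped in a <div> so the
// fragment has a single container element.
$html = '<div>'
    . '<strong>name</strong><strong>address1</strong>mailto:blabla@email.com' . "\n"
    . '<strong>name</strong><strong>address2</strong>mailto:blabla@email.com' . "\n"
    . '</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings quiet for fragment input
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Select each <strong> whose next sibling is a text node starting with "mailto:".
$nodes = $xpath->query(
    '//strong[following-sibling::node()[1][self::text()]'
    . '[starts-with(normalize-space(.), "mailto:")]]'
);

$results = [];
foreach ($nodes as $strong) {
    $results[] = [
        'address' => trim($strong->textContent),
        // Drop the "mailto:" prefix from the text that follows the element.
        'email'   => substr(trim($strong->nextSibling->nodeValue), strlen('mailto:')),
    ];
}
print_r($results);
```

The name elements are skipped because their next sibling is another <strong> element rather than a text node, so no non-greedy tricks are needed.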