How to match a string that does NOT contain a word?

Developer https://www.devze.com 2023-03-01 21:56 Source: Internet
To match a string that contains some word, I can use the pattern "/.*word.*/". But how do I match a string that does not contain this word?

Example:

I need to find a substring in a big text that is enclosed by two tags, <div> and </div>, and has some string like "Hello" inside. The best I came up with:

"@<div>(.*?Hello.*?)</div>@i"

But it will also match the sequence:

<div>Bye.</div><div>Hello!</div>

And I do not want to match the first pair of div tags. Thus I want to replace ".*?" with something like "match any string that does not contain a div tag".

Test case:

For input string:

<div>Bye.</div><div>Hello!</div>

I need to catch

<div>Hello!</div>


A better title for the question might be: "Match a DIV element containing a specific sub-string." First, it must be said that regex is not the best tool for this job. It would be much better to use an HTML parser to parse the markup, then search the contents of each DIV element for the desired sub-string. That said, since you want to know more about how to use regex to match stuff that is not something else, the following describes a limited way of doing this with a regex.

As Dogbert correctly points out, this question really is a duplicate of "Regular expression to match string not containing a word?". However, I see that you have looked at that question but need to know how to apply this technique to a subpattern.
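For reference, the whole-string form of that technique can be sketched as follows. This is only a minimal illustration, not code from the linked question: the helper name lacksWordRegex is made up for this example, and the word is passed through preg_quote() on the assumption that it should be treated as literal text.

```php
<?php
// Hypothetical helper (name invented for this sketch): returns true when
// $subject does not contain $word anywhere, ignoring case.
function lacksWordRegex(string $subject, string $word): bool {
    // (?!word) is asserted before every single character that "." consumes,
    // so the match can only reach the end if the word never starts anywhere.
    // The "s" flag lets "." cross newlines; \z anchors at the very end.
    $re = '/^(?:(?!' . preg_quote($word, '/') . ').)*\z/is';
    return preg_match($re, $subject) === 1;
}

var_dump(lacksWordRegex('Bye.', 'hello'));       // no "hello" in the subject
var_dump(lacksWordRegex('Say Hello!', 'hello')); // "Hello" is present
```

The same "assert, then consume one character" group is what gets reused as a sub-pattern between the DIV tags.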

To match a part of a string (sub-pattern) which does not include a specific word (or words), you need to apply a negative lookahead assertion check before each and every character. Here is how you would do it for the text between opening and closing DIV tags. Note that when using only a single regex, because DIV elements may be nested, it is only reasonable to find "HELLO" within the "innermost" of nested DIV elements.

Pseudo code:

  • Match the opening DIV tag.
  • Lazily match zero or more characters, each of which is not the beginning of <div or </div.
  • Once the desired string: "HELLO" is found, go ahead and match it.
  • Continue (greedily) matching zero or more characters, each of which is not the beginning of <div or </div.
  • Match the closing </div> tag.

Note that to match only the "innermost" DIV contents, it is necessary to exclude both <DIV and </DIV while scanning the element's contents one char at a time. Here is the corresponding regex in the form of a tested PHP function:

// Find an innermost DIV element containing the string "HELLO".
function p1($text) {
    $re = '% # Match innermost DIV element containing "HELLO"
        <div[^>]*>        # DIV element start tag.
        (?:               # Group to match contents up to "HELLO".
          (?!</?div\b)    # Assert this char is not start of DIV tag.
          .               # Safe to match this non-DIV-tag char.
        )*?               # Lazily match contents one char at a time.
        \bhello\b         # Match target "HELLO" word inside DIV.
        (?:               # Group to match content following "HELLO".
          (?!</?div\b)    # Assert this char is not start of DIV tag.
          .               # Safe to match this non-DIV-tag char.
        )*                # Greedily match contents one char at a time.
        </div>            # DIV element end tag.
        %six';
    if (preg_match($re, $text, $matches)) {
        // Match found.
        return $matches[0];
    } else {
        // No match found
        return 'no-match';
    }
}

This function will correctly match the desired DIV element in your test data:

<div>Bye.</div><div>Hello!</div>

It will also correctly find "HELLO" within the innermost of nested DIV elements:

<div>
    <div>
        Hello world!
    </div>
</div>

But, as stated earlier, it will NOT find the "HELLO" string located within non-innermost nested DIV elements like so:

<div>
    Hello,
    <div>
        world!
    </div>
</div>

Handling that case requires a much more complicated solution.

There are lots of cases where this solution can fail. Once again, I recommend using an HTML parser.
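As a rough sketch of what the parser route could look like, here is one way to do it with PHP's bundled DOM extension. The function name findInnermostDivs is hypothetical, and the choice to report only innermost DIVs (to mirror the regex above) is an assumption of this example. Note also that loadHTML() needs a charset hint for non-ASCII input, which is left out here.

```php
<?php
// Hypothetical helper (name invented for this sketch): returns the markup of
// every innermost <div> whose text contains $needle, ignoring case.
function findInnermostDivs(string $html, string $needle): array {
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $found = [];
    // "//div[not(.//div)]" selects div elements with no div descendants.
    foreach ($xpath->query('//div[not(.//div)]') as $div) {
        if (stripos($div->textContent, $needle) !== false) {
            $found[] = $doc->saveHTML($div);
        }
    }
    return $found;
}

print_r(findInnermostDivs('<div>Bye.</div><div>Hello!</div>', 'hello'));
```

Changing the XPath expression (for example to plain "//div") is all it would take to consider outer DIV elements as well, which is the case the single regex cannot handle.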


'~<div>(?!.*?Bye\..*?</div>).+?</div>~'


Can't you just check whether you didn't get a match?

If you're looking for anything but the word "word":

if(!preg_match("/word/i", $myString))

This will run the code under the if only when "word" was not found.
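One caveat worth sketching: preg_match() returns 1 on a match, 0 on no match, and false on error, and the "!" operator treats false the same as 0. A slightly stricter variant might compare against 0 explicitly. The helper name doesNotContainWord is invented for this example, and using preg_quote() assumes the word is meant literally.

```php
<?php
// Hypothetical helper (name invented for this sketch): true only when the
// regex ran successfully AND found no occurrence of $word in $subject.
function doesNotContainWord(string $subject, string $word): bool {
    $result = preg_match('/' . preg_quote($word, '/') . '/i', $subject);
    // === 0 means "no match"; false (a regex error) is not counted as absence.
    return $result === 0;
}

if (doesNotContainWord('Bye.', 'word')) {
    echo "\"word\" was not found\n";
}
```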
