Regex match all newline characters within <p> in PHP

Developer https://www.devze.com 2023-01-03 00:11 Source: Internet
I have the following sample set of data:

<p>first line\n
second line\n
third line\n</p>
first line\n
second line\n
third line\n

Using regex, how could I match the newline characters, but only when they are within the paragraph tags?

This code would be used within PHP.


You could split this into two regexes. First extract the <p> elements with <p>.*?</p> (using the s modifier so that . also matches newlines), then match the newlines within each result.

Divide and conquer. Several small regexes will often perform faster than one huge one.
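
As a rough sketch of the two-step approach described above, something like the following could work. The function name countNewlinesInParagraphs and the sample string are made up for illustration, and the code assumes well-formed, non-nested <p> elements.

```php
<?php
// Hypothetical sample input: newlines appear both inside and outside the <p> element.
$html = "<p>first line\nsecond line\nthird line\n</p>\nfirst line\nsecond line\nthird line\n";

/**
 * Hypothetical helper: counts the newlines that sit inside <p>...</p> blocks.
 * Assumes well-formed, non-nested <p> elements.
 */
function countNewlinesInParagraphs(string $html): int
{
    // Step 1: grab each <p>...</p> block. The "s" modifier lets "." match
    // newlines, "i" ignores case, and "\b" keeps "<p" from matching "<pre".
    preg_match_all('#<p\b[^>]*>(.*?)</p>#is', $html, $blocks);

    $count = 0;
    foreach ($blocks[1] as $inner) {
        // Step 2: match newlines only within the captured paragraph body.
        $count += preg_match_all('/\n/', $inner);
    }
    return $count;
}

echo countNewlinesInParagraphs($html), "\n"; // prints 3
```

To replace the newlines rather than count them, the same first pattern could be passed to preg_replace_callback, with the callback rewriting only the captured paragraph body.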

I assume you have total control over the HTML and know it's well formed, because using regex on HTML is a no-no in most cases. Otherwise, use a DOM parser instead.


Well, regexes are not well suited to parsing HTML (use DomDocument for that). You also said that you want to "match on" the newlines. Does that mean capture? Replace? Check for? Assuming you mean check for, here's a crude one:

$regex = '#(?i:<p[^>]*>[^\\n]*)(\\n)(?i:[^<]*</p>)#';

It won't match <p><i>foo\n</i></p>, but it will match a newline inside a basic <p> tag (one with no HTML children).
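
To see how the pattern behaves, here is a small check against three made-up inputs (the variable names and sample strings are assumptions for illustration only):

```php
<?php
// The pattern from the answer above, unchanged.
$regex = '#(?i:<p[^>]*>[^\\n]*)(\\n)(?i:[^<]*</p>)#';

// Hypothetical sample inputs.
$plain   = "<p>first line\nsecond line</p>";  // newline directly inside <p>
$nested  = "<p><i>foo\n</i></p>";              // newline inside a child element
$outside = "<p>no break</p>\nfirst line\n";   // newline only outside <p>

var_dump(preg_match($regex, $plain));   // int(1)
var_dump(preg_match($regex, $nested));  // int(0)
var_dump(preg_match($regex, $outside)); // int(0)
```

Two limits are worth keeping in mind: the capture group only holds the first newline of each paragraph, and <p[^>]*> also accepts other tags whose names start with "p", such as <pre>. Writing <p\b[^>]*> would tighten the second point.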

What I'd suggest is grabbing DomDocument and doing something like this:

$dom = new DomDocument();
$dom->loadHTML($html); // $html holds the markup to inspect
$pTags = $dom->getElementsByTagName('p');
foreach ($pTags as $p) {
    // textContent also includes the text of any child elements
    $txt = $p->textContent;
    if (strpos($txt, "\n") !== false) {
        // Found a \n within a <p> tag
    }
}
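
If the goal is to replace the newlines rather than just detect them, one possible extension of the DOM approach is to walk the text nodes under each <p> with DOMXPath. This is only a sketch: the sample markup, the choice of a space as the replacement, and the decision to silence parser warnings are all assumptions, not part of the original answer.

```php
<?php
// Hypothetical input: a <p> with a nested child, plus newlines outside the <p>.
$html = "<div><p>first <i>line\n</i>second line\n</p>\noutside\n</div>";

$dom = new DOMDocument();
// Assumption: the markup may be untidy, so collect parser warnings quietly.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$replaced = 0;
// Select every text node that has a <p> ancestor, however deeply nested.
foreach ($xpath->query('//p//text()') as $textNode) {
    $count = 0;
    // Swap each newline for a space and record how many were changed.
    $textNode->nodeValue = str_replace("\n", ' ', $textNode->nodeValue, $count);
    $replaced += $count;
}

echo $replaced, "\n";     // number of newlines replaced inside <p> elements
echo $dom->saveHTML();    // the newlines outside the <p> are left untouched
```

Unlike the regex, this also reaches newlines inside child elements such as <i>, because the XPath expression selects descendant text nodes.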