How do I match the content between all <li> tags?

Developer https://www.devze.com 2023-03-11 06:03 Source: Internet
How do I match all the <li> tags in the below HTML code:

<ul>
<li> some content</li>
<li> some other content</li>
<li> some other other content.</li>
</ul>

This expression doesn't work:

<li>(.*)</li>

Because it returns:

some content</li>
    <li> some other content</li>
    <li> some other other content.

Which is the content between the first <li> and the last </li>.


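Regular expression quantifiers such as * are greedy by default: .* matches as much as it can, so it runs on to the last </li>. Make it non-greedy (lazy) by adding a ? after the quantifier.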

<li>(.*?)</li>

Note: I'd encourage a DOM Parser for such a thing. Check out PHP's DOMDocument.
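
The greedy versus non-greedy difference described in this answer can be seen in a small sketch. It is written in JavaScript rather than PHP, on the assumption that the two engines treat .* and .*? the same way for this input; the variable names are only illustrative, and the sample is put on one line so that the dot can run from one item into the next.

```javascript
// Sample markup from the question, kept on a single line so that `.` can
// cross from one <li> to the next without needing a dot-all flag.
const html = "<ul><li> some content</li><li> some other content</li></ul>";

// Greedy: .* runs to the end of the input, then backtracks to the LAST </li>.
const greedy = html.match(/<li>(.*)<\/li>/)[1];

// Lazy: .*? stops at the FIRST </li> it can reach; the g flag repeats the search.
const lazy = Array.from(html.matchAll(/<li>(.*?)<\/li>/g), (m) => m[1]);

console.log(greedy); // one long string that still contains "</li><li>"
console.log(lazy);   // one entry per list item
```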


Someone please link the Regex HTML Parser question...

There is a reason HTML parsers exist, which is to parse HTML.

This solution is a bit long, but it is versatile and works for elements with classes, ids, etc:

<?php

function innerHTML($node) {
  $doc = new DOMDocument();

  foreach ($node->childNodes as $child) {
    $doc->appendChild($doc->importNode($child, true));
  }

  return $doc->saveHTML();
}

$string = "<ul>
<li> some content</li>
<li> some other content</li>
<li> some other other content.</li>
</ul>";

$document = new DOMDocument();
$document->loadHTML($string);

$ul = $document->getElementsByTagName("ul");

foreach ($ul as $element) {
  print innerHTML($element);
}

?>

It seems like you don't need the tag names. Try this simpler code:

<?php

$string = "<ul>
<li> some content</li>
<li> some other content</li>
<li> some other other content.</li>
</ul>";

$document = new DOMDocument();
$document->loadHTML($string);

$items = $document->getElementsByTagName("li");

foreach ($items as $element) {
  // nodeValue is the element's text content, without any tags
  print $element->nodeValue . PHP_EOL;
}

?>


Try .*? rather than .* here. It is a lazy (non-greedy) match, so it matches as little as possible.

Response to @CanSpice:

Of course regex is not suited to HTML. The OP could try something like <li>(?!.*<li>).*?</li>, depending on what they are doing, or better, use a parser. I can only direct the OP one step at a time.
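
What that lookahead pattern selects depends on whether the items share a line, which a rough sketch makes visible. This is JavaScript, not PHP, and it assumes no dot-all flag is set, so the dot stops at a newline; the sample strings and variable names are made up for illustration.

```javascript
// The lookahead pattern from the answer above, with the / escaped for a JS literal.
const lastOnLine = /<li>(?!.*<li>).*?<\/li>/g;

// One item per line (as in the question): `.` cannot cross a newline, so the
// lookahead never sees a later <li> and every item matches.
const multiLine = "<ul>\n<li> a</li>\n<li> b</li>\n</ul>";
const perLine = multiLine.match(lastOnLine);

// Everything on one line: only the final <li> has no later <li> after it.
const singleLine = "<ul><li> a</li><li> b</li></ul>";
const onlyLast = singleLine.match(lastOnLine);

console.log(perLine);
console.log(onlyLast);
```

So on single-line markup this pattern returns only the last item, which may or may not be what is wanted.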


Try making the regexp non-greedy:

<li>(.*?)</li>


Since you are matching HTML text, I would suggest at least using the s and i flags, like this:

'~<li>(.*?)</li>~is'
  • s is DOTALL: it makes the dot (.) match every character, including newlines
  • i makes the match case-insensitive
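
To see what the two flags change, here is a minimal sketch in JavaScript, whose regex literals happen to use the same s and i flag letters as the PHP pattern above. The input string is hypothetical: it uses upper-case tags and an item whose text spans two lines.

```javascript
// Hypothetical input: upper-case tags and an item whose text spans two lines.
const html = "<UL>\n<LI>first\nitem</LI>\n<li>second item</li>\n</UL>";

// Without the flags: matching is case-sensitive and `.` stops at a newline,
// so the upper-case, two-line item is missed.
const plain = Array.from(html.matchAll(/<li>(.*?)<\/li>/g), (m) => m[1]);

// With i (ignore case) and s (dot matches newlines) both items are found.
const flagged = Array.from(html.matchAll(/<li>(.*?)<\/li>/gis), (m) => m[1]);

console.log(plain);
console.log(flagged);
```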


<?php
$str = '<ul>
<li> some content</li>
<li> some other content</li>
<li> some other other content.</li>
</ul>';

// Use ~ as the delimiter so the / in </li> does not end the pattern early.
preg_match_all('~<li>([^<]+)</li>~i', $str, $r);
print_r($r[1]);
?>

Output:

Array
(
    [0] =>  some content
    [1] =>  some other content
    [2] =>  some other other content.
)
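
One thing to keep in mind with the [^<]+ approach: it only matches items that contain no tags of their own. A rough JavaScript sketch of that limitation follows; the second list item, with a nested <b>, is an invented example and not part of the question.

```javascript
// Invented input: the second item contains a nested <b> element.
const html = "<ul><li>plain text</li><li>text with <b>bold</b> inside</li></ul>";

// [^<]+ cannot step over the "<" of <b>, so the second item never matches.
const negated = Array.from(html.matchAll(/<li>([^<]+)<\/li>/gi), (m) => m[1]);

// The lazy version accepts any characters up to the first </li>.
const lazy = Array.from(html.matchAll(/<li>(.*?)<\/li>/gi), (m) => m[1]);

console.log(negated);
console.log(lazy);
```

For list items that hold plain text only, as in the question, both patterns give the same captures.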


var a = '<ul>'+
'<li> some content</li>'+
'<li> some other content</li>'+
'<li> some other other content.</li>'+
'</ul>'

a.split("<li>") 
gives
["<ul>", " some content</li>", " some other content</li>", " some other other content.</li></ul>"]

From there we can pick whatever we want.
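
As one possible way of "picking what we want" from that array, the chunks can be trimmed at the closing tag. This is only a sketch: the items name is introduced here for illustration, and it assumes every <li> is written exactly like that, in lower case and without attributes.

```javascript
const a = '<ul>' +
  '<li> some content</li>' +
  '<li> some other content</li>' +
  '<li> some other other content.</li>' +
  '</ul>';

const items = a
  .split("<li>")
  .slice(1) // drop the "<ul>" chunk that precedes the first <li>
  .map((chunk) => chunk.split("</li>")[0]); // keep the text before the closing tag

console.log(items);
```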
