Get last <li> element from a string

Developer https://www.devze.com 2022-12-18 01:16 Source: Internet

Related topic: php

I have a string variable that contains a lot of HTML markup and I want to get the last <li> element from it. I'm using something like:

$markup = "<body><div><li id='first'>One</li><li id='second'>Two</li><li id='third'>Three</li></div></body>";

preg_match('#<li(.*?)>(.*)</li>#ims', $markup, $matches);
$lis = "<li ".$matches[1].">".$matches[2]."</li>";
$total = explode("</li>",$lis);
$num = count($total)-2;
echo $total[$num]."</li>";

This works and I get the last <li> element printed. But I can't understand why I have to subtract 2 from the count of the array $total. Normally I would only subtract 1, since indexing starts at 0. What am I missing?

Is there a better way of getting the last <li> element from the string?

HTML is not regular, and so can't be parsed with a regular expression. Use a proper HTML parser.
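
As a rough illustration of the parser approach, here is a minimal sketch using PHP's bundled DOM extension (DOMDocument). It assumes the extension is enabled and reuses the $markup string from the question; the variable names $doc, $items and $last are only illustrative.

```php
<?php
$markup = "<body><div><li id='first'>One</li><li id='second'>Two</li><li id='third'>Three</li></div></body>";

// Collect parser warnings internally instead of printing them;
// the sample markup is not strictly valid HTML (<li> without a list parent).
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($markup);
libxml_clear_errors();

// getElementsByTagName() returns the <li> nodes in document order.
$items = $doc->getElementsByTagName('li');
$last  = $items->length > 0 ? $items->item($items->length - 1) : null;

if ($last !== null) {
    echo $last->textContent, "\n";   // text inside the last <li>
    echo $doc->saveHTML($last), "\n"; // the element serialized back to markup
}
```

Note that saveHTML() re-serializes the node, so details such as attribute quoting may differ from the original string.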

@OP, your requirement looks simple, so no need for parsers or regex.

$markup = "<body><div><li id='first'>One</li><li id='second'>Two</li><li id='third'>Three</li></div></body>";
$s = explode("</li>",$markup,-1);
$t = explode(">",end($s));
print end($t);

output

$ php test.php
Three

If you already know how to use jQuery, you could also take a look at phpQuery. It's a PHP library that lets you access DOM elements much as you would in jQuery.

From the PHP.net documentation:

If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.

$matches[0] is the complete match (not just the captured bits)

You have to extract the second index because you have 2 capturing groups:

$matches[0]; // Contains the text that matched the full pattern
$matches[1]; // Contains the argument for the LI start-tag (.*?)
$matches[2]; // Contains the string contained by the LI tags (.*)
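
To see where the extra index in the question's count($total)-2 comes from, it may help to dump the intermediate values. The sketch below reruns the question's code unchanged and only adds inspection calls. The string passed to explode() ends with the delimiter "</li>", so the resulting array has an empty string as its last element.

```php
<?php
$markup = "<body><div><li id='first'>One</li><li id='second'>Two</li><li id='third'>Three</li></div></body>";

preg_match('#<li(.*?)>(.*)</li>#ims', $markup, $matches);

// Index 0 is the full match, indexes 1 and 2 are the two capture groups.
var_dump(count($matches));      // int(3)

$lis   = "<li " . $matches[1] . ">" . $matches[2] . "</li>";
$total = explode("</li>", $lis);

// $lis ends with "</li>", so explode() appends an empty trailing element.
var_dump(count($total));        // int(4)
var_dump(end($total) === "");   // bool(true)

// The last non-empty piece therefore sits at count - 2.
echo $total[count($total) - 2], "\n"; // <li id='third'>Three
```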

'Parsing' (X)HTML strings with regular expressions is hard and can be full of unexpected problems. Parsing anything more than simple tagged strings is not possible, because (X)HTML is not a regular language.

You could improve your regex by using (not tested):