Find multiple patterns with a single preg_match_all in PHP

Developer https://www.devze.com 2023-02-01 20:12 Source: web
Using PHP and preg_match_all I'm trying to get all the HTML content between the following tags (and the tags also):

<p>paragraph text</p>
don't take this
<ul><li>item 1</li><li>item 2</li></ul>
don't take this
<table><tr><td>table content</td></tr></table>

I can get one of them just fine:

preg_match_all("(<p>(.*)</p>)siU", $content, $matches, PREG_SET_ORDER);

Is there a way to get all the

<p></p> <ul></ul> <table></table>

content with a single preg_match_all? I need them to come out in the order they were found so I can echo the content and it will make sense.

So if I did a preg_match_all on the above content and then iterated through the $matches array, it would echo:

<p>paragraph text</p>
<ul><li>item 1</li><li>item 2</li></ul>
<table><tr><td>table content</td></tr></table>


Use | to match one of a group of strings: p|ul|table

Use a backreference to match the appropriate closing tag: \\1, because (p|ul|table) is the first capturing group. The outermost parentheses are the pattern delimiters here, not a group.

Putting that all together:

preg_match_all("(<(p|ul|table)>(.*)</\\1>)siU", $content, $matches, PREG_SET_ORDER);

This is only going to work if your input HTML follows a very strict structure. It cannot have spaces in the tags or any attributes on them. It also fails when a tag is nested inside another tag of the same name (for example a <ul> inside a <ul>), because the lazy match stops at the first closing tag. Consider using an HTML parser to do a proper job.
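
As a rough illustration of the alternation-plus-backreference idea, here is a minimal sketch run against the sample input from the question. It uses ~ as the pattern delimiter (an assumption made here so that the tag name is unambiguously capture group 1) and collects the full matches into a hypothetical $blocks array in the order they were found.

```php
<?php
// Sample input taken from the question.
$content = <<<HTML
<p>paragraph text</p>
don't take this
<ul><li>item 1</li><li>item 2</li></ul>
don't take this
<table><tr><td>table content</td></tr></table>
HTML;

// Group 1 captures the tag name; \1 requires the same name in the closing tag.
// s: dot also matches newlines, i: case-insensitive, U: quantifiers are lazy.
preg_match_all('~<(p|ul|table)>(.*)</\1>~siU', $content, $matches, PREG_SET_ORDER);

// With PREG_SET_ORDER each entry is one match: [0] is the full text,
// [1] the tag name, [2] the inner content.
$blocks = [];
foreach ($matches as $match) {
    $blocks[] = $match[0];
}

echo implode("\n", $blocks), "\n";
```

Because preg_match_all scans the string from left to right, the entries come out in the order they appear in the input, which is what the question asks for.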


This one works for me:

preg_match_all("#<\b(p|ul|table)\b[^>]*>(.*?)</\b(p|ul|table)\b>#si", $content, $matches);
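
One thing to watch with the pattern above is that the closing tag is a second, independent alternation, so it could pair a <p> with a </ul>. The following is a possible variation, not the answer's own code, that reuses the opening tag name through a backreference while still tolerating attributes. The input string is made up for the example.

```php
<?php
// Hypothetical input with attributes on the opening tags.
$content = '<p class="intro">Hello</p> skip me <ul id="nav"><li>a</li></ul>';

// \b after the tag name keeps "p" from matching "<pre>"; [^>]* skips any
// attributes; \1 forces the closing tag to repeat the opening tag name.
preg_match_all('#<(p|ul|table)\b[^>]*>(.*?)</\1>#si', $content, $matches);

// Default PREG_PATTERN_ORDER: $matches[0] holds the full matches and
// $matches[1] the tag names, both in the order they were found.
foreach ($matches[0] as $i => $block) {
    echo $matches[1][$i], ': ', $block, "\n";
}
```

Nesting a tag inside another of the same name is still not handled, since the lazy (.*?) stops at the first matching closing tag.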


If you are going to use a DOM parser, and you should, here's how. A contributor posted a useful function for obtaining a DOMNode's innerHTML (DOMinnerHTML() is a user-defined helper, not a PHP built-in), which I will use in the following example:

$dom = new DOMDocument;
$dom->loadHTML($html);

$p = $dom->getElementsByTagName('p')->item(0); // first <p> node
$ul = $dom->getElementsByTagName('ul')->item(0); // first <ul> node
$table = $dom->getElementsByTagName('table')->item(0); // first <table> node

echo DOMinnerHTML($p);
echo DOMinnerHTML($ul);
echo DOMinnerHTML($table);
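
The helper itself is not reproduced in this thread, so the sketch below defines one possible stand-in for DOMinnerHTML() (an assumption, not necessarily the contributor's version). It also shows a way to get the elements in document order with the tags included, as the question asks: an XPath union query plus DOMDocument::saveHTML($node). The sample is wrapped in explicit <html><body> tags here so the parser does not have to guess the document structure.

```php
<?php
// Hypothetical stand-in for the helper mentioned above: concatenates the
// serialized child nodes of $element.
function DOMinnerHTML(DOMNode $element): string
{
    $innerHTML = '';
    foreach ($element->childNodes as $child) {
        $innerHTML .= $element->ownerDocument->saveHTML($child);
    }
    return $innerHTML;
}

// Sample content from the question, wrapped in <html><body>.
$html = '<html><body>'
    . '<p>paragraph text</p>'
    . "don't take this"
    . '<ul><li>item 1</li><li>item 2</li></ul>'
    . "don't take this"
    . '<table><tr><td>table content</td></tr></table>'
    . '</body></html>';

libxml_use_internal_errors(true); // collect parser warnings instead of emitting them
$dom = new DOMDocument;
$dom->loadHTML($html);

// An XPath union returns its node set in document order.
// Note: elements nested inside another match would be returned as well.
$xpath = new DOMXPath($dom);
$names = [];
foreach ($xpath->query('//p | //ul | //table') as $node) {
    $names[] = $node->nodeName;
    echo $dom->saveHTML($node), "\n"; // outer HTML, tags included
}
```

The exact whitespace in the serialized output can vary with the libxml version, so it is safer to compare node names or text content than raw strings.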


While doable with regular expressions, you could simplify the task by using one of the simpler HTML parser toolkits. For example with phpQuery or QueryPath it's as simple as:

qp($html)->find("p, ul, table")->text();   // or loop over them