
PHP -> preg_match_all for the following structure: <h6>my headline</h6>some text ... <h6>another headline</h6> more text

Developer https://www.devze.com 2023-01-28 13:52 Source: internet

I'm desperately looking for a way to get this text string

<h6>First pane</h6>
... pane content ...
<h6>Second pane</h6>
Hi, this is a comment.
To delete a comment, just log in and view the post's comments.
There you will have the option to edit
or delete them.
<h6>Last pane</h6>
... last pane content ...

parsed into a PHP array.

I need to separate it into:

1.
1.0=> First pane
1.1=> ... pane content ... 

2.
2.0=> Second pane
2.1=> Hi, this is a comment.
    To delete a comment, just log in and view the post's comments.
    There you will have the option to edit
    or delete them.

3.
3.0=> Last pane
3.1=> ... last pane content ...


Your regex should look like this:

/<h6>([^<]+)<\/h6>([^<]+)/im

If you run the following script, you'll see that the values you're looking for are in $matches[1] and $matches[2].

<?php
$s = "<h6>First pane</h6>
... pane content ...
<h6>Second pane</h6>
Hi, this is a comment.
To delete a comment, just log in and view the post's comments.
There you will have the option to edit
or delete them.
<h6>Last pane</h6>
... last pane content ...";
// Group 1 captures the headline, group 2 the text up to the next "<".
$r = "/<h6>([^<]+)<\/h6>([^<]+)/im";

$matches = array();
preg_match_all($r,$s,$matches);

// $matches[1] holds the headlines, $matches[2] the pane contents.
print_r($matches);
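With the default PREG_PATTERN_ORDER, the headlines and contents end up in two parallel arrays rather than in the per-pane pairs the question asks for. A minimal sketch, assuming trimmed pane contents are acceptable, zips them together with `array_map(null, ...)`; `$panes` is just an illustrative variable name.

```php
<?php
// Sketch: zip the two capture-group arrays from preg_match_all's default
// PREG_PATTERN_ORDER result into [headline, content] pairs.
$s = <<<'HTML'
<h6>First pane</h6>
... pane content ...
<h6>Second pane</h6>
Hi, this is a comment.
<h6>Last pane</h6>
... last pane content ...
HTML;

preg_match_all('/<h6>([^<]+)<\/h6>([^<]+)/i', $s, $matches);

// array_map(null, $a, $b) pairs up the elements at the same index of each array;
// trim() strips the newlines surrounding each pane's content.
$panes = array_map(null, $matches[1], array_map('trim', $matches[2]));

print_r($panes);
```

Each element of `$panes` is then `[headline, content]`, for example `['First pane', '... pane content ...']`.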


You shouldn't attempt to parse HTML with a regex. It is doomed to cause pain for all but the very simplest HTML, and it will break as soon as anything in your document structure changes. Use a proper HTML or DOM parser instead, such as PHP's DOMDocument: http://php.net/manual/en/class.domdocument.php

For example, you can use getElementsByTagName (http://www.php.net/manual/en/domdocument.getelementsbytagname.php) to get all the h6 elements.
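One way the DOM approach might look for this input is a minimal sketch, assuming the fragment can be wrapped in a single `<div>` and that a pane's content is everything between one `<h6>` and the next. `splitPanes()` is a hypothetical helper name, not a PHP function.

```php
<?php
// Sketch: split <h6>headline</h6> + following text into [headline, content]
// pairs with DOMDocument instead of a regex. splitPanes() is hypothetical.
function splitPanes(string $html): array
{
    $doc = new DOMDocument();
    // Wrap the fragment in one <div> so every <h6> and text node share a parent;
    // the flags stop libxml from adding <html>/<body> and a doctype.
    libxml_use_internal_errors(true);
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    $panes = [];
    foreach ($doc->getElementsByTagName('h6') as $h6) {
        $content = '';
        // Walk the siblings after this <h6> until the next <h6> (or the end).
        for ($node = $h6->nextSibling; $node !== null; $node = $node->nextSibling) {
            if ($node instanceof DOMElement && $node->nodeName === 'h6') {
                break;
            }
            $content .= $node->textContent;
        }
        $panes[] = [trim($h6->textContent), trim($content)];
    }
    return $panes;
}

$html = <<<'HTML'
<h6>First pane</h6>
... pane content ...
<h6>Second pane</h6>
Hi, this is a comment.
To delete a comment, just log in and view the post's comments.
There you will have the option to edit
or delete them.
<h6>Last pane</h6>
... last pane content ...
HTML;

print_r(splitPanes($html));
```

Unlike the `[^<]+` regexes, this keeps working if a pane's content later contains inline tags such as `<b>`, because `textContent` flattens them to text.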


I believe the PREG_SET_ORDER flag is what you're looking for.

$regex = '~<h6>([^<]+)</h6>\s*([^<]+)~i';

preg_match_all($regex, $source, $matches, PREG_SET_ORDER);

This way, each element in the $matches array is an array containing the overall match followed by all of the group captures for a single match attempt. The result for the first match looks like this:

Array
(
    [0] => Array
        (
            [0] => <h6>First pane</h6>
... pane content ...

            [1] => First pane
            [2] => ... pane content ...

        )


EDIT: Notice the \s* I added, too. Without it, the captured content always starts with a line separator.
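A self-contained version of the PREG_SET_ORDER approach that also reshapes the result into the question's [headline, content] pairs might look like the sketch below, assuming the trailing newline of each pane should be dropped. `$source` stands in for the question's string and `$panes` is an illustrative name.

```php
<?php
// Sketch: PREG_SET_ORDER groups each match's captures together, so every
// $set is [full match, headline, content]; keep only the two captures.
$source = <<<'HTML'
<h6>First pane</h6>
... pane content ...
<h6>Second pane</h6>
Hi, this is a comment.
To delete a comment, just log in and view the post's comments.
There you will have the option to edit
or delete them.
<h6>Last pane</h6>
... last pane content ...
HTML;

preg_match_all('~<h6>([^<]+)</h6>\s*([^<]+)~i', $source, $matches, PREG_SET_ORDER);

$panes = [];
foreach ($matches as $set) {
    // \s* already consumed the leading newline; rtrim() drops the trailing one.
    $panes[] = [$set[1], rtrim($set[2])];
}

print_r($panes);
```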
