Getting only subsequent siblings of the same type using XPath and SimpleXML

Developer https://www.devze.com 2022-12-14 23:43 Source: Internet
I need to parse an HTML definition list like the following:

<dl>
    <dt>stuff</dt>
        <dd>junk</dd>
        <dd>things</dd>
        <dd>whatnot</dd>
    <dt>colors</dt>
        <dd>red</dd>
        <dd>green</dd>
        <dd>blue</dd>
</dl>

So that I can end up with an associative array like this:

[definition list] =>
    [stuff] =>
        [0] => junk
        [1] => things
        [2] => whatnot
    [colors] =>
        [0] => red
        [1] => green
        [2] => blue

I am using DOMDocument->loadHTML() to import the HTML string into an object, and then simplexml_import_dom() so that I can use the SimpleXML extension, specifically its xpath() method.
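
A minimal sketch of that import step, assuming the markup sits in a hypothetical $html variable (the name and the shortened sample list are illustrative, not from the original post). The point it shows is that loadHTML() wraps a bare fragment in html and body elements, so the dl has to be located with a query rather than taken as the root:

```php
<?php
// Hypothetical input: any HTML string containing the <dl> would do.
$html = '<dl><dt>stuff</dt><dd>junk</dd><dd>things</dd><dt>colors</dt><dd>red</dd></dl>';

$dom = new DOMDocument();
// loadHTML() wraps a bare fragment in <html><body>, so <dl> is not the root element.
$dom->loadHTML($html);

// Hand the parsed tree to SimpleXML so that xpath() is available.
$xml = simplexml_import_dom($dom);

// '//dl' finds the list wherever loadHTML() placed it in the tree.
$lists = $xml->xpath('//dl');
$dl = $lists[0];

// <dt> and <dd> are both direct children of <dl>, i.e. siblings of each other.
echo count($dl->dt), ' terms, ', count($dl->dd), " definitions\n";
```

With this sample input the script should report 2 terms and 3 definitions, which makes the sibling relationship described below easy to see.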

The problem I'm having is with the XPath syntax for querying all <dd> elements that are consecutive and not broken by a <dt>.

Since <dd> elements are not children of <dt> elements, I can't simply query for all the <dt> elements, loop over them, and query each one for its <dd> elements.

So I'm thinking I have to do a query for the first dd sibling of each dt and then all dd siblings of that first dd.

But it's not clear to me from the XPath tutorials whether this is possible. Can you say "consecutive matching siblings"? Or am I forced to loop through each child of the original dl and handle each dt and dd as it shows up?


There are certainly ways to find consecutive matching siblings in XPath, but they are relatively complicated, and since you have to process every child anyway, you might as well just loop over them as you mentioned. That will be simpler and more efficient than looping over each <dt/> and then looking for its siblings.

$dl = simplexml_load_string(
    '<dl>
        <dt>stuff</dt>
            <dd>junk</dd>
            <dd>things</dd>
            <dd>whatnot</dd>
        <dt>colors</dt>
            <dd>red</dd>
            <dd>green</dd>
            <dd>blue</dd>
    </dl>'
);

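$list = array();
$k = null; // text of the current <dt>; stays null until the first <dt> is seen
foreach ($dl->children() as $child)
{
    switch ($child->getName())
    {
        case 'dt':
            // a new term becomes the key for the <dd> elements that follow it
            $k = (string) $child;
            break;

        case 'dd':
            // attach the definition to the most recent term;
            // a <dd> that appears before any <dt> is skipped
            if ($k !== null)
            {
                $list[$k][] = (string) $child;
            }
            break;
    }
}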
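
For comparison, here is one possible sketch of the XPath route mentioned above. It is not taken from the original answer: the predicate and the $byXpath and $n names are assumptions made for illustration. The idea is that a dd belongs to the n-th dt exactly when it has n dt elements among its preceding siblings:

```php
<?php
$dl = simplexml_load_string(
    '<dl><dt>stuff</dt><dd>junk</dd><dd>things</dd><dd>whatnot</dd>'
    . '<dt>colors</dt><dd>red</dd><dd>green</dd><dd>blue</dd></dl>'
);

$byXpath = array();
$n = 0; // 1-based position of the current <dt> among all <dt> children
foreach ($dl->xpath('dt') as $dt)
{
    $n++;
    $key = (string) $dt;
    $byXpath[$key] = array();

    // Select the <dd> siblings after this <dt> that have exactly $n <dt>
    // elements before them, i.e. those not yet "claimed" by a later <dt>.
    $query = "following-sibling::dd[count(preceding-sibling::dt) = {$n}]";
    foreach ($dt->xpath($query) as $dd)
    {
        $byXpath[$key][] = (string) $dd;
    }
}

print_r($byXpath);
```

This produces the same grouping as the loop, but the predicate re-counts the preceding dt elements for every candidate dd, so it is likely to do more work than a single pass over the children.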