HTML DOM: How to get elements without losing children?

Developer https://www.devze.com 2023-02-17 05:48 Source: Internet
I'm trying to perform a preg_replace on the text in an HTML string. I want to avoid replacing the text within tags, so I'm loading the string as a DOM element and grabbing the text within each node. For example, I have this list:

<ul>
<li><a href="?p=oconnorinv&i=1">Boxes 1-3</a>: 1925 - 1928 <em>(A-Ma)</em></li>
<li><a href="?p=oconnorinv&i=2">Boxes 4-6</a>: 1928 <em>(Mb-Z)</em> - 1930 <em>(A-Wi)</em></li>
<li><a href="?p=oconnorinv&i=3">Boxes 7-9</a>: 1930 <em>(Wo-Z)</em>- 1932 <em>(A-Fl)</em></li>
</ul>

I want to be able to highlight the character "1", or the letter "i", without disturbing the links or list item tag. So I grab each list item and get its value to perform the replace on:

$invfile = [string of the unordered list above]
$invcontents = new DOMDocument;
$invcontents->loadHTML($invfile);
$inv_listitems = $invcontents->getElementsByTagName('li');
foreach ($inv_listitems as $f) {
    $f->nodeValue = preg_replace($to_highlight, "<span class=\"highlight\">$0</span>", $f->nodeValue);
}
echo html_entity_decode($invcontents->saveHTML());

The problem is that when I grab the node values, the child nodes inside the list item are lost. If I print the original string as-is, the <a>, <em>, etc. tags are all there. But when I run the script, the output has no links or formatting tags. For example, if my $to_highlight pattern matches the string "Boxes", the list becomes:

<ul>
<li><span class="highlight">Boxes</span> 1-3: 1925 - 1928 (A-Ma)</li>
<li><span class="highlight">Boxes</span> 4-6: 1928 (Mb-Z) - 1930 (A-Wi)</li>
<li><span class="highlight">Boxes</span> 7-9: 1930 (Wo-Z)- 1932 (A-Fl)</li>
</ul>

How can I get the text without losing the tags inside?


The problem here is that you're operating on the entire <li> element. "Boxes" is part of the nodeValue of an anchor tag.

    If the structure above is always the same you can do something like

    $new_html = preg_replace("##", "", $f->childNodes->item(0)->nodeValue);

    In reality, the best way to go about it is to unset the anchor's node value and create an entirely new element and append it.

    (Consider this pseudocode.)

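    $inv_listitems = $invcontents->getElementsByTagName('li');
    foreach ($inv_listitems as $f) {
        // A DOMElement has no item() method; look up the <a> inside the <li>
        $anchor = $f->getElementsByTagName('a')->item(0);
        if ($anchor === null) {
            continue; // no link in this list item
        }
        $span = $invcontents->createElement("span");
        $span->setAttribute("class", "highlight");
        $span->nodeValue = $anchor->nodeValue;
        $f->appendChild($span);
    }
    echo $invcontents->saveHTML();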
    

    You'll have to do some matching in there, and clear the anchor's nodeValue rather than that of $f (which would drop its children), but hopefully this makes it a little clearer.

    Also, don't set HTML in nodeValue directly, because the markup is stored as text and comes out escaped, as if htmlentities() had been run on it. That is why I create a new element above. If you absolutely have to insert an HTML string, create a DOMDocumentFragment object and append that instead.
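The difference between the two approaches can be seen in a small sketch. It is only an illustration: the one-item list and the variable names ($markup, $escaped, $parsed) are made up for the example, and appendXML() requires the string to be well-formed XML, which ordinary HTML often is not.

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<ul><li>Boxes 1-3</li></ul>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$li = $doc->getElementsByTagName('li')->item(0);
$markup = '<span class="highlight">Boxes</span> 1-3'; // assumed sample markup

// 1) Markup assigned to nodeValue is stored as plain text and escaped on output.
$li->nodeValue = $markup;
$escaped = $doc->saveHTML();

// 2) A fragment parses the same string into real nodes.
while ($li->firstChild) {
    $li->removeChild($li->firstChild); // clear the old content first
}
$fragment = $doc->createDocumentFragment();
$fragment->appendXML($markup);
$li->appendChild($fragment);
$parsed = $doc->saveHTML();

echo $escaped, $parsed;
```

In the first output the span shows up as literal text (&lt;span ...), while in the second it is a real element inside the list item.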


    You're better off operating only on the text nodes:

    $x = new DOMXPath($invcontents);
    foreach ($x->query('//li/text()') as $textnode) {
        // replace the text node with a list of plain text nodes and your highlighting span
    }
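The comment in the loop above could be filled in roughly as follows. This is a sketch and not the answerer's code: highlightTextNodes() is a hypothetical helper name, the query is widened to //li//text() so that text inside <a> and <em> is covered too, and the & in the sample href is written as &amp; to keep the HTML parser quiet.

```php
<?php
// Hypothetical helper: wrap every match of $pattern found in text nodes under
// <li> elements in <span class="highlight">, leaving the tags themselves alone.
function highlightTextNodes(DOMDocument $doc, string $pattern): void
{
    $xpath = new DOMXPath($doc);
    // Take a snapshot of the text nodes before modifying the tree.
    $textNodes = iterator_to_array($xpath->query('//li//text()'));

    foreach ($textNodes as $textNode) {
        $text = $textNode->nodeValue;
        if (!preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE)) {
            continue; // no match in this text node
        }
        $fragment = $doc->createDocumentFragment();
        $pos = 0; // byte offset of the first character not yet copied
        foreach ($matches[0] as [$match, $offset]) {
            if ($match === '') {
                continue; // ignore empty matches
            }
            if ($offset > $pos) {
                // plain text between the previous match and this one
                $fragment->appendChild($doc->createTextNode(substr($text, $pos, $offset - $pos)));
            }
            $span = $doc->createElement('span');
            $span->setAttribute('class', 'highlight');
            $span->appendChild($doc->createTextNode($match));
            $fragment->appendChild($span);
            $pos = $offset + strlen($match);
        }
        if ($pos < strlen($text)) {
            // trailing text after the last match
            $fragment->appendChild($doc->createTextNode(substr($text, $pos)));
        }
        $textNode->parentNode->replaceChild($fragment, $textNode);
    }
}

// Usage with one item of the list from the question.
$doc = new DOMDocument();
$doc->loadHTML(
    '<ul><li><a href="?p=oconnorinv&amp;i=1">Boxes 1-3</a>: 1925 - 1928 <em>(A-Ma)</em></li></ul>',
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
);
highlightTextNodes($doc, '/1/');
$html = $doc->saveHTML();
echo $html;
```

Because the spans are created as elements and the pieces as text nodes, nothing is escaped twice and html_entity_decode() is not needed on the output. The "1" inside the href attribute is never touched, since attributes are not text nodes.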
    


    I always use XPath for this kind of task. It gives you more flexibility. This example handles the following XML:

    <mainlevel>
      <toplevel>
        <detaillevel key=...>
          <xmlvalue1></xmlvalue1>
          <xmlvalue2></xmlvalue2>
    
          <sublevel key=...>
            <xmlsubvalue1></xmlsubvalue1>
            <xmlsubvalue2></xmlsubvalue2>
          </sublevel>
    
        </detaillevel>
      </toplevel>
    </mainlevel>
    

    To parse this:

    $xpath = new DOMXPath($xmlDoc);
    $mainNodes = $xpath->query("/mainlevel/toplevel/detaillevel");

    foreach ($mainNodes as $subNode) {
        // Attribute and child element values of each <detaillevel>
        $parameter1 = $subNode->getAttribute('key');
        $parameter2 = $subNode->getElementsByTagName("xmlvalue1")->item(0)->nodeValue;
        $parameter3 = $subNode->getElementsByTagName("xmlvalue2")->item(0)->nodeValue;

        foreach ($subNode->getElementsByTagName("sublevel") as $detailNode) {
            // xmlsubvalue1 and xmlsubvalue2 are child elements, not attributes
            $subParameter1 = $detailNode->getAttribute('key');
            $subParameter2 = $detailNode->getElementsByTagName("xmlsubvalue1")->item(0)->nodeValue;
            $subParameter3 = $detailNode->getElementsByTagName("xmlsubvalue2")->item(0)->nodeValue;
        }
    }
    