开发者

Removing element by class name with HTMLAgilityPack c#

开发者 https://www.devze.com 2023-02-15 10:51 出处:网络
I\'m using the html agility pack to read the contents of my html document into a string etc. After this is done, I would like to remove certian elements in that content by their class, however I am st

I'm using the html agility pack to read the contents of my html document into a string etc. After this is done, I would like to remove certian elements in that content by their class, however I am stumbling upon a problem.

My Html looks like this:

<div id="wrapper">
    <div class="maincolumn" >
        <div class="breadCrumbContainer">
            <div class="breadCrumbs">
            </div>
        </div>

        <div class="seo_list">
            <div class="seo_head">Header</div>
        </div>

Content goes here...
</div>

Now, I have used an xpath selector to get all the content within the and used the InnerHtml property like so:

            node = doc.DocumentNode.SelectSingleNode("//div[@id='wrapper']");
            if (node != null)
            {
                pageContent = node.InnerHtml;
            }

From this point, I would like to remove the div with the class of "breadCrumbContainer", however when using the code below, I get the error: "Node "" was not found in the collection"

            node = doc.DocumentNode.SelectSingleNode("//div[@id='wrapper']");
            node = node.RemoveChild(node.SelectS开发者_Python百科ingleNode("//div[@class='breadCrumbContainer']"));

            if (node != null)
            {
                pageContent = node.InnerHtml;
            }

Can anyone shed some light on this please? I'm quite new to Xpath, and really new to the HtmlAgility library.

Thanks,

Dave


It's because RemoveChild can only remove a direct child, not a grand child. Try this instead:

    HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='breadCrumbContainer']");
    node.ParentNode.RemoveChild(node);


This is a super-simple task for XSLT:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
  "div[@class='breadCrumbContainer'
     and
       ancestor::div[@id='wrapper']
      ]
  "/>
</xsl:stylesheet>

when this transformation is applied on the provided XML document (with added another <div> and wrapped into an <html> top element to make it more challenging and realistic):

<html>
 <div id="wrapper">
    <div class="maincolumn" >
        <div class="breadCrumbContainer">
            <div class="breadCrumbs"></div>
        </div>
        <div class="seo_list">
            <div class="seo_head">Header</div>
        </div>  Content goes here...
    </div>
 </div>
 <div>
   Something else here
 </div>
</html>

the wanted, correct result is produced:

<html>
  <div id="wrapper">
    <div class="maincolumn">
      <div class="seo_list">
        <div class="seo_head">Header</div>
      </div>  Content goes here...
    </div>
  </div>
  <div>
   Something else here
 </div>
</html>
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号