There seems to be no documentation on the CodePlex page, and for some reason IntelliSense doesn't show me the available methods, or anything at all, for HtmlAgilityPack (for example, when I type MyHtmlDocument.DocumentNode. there is no IntelliSense to tell me what I can do next).
I need to know how to remove ALL <a> tags and their content from the body of the HTML document. I cannot just use Node.InnerText on the body, because that still returns the content of the <a> tags.
Here is example HTML
<html>
<body>
I was born in <a name=BC>Toronto</a> and now I live in barrie
</body>
</html>
I need to return
I was born in and now I live in barrie
Thanks, I appreciate the help!
Thomas
Something along these lines (sorry, my code is C#, but I hope it helps nonetheless):
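using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml("some html markup here");
// "//a" matches every <a> element; "//a[@name]" would match only anchors that have a name attribute.
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a");
// SelectNodes returns null (not an empty collection) when nothing matches.
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        link.Remove(); // removes the <a> node together with its content
    }
}
// Then use one of the many doc.Save(...) overloads to actually get the result of the operation.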
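Since the question asks for the remaining text rather than a saved file, here is a minimal sketch of reading it straight from the body once the links are gone. It assumes the HtmlAgilityPack package is referenced; the class name StripLinksDemo, the fallback to the document root, and the whitespace-collapsing step are illustrative additions rather than part of the answer above.

```csharp
using System;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

class StripLinksDemo // hypothetical name, for illustration only
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<html><body>I was born in <a name=BC>Toronto</a> and now I live in barrie</body></html>");

        // SelectNodes returns null when there are no matches, so guard before looping.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a");
        if (links != null)
        {
            foreach (HtmlNode link in links)
            {
                link.Remove();
            }
        }

        // Fall back to the document root if the markup has no <body>.
        HtmlNode body = doc.DocumentNode.SelectSingleNode("//body") ?? doc.DocumentNode;

        // Removing a link leaves the spaces on either side of it,
        // so collapse whitespace runs (an assumption about the desired output).
        string text = Regex.Replace(body.InnerText, @"\s+", " ").Trim();
        Console.WriteLine(text);
    }
}
```

Reading InnerText after the removal avoids a round trip through doc.Save(...) when only the text is needed.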
This gets you the result you require. It uses a recursive method to drill down through all of your HTML nodes, and you can remove other kinds of nodes simply by adding another If statement.
Public Sub Test()
    Dim document = New HtmlDocument() With {.OptionOutputAsXml = True}
    document.LoadHtml("<html><body>I was born in <a name=BC>Toronto</a> and now I live in barrie</body></html>")
    For i As Integer = 0 To document.DocumentNode.ChildNodes.Count - 1
        RecursiveMethod(document.DocumentNode.ChildNodes(i))
    Next
    ' Removing the <a> node leaves two adjacent spaces; collapse them into one.
    Console.Out.WriteLine(document.DocumentNode.InnerHtml.Replace("  ", " "))
End Sub

Public Sub RecursiveMethod(child As HtmlNode)
    ' Iterate backwards so that removing a node does not shift the indexes still to be visited.
    For x As Integer = child.ChildNodes.Count - 1 To 0 Step -1
        Dim node = child.ChildNodes(x)
        If node.Name = "a" Then
            node.RemoveAll() ' removes all the child nodes of "a"
            node.Remove() ' removes the actual "a" node
        ElseIf node.HasChildNodes Then
            RecursiveMethod(node)
        End If
    Next
End Sub
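The same drill-down can probably be done without hand-written recursion. Below is a minimal C# sketch, assuming a HtmlAgilityPack version that exposes HtmlNode.Descendants(string); the names HtmlCleaner and RemoveElements are hypothetical and only there for illustration.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

static class HtmlCleaner // hypothetical helper, not part of HtmlAgilityPack
{
    public static void RemoveElements(HtmlDocument doc, string tagName)
    {
        // Copy the matches to a list first: Descendants is evaluated lazily,
        // and removing nodes while it is still walking the tree is unsafe.
        foreach (HtmlNode node in doc.DocumentNode.Descendants(tagName).ToList())
        {
            node.Remove();
        }
    }
}

class Program
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<html><body>I was born in <a name=BC>Toronto</a> and now I live in barrie</body></html>");

        HtmlCleaner.RemoveElements(doc, "a");

        // Print the remaining markup to inspect the result.
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```

Removing another kind of element is then a matter of calling RemoveElements again with a different tag name instead of adding a branch to the recursion.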