XML/XHTML replace content?

Developer https://www.devze.com 2022-12-31 03:57 Source: Internet

I have an XHTML string in which I want to replace the content of certain tags, for example:

<span tag="x">FOO</span> 
<span tag="y"> <b>bar</b> some random text <span>another span</span> </span>

I want to be able to find tag="x" and replace FOO with my own content, and find tag="y" and replace all of its inner content with my own content.

What is the best way to do this? I think regex is definitely out of the question. Can XPath do this, or is it only for searching and not for manipulation?


If you're sure the content is XHTML (i.e. well-formed XML), then XPath can certainly do it: XPath selects the nodes, and the XML DOM lets you modify them.

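using System.Xml;

var doc = new XmlDocument();
// LoadXml needs a single root element, so wrap the fragment if it has several top-level spans.
doc.LoadXml("<span tag=...");

// Attributes are addressed with @, and string values must be quoted in XPath.
// XmlNodeList is non-generic, so declare the loop variable as XmlNode rather than var.
foreach (XmlNode node in doc.SelectNodes("//span[@tag='x']"))
{
    node.InnerXml = "New Content";
}
foreach (XmlNode node in doc.SelectNodes("//span[@tag='y']"))
{
    node.InnerXml = "Different Content";
}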
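
For reference, here is a minimal self-contained sketch of the same idea applied to the fragment from the question. The `SpanRewriter` class, the `ReplaceByTag` method and the `<root>` wrapper element are hypothetical names introduced only for illustration. The wrapper is there because the sample fragment has two top-level spans and `LoadXml` accepts only one root element.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class SpanRewriter
{
    // Replaces the inner content of every <span> whose "tag" attribute equals tagValue.
    // Assumption: the fragment is well-formed XML once wrapped in a single <root> element.
    public static string ReplaceByTag(string fragment, string tagValue, string newInnerXml)
    {
        var doc = new XmlDocument();
        doc.PreserveWhitespace = true; // keep the original spacing between elements
        doc.LoadXml("<root>" + fragment + "</root>");

        // tagValue is spliced into the XPath unescaped, which is fine for a fixed
        // literal such as "x" but not for untrusted input.
        // ToList() snapshots the matches before the tree is modified.
        List<XmlNode> matches = doc
            .SelectNodes("//span[@tag='" + tagValue + "']")
            .Cast<XmlNode>()
            .ToList();

        foreach (XmlNode node in matches)
        {
            node.InnerXml = newInnerXml; // drops all existing children
        }

        // Return the fragment without the hypothetical wrapper.
        return doc.DocumentElement.InnerXml;
    }
}
```

Calling `SpanRewriter.ReplaceByTag(fragment, "x", "my content")` swaps out FOO, and calling it with "y" replaces the bold text, the loose text and the nested span in one step.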


You can certainly do this with regular expressions (it is string manipulation, after all), but that may get a bit nasty, because HTML can be quite complicated. Still, it is a possible approach.

An alternative would be to parse the XHTML page into a structured hierarchy and then do the processing on that. The question is whether the pages are really valid XML. The XHTML specification requires it, but if you pick a random page from the internet that claims to be XHTML, you may run into trouble.

  • If no, then you need to parse them as HTML, which can be done using Html Agility Pack.
  • If yes, then you can treat it as XML and use standard .NET classes to parse it.
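
One way to make that decision in code is simply to attempt an XML parse and fall back when it fails. The sketch below assumes that "valid XML" here means "parses without an `XmlException`". `XhtmlCheck` and `TryParseAsXml` are hypothetical names, not part of any library.

```csharp
using System.Xml;
using System.Xml.Linq;

public static class XhtmlCheck
{
    // Returns true and the parsed document when the markup is well-formed XML.
    // Returns false when the caller should fall back to an HTML parser instead.
    public static bool TryParseAsXml(string markup, out XDocument doc)
    {
        try
        {
            doc = XDocument.Parse(markup, LoadOptions.PreserveWhitespace);
            return true;
        }
        catch (XmlException)
        {
            // Not well-formed: unclosed tags, mismatched end tags, and so on.
            doc = null;
            return false;
        }
    }
}
```

For example, `<p>ok</p>` parses, while `<p>broken<br></p>` is rejected because `<br>` is never closed.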

The second case could be done using LINQ to XML like this:

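// Requires: using System.Linq; using System.Xml.Linq; and doc must be an XDocument.
var xs = from span in doc.Descendants("span")
         let tag = span.Attribute("tag")
         where tag != null && tag.Value == "x"
         select span;
// ToList() snapshots the matches so the tree is not modified while it is being enumerated.
foreach (var x in xs.ToList()) x.Value = "BAR!";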
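
The query above covers the tag="x" case. For tag="y", where the inner content mixes text and elements, `ReplaceNodes` is probably the closer fit. The following is a sketch under assumptions: `LinqSpanRewriter`, `ReplaceInner` and the `<root>` wrapper are hypothetical, and the markup is assumed to have no default namespace. A full XHTML page that declares `xmlns="http://www.w3.org/1999/xhtml"` would need element names qualified with an `XNamespace`.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class LinqSpanRewriter
{
    // Replaces all child nodes of every <span tag="tagValue"> with newContent,
    // which may mix strings and XElement instances.
    public static string ReplaceInner(string fragment, string tagValue, params object[] newContent)
    {
        // Hypothetical <root> wrapper so a multi-element fragment parses as one tree.
        XElement root = XElement.Parse("<root>" + fragment + "</root>",
                                       LoadOptions.PreserveWhitespace);

        // The (string) cast yields null when the attribute is missing, so no null check is needed.
        // ToList() snapshots the matches before the tree is edited.
        var spans = root.Descendants("span")
                        .Where(s => (string)s.Attribute("tag") == tagValue)
                        .ToList();

        foreach (XElement span in spans)
        {
            span.ReplaceNodes(newContent); // removes existing children, then adds the new ones
        }

        // Serialize the children of the wrapper, leaving the wrapper itself out.
        return string.Concat(root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```

A call such as `LinqSpanRewriter.ReplaceInner(fragment, "y", new XElement("i", "mine"), " text")` replaces the bold text, the loose text and the nested span with the new nodes.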

The obvious benefit is that this is much more readable and maintainable than a solution based on regular expressions. Html Agility Pack provides a similar API (although I'm not familiar enough with it to write a sample).
