Using HtmlAgility to Group Elements Together

Developer (开发者) https://www.devze.com 2023-03-31 19:29 Source: Internet
I'm trying to take an HTML document and group it into sections based on header tags using HTML Agility Pack. Here's what the raw HTML looks like:

<h3>Header 1</h3>
<p>Text...</p>
<p>More Text...</p>
<h3Header 2</h3>
<p>Text...</p>
<p>More Text...</p>
<p>Even more Text...</p>
<h3>Header 3</h3>
<p>Some Text...</p>

and I want it to end up something like this after I group it:

<div id="header_1">
  <h3>Header 1</h3>
  <p>Text...</p>
  <p>More Text...</p>
</div>

<div id="header_2">
  <h3Header 2</h3>
  <p>Text...</p>
  <p>More Text...</p>
  <p>Even more Text...</p>
</div>

<div id="header_3">
  <h3>Header 3</h3>
  <p>Some Text...</p>
</div>

or like this

<h3>Header 1</h3>
<div id="header_1">
  <p>Text...</p>
  <p>More Text...</p>
</div>

<h3Header 2</h3>
<div id="header_2">
  <p>Text...</p>
  <p>More Text...</p>
  <p>Even more Text...</p>
</div>

<h3>Header 3</h3>
<div id="header_3">
  <p>Some Text...</p>
</div>

HTML Agility Pack is great, but if anyone knows another way to accomplish this, that would be awesome!

This is quite easy to do with HtmlAgilityPack. First get all the top-level <h3> nodes, create a <div> before (or after) each <h3>, then iterate through the following siblings of the current <h3> until the next <h3> or the end of the siblings is reached, and finally move these nodes into the newly created <div>:

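// Requires: using System.Collections.Generic; using HtmlAgilityPack;
// doc is an HtmlDocument that has already been loaded.

// Top-level <h3> nodes. SelectNodes returns null when nothing matches.
var h3s = doc.DocumentNode.SelectNodes("h3");
var idx = 1;
if (h3s != null)
{
    foreach (var h3 in h3s)
    {
        // Create the wrapper and place it where the <h3> currently sits.
        var div = HtmlNode.CreateNode(string.Format("<div id='header_{0}'></div>", idx++));
        h3.ParentNode.InsertBefore(div, h3);

        // Collect the <h3> and every following sibling up to the next <h3>.
        var group = new List<HtmlNode> { h3 };
        for (var next = h3.NextSibling; next != null && next.Name != "h3"; next = next.NextSibling)
            group.Add(next);

        // Move the collected nodes into the wrapper.
        foreach (var item in group)
        {
            item.Remove();
            div.AppendChild(item);
        }
    }
}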

This will get you something like (I corrected <h3Header 2</h3> from your source):

<div id='header_1'>
  <h3>Header 1</h3>
  <p>Text...</p>
  <p>More Text...</p>
</div>
<div id='header_2'>
  <h3>Header 2</h3>
  <p>Text...</p>
  <p>More Text...</p>
  <p>Even more Text...</p>
</div>
<div id='header_3'>
  <h3>Header 3</h3>
  <p>Some Text...</p>
</div>
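
For the second layout in the question (each <h3> left outside its <div>), the same idea can be adapted. The following is only a sketch: the class name HeaderGrouping and the method name GroupAfterHeaders are made up for illustration, and it assumes the <h3> elements are direct children of the document node, as in the sample HTML. The sibling list is collected before the <div> is inserted, so the new <div> is never picked up as a sibling of its own header.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

static class HeaderGrouping
{
    // Hypothetical helper: wraps the siblings that follow each top-level <h3>
    // in a <div id='header_N'>, leaving the <h3> itself outside the <div>.
    public static string GroupAfterHeaders(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var h3s = doc.DocumentNode.SelectNodes("h3");
        if (h3s == null)
            return doc.DocumentNode.OuterHtml;

        var idx = 1;
        foreach (var h3 in h3s)
        {
            // Collect everything after this <h3> up to the next <h3>.
            var group = new List<HtmlNode>();
            for (var next = h3.NextSibling; next != null && next.Name != "h3"; next = next.NextSibling)
                group.Add(next);

            // Insert the wrapper right after the <h3>, then move the nodes in.
            var div = HtmlNode.CreateNode(string.Format("<div id='header_{0}'></div>", idx++));
            h3.ParentNode.InsertAfter(div, h3);

            foreach (var item in group)
            {
                item.Remove();
                div.AppendChild(item);
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Calling HeaderGrouping.GroupAfterHeaders(html) on the corrected sample should leave each <h3> in place, followed by a <div> holding its paragraphs. Whitespace text nodes between the elements are moved into the <div> as well, so the exact indentation of the result may differ from the hand-written example in the question.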