I'm trying to take an HTML document and group it into sections based on header tags using HTML Agility Pack. Here's what the raw HTML looks like:
<h3>Header 1</h3>
<p>Text...</p>
<p>More Text...</p>
<h3Header 2</h3>
<p>Text...</p>
<p>More Text...</p>
<p>Even more Text...</p>
<h3>Header 3</h3>
<p>Some Text...</p>
I want it to end up something like this after grouping:
<div id="header_1">
<h3>Header 1</h3>
<p>Text...</p>
<p>More Text...</p>
</div>
<div id="header_2">
<h3Header 2</h3>
<p>Text...</p>
<p>More Text...</p>
<p>Even more Text...</p>
</div>
<div id="header_3">
<h3>Header 3</h3>
<p>Some Text...</p>
</div>
or like this
<h3>Header 1</h3>
<div id="header_1">
<p>Text...</p>
<p>More Text...</p>
</div>
<h3Header 2</h3>
<div id="header_2">
<p>Text...</p>
<p>More Text...</p>
<p>Even more Text...</p>
</div>
<h3>Header 3</h3>
<div id="header_3">
<p>Some Text...</p>
</div>
HTML Agility is great, but if anyone knows another way to accomplish this, that would be awesome!
This is quite easy to do with HtmlAgilityPack. First, select all the top-level <h3> elements and create a <div> before (or after) each one. Then iterate through the siblings that follow the current <h3> until you reach the next <h3> or run out of siblings, and finally move those nodes into the newly created <div>:
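// Requires: using System.Collections.Generic; using HtmlAgilityPack;
// "h3" is evaluated relative to DocumentNode, so it matches only top-level <h3> elements.
// SelectNodes returns null when nothing matches.
var h3s = doc.DocumentNode.SelectNodes("h3");
if (h3s != null)
{
    var idx = 1;
    foreach (var h3 in h3s)
    {
        // Create the wrapper <div> and place it just before the current <h3>.
        var div = HtmlNode.CreateNode(string.Format("<div id='header_{0}'></div>", idx++));
        h3.ParentNode.InsertBefore(div, h3);

        // Collect the <h3> and every following sibling up to the next <h3>.
        var group = new List<HtmlNode> { h3 };
        for (var next = h3.NextSibling; next != null && next.Name != "h3"; next = next.NextSibling)
            group.Add(next);

        // Move the collected nodes into the <div>.
        foreach (var item in group)
        {
            item.Remove();
            div.AppendChild(item);
        }
    }
}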
This will get you something like the following (I corrected <h3Header 2</h3> from your source):
<div id='header_1'>
<h3>Header 1</h3>
<p>Text...</p>
<p>More Text...</p>
</div>
<div id='header_2'>
<h3>Header 2</h3>
<p>Text...</p>
<p>More Text...</p>
<p>Even more Text...</p>
</div>
<div id='header_3'>
<h3>Header 3</h3>
<p>Some Text...</p>
</div>
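If you'd rather have the second layout from the question (the <h3> stays outside and the <div> follows it), a minimal sketch could look like the one below. It assumes the HtmlAgilityPack NuGet package is referenced; the class and method names (HeaderGrouping, GroupAfterHeaders) and the sample input are made up for illustration and are not part of the library.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, named for illustration only.
static class HeaderGrouping
{
    // Wraps the siblings that follow each top-level <h3> in a <div>
    // placed right after that <h3>, and returns the resulting markup.
    public static string GroupAfterHeaders(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches.
        var h3s = doc.DocumentNode.SelectNodes("h3");
        if (h3s == null)
            return doc.DocumentNode.OuterHtml;

        var idx = 1;
        foreach (var h3 in h3s)
        {
            // Collect the following siblings first, so the new <div>
            // is not picked up as one of them.
            var group = new List<HtmlNode>();
            for (var next = h3.NextSibling; next != null && next.Name != "h3"; next = next.NextSibling)
                group.Add(next);

            // Insert the wrapper directly after the <h3>.
            var div = HtmlNode.CreateNode(string.Format("<div id='header_{0}'></div>", idx++));
            h3.ParentNode.InsertAfter(div, h3);

            // Move the collected nodes into the wrapper.
            foreach (var item in group)
            {
                item.Remove();
                div.AppendChild(item);
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}

class Program
{
    static void Main()
    {
        // Sample input modelled on the question, with the malformed <h3> fixed.
        var html = "<h3>Header 1</h3><p>Text...</p><p>More Text...</p>" +
                   "<h3>Header 2</h3><p>Text...</p>" +
                   "<h3>Header 3</h3><p>Some Text...</p>";
        Console.WriteLine(HeaderGrouping.GroupAfterHeaders(html));
    }
}
```

The only real differences from the version above are that the <h3> itself is left out of the group and InsertAfter is used instead of InsertBefore. Any whitespace text nodes between the elements are siblings too, so they get moved into the <div> along with the paragraphs.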