My problem is a particular case that comes up in my project.
In my HTML document, I want to replace <td> with <td class="right"> for every <td> except the first one in a <tr> tag. (If there is a <tr> inside a <tr> tag, that also needs to be handled.)
If the input is like:
<tr>
<td>1</td>
<td>2</td>
<td>3</td>
</tr>
the output should be like:
<tr>
<td>1</td>
<td class="right">2</td>
<td class="right">3</td>
</tr>
I have tried this code:
public static string tableFormat(string html) // Add extra attribute to td
{
    int start = 0, end = 0, trstart = 0, trend = 0;
    // html = CleanUpXHTML(html); // clean unnecessary p tags
    while (html.Contains("<tr>"))
    {
        //start=end;
        trstart = html.IndexOf("<tr>", end);
        if (trstart == -1)
            break;
        trend = html.IndexOf("</tr>", trstart);
        start = html.IndexOf("<td>", trstart);
        end = html.IndexOf("</td>", trend);
        while (end < trend)
        {
            start = html.IndexOf("<td>", end);
            html = html.Insert(start + 3, " class=\"right\"");
            end = html.IndexOf("</td>", trstart);
        }
    }
    return html;
}
Just call this function from Main. Note: this code only works for valid HTML, i.e. XHTML, with lower-case tags and no attributes on <tr> or <td>.
public static string TableFormat(string xhtml)
{
    const string attr = " class=\"right\"";
    int end = 0;
    while (true)
    {
        int trstart = xhtml.IndexOf("<tr>", end);
        if (trstart == -1)
            break;
        int trend = xhtml.IndexOf("</tr>", trstart);
        if (trend == -1)
            break;
        // Locate the first <td> of the row; it stays unchanged.
        int start = xhtml.IndexOf("<td>", trstart);
        end = (start == -1 || start > trend) ? -1 : xhtml.IndexOf("</td>", start);
        while (end != -1 && end < trend)
        {
            start = xhtml.IndexOf("<td>", end);
            if (start == -1 || start > trend)
                break;
            xhtml = xhtml.Insert(start + 3, attr);
            trend += attr.Length; // the insertion shifts </tr> to the right
            end = xhtml.IndexOf("</td>", start);
        }
        end = trend; // continue the search after this row
    }
    return xhtml;
}
Have you stepped through this code and verified that it works as intended? HTML is very forgiving about things like tag case and whitespace, but your method is not; if the HTML isn't formatted very specifically, your method will likely fail. I'd take a look at that.
Also, you might want to build some more flexibility into it. It might work now (once you get the issue resolved), but if the source HTML ever changes, it may not in the future.
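To make the fragility concrete, here is a minimal sketch of how an exact search for "<td>" behaves on input that a browser would happily accept. The class TagSearchDemo and the helper FindsTd are hypothetical names used only for illustration.

```csharp
using System;

public static class TagSearchDemo
{
    // Hypothetical helper: true when the exact text "<td>" occurs in the input.
    public static bool FindsTd(string html, StringComparison comparison)
    {
        return html.IndexOf("<td>", comparison) >= 0;
    }

    public static void Main()
    {
        string upper = "<TR><TD>1</TD></TR>";
        string withAttribute = "<tr><td align=\"left\">1</td></tr>";

        // A case-sensitive search misses upper-case tags.
        Console.WriteLine(FindsTd(upper, StringComparison.Ordinal));                   // False
        // Ignoring case fixes that particular problem...
        Console.WriteLine(FindsTd(upper, StringComparison.OrdinalIgnoreCase));         // True
        // ...but a cell that already carries an attribute is still missed.
        Console.WriteLine(FindsTd(withAttribute, StringComparison.OrdinalIgnoreCase)); // False
    }
}
```

Each variation needs its own special case in string-search code, which is why the approach tends to break when the source HTML changes.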
"if there is <tr> inside a <tr> tag then that also needs to be handled"
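Handling arbitrarily nested structures like that is not possible with a classical regular expression (.NET's balancing groups can technically do it, but the result is hard to read and maintain).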
Regex is an extraordinarily poor tool for manipulating HTML. Do yourself a favour and grab a proper parser instead; your code will be simpler and more reliable. E.g. with the HTML Agility Pack:
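HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
// Every <td> except the first in its row; SelectNodes returns null when nothing matches.
HtmlNodeCollection cells = doc.DocumentNode.SelectNodes("//tr/td[position()>1]");
if (cells != null)
{
    foreach (HtmlNode td in cells)
        td.SetAttributeValue("class", "right");
}
html = doc.DocumentNode.OuterHtml; // serialize the modified document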
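If the input is guaranteed to be well-formed XHTML, the same parser-based idea can be sketched without an external library using System.Xml.Linq. This is an illustration under assumptions, not a drop-in replacement: the names XhtmlTableDemo and AddRightClass are hypothetical, the input must have a single root element, and the elements are assumed to carry no XML namespace (with xmlns="http://www.w3.org/1999/xhtml" the element names would need that namespace).

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XhtmlTableDemo
{
    // Hypothetical helper: adds class="right" to every <td> except the first
    // direct <td> child of each <tr>, including rows of nested tables.
    public static string AddRightClass(string xhtml)
    {
        XElement root = XElement.Parse(xhtml);
        foreach (XElement tr in root.DescendantsAndSelf("tr"))
        {
            // Elements() only returns direct children, so cells of a nested
            // row are handled when that nested <tr> is visited itself.
            foreach (XElement td in tr.Elements("td").Skip(1))
                td.SetAttributeValue("class", "right");
        }
        return root.ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        string input = "<table><tr><td>1</td><td>2</td><td>3</td></tr></table>";
        Console.WriteLine(AddRightClass(input));
    }
}
```

Because the parser tracks element nesting, a row inside a cell of another row needs no special handling, which is the case the string-search approach struggles with.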
Consider using a regular expression...
string pattern = @"(?<!(<tr>\s*))<td>";
string test = @"<tr>
<td>1</td>
<td>2</td>
<td>3</td>
</tr> ";
string result = Regex.Replace(test, pattern, "<td class=\"right\">", RegexOptions.IgnoreCase | RegexOptions.Multiline);
Console.WriteLine("{0}", result);
This works with upper or lower case and any amount of whitespace between the <tr> and the <td>. Anything other than whitespace would cause this to fail.
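As a rough illustration of that failure mode, the sketch below applies the same pattern to a row with a comment between <tr> and the first <td>, and to a row whose <tr> carries an attribute. The class LookbehindDemo and the method AddRightClass are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Same pattern as above: a <td> that is not directly preceded by <tr> plus whitespace.
    public static string AddRightClass(string html)
    {
        return Regex.Replace(html, @"(?<!(<tr>\s*))<td>", "<td class=\"right\">", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // Works: only whitespace between <tr> and the first <td>.
        Console.WriteLine(AddRightClass("<tr> <td>1</td><td>2</td></tr>"));
        // Fails: the comment defeats the lookbehind, so the first cell is changed too.
        Console.WriteLine(AddRightClass("<tr><!-- row --><td>1</td><td>2</td></tr>"));
        // Fails: <tr class="odd"> is not the literal <tr>, so the first cell is changed too.
        Console.WriteLine(AddRightClass("<tr class=\"odd\"><td>1</td><td>2</td></tr>"));
    }
}
```

Cells that already have attributes, such as <td align="left">, are not matched by the pattern at all and are left unchanged.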