What errors can I expect the Html Agility Pack library to fix? I know from my own experience that it can close an incomplete tag, like:
<car>Nissan</car
When I call Load or LoadHtml, it fixes it, like:
<car>Nissan</car>
I also know that the ParseErrors collection can report the Reason, Stream, etc.
Is there a list of errors (or can you tell from your own experience) that shows how reliable Html Agility Pack is at fixing errors, and which errors it cannot fix?
Historically, Html Agility Pack was never designed to fix Html, but rather to be able to load, modify & save it back, even if this Html has errors.
It means it will fix the errors that browsers generally fix automatically, like the one you show in your question. The list of errors was determined experimentally, and you can browse the source for deeper insight into it. That being said, it was designed back in 2000/2001, so things may have changed in that area :-)
The ParseErrors collection will contain HtmlParseError objects with a code. The code is an enum that's documented:
/// A tag was not closed.
TagNotClosed,
/// A tag was not opened.
TagNotOpened,
/// There is a charset mismatch between stream and declared (META) encoding.
CharsetMismatch,
/// An end tag was not required.
EndTagNotRequired,
/// An end tag is invalid at this position.
EndTagInvalidHere
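As a rough illustration of reading that collection, here is a minimal sketch that loads the malformed snippet from the question and prints whatever the parser recorded. It assumes the HtmlAgilityPack package is referenced; the class name ParseErrorsDemo is made up for the example, and whether this particular input produces any entries at all is not guaranteed.

```csharp
using System;
using HtmlAgilityPack;

class ParseErrorsDemo
{
    static void Main()
    {
        var doc = new HtmlDocument();

        // Same malformed input as in the question: the end tag lacks its '>'.
        doc.LoadHtml("<car>Nissan</car");

        // ParseErrors holds the problems the parser noticed while loading.
        // It may be empty even when the markup was silently repaired.
        foreach (HtmlParseError error in doc.ParseErrors)
        {
            Console.WriteLine(
                $"{error.Code} at line {error.Line}, column {error.LinePosition}: {error.Reason}");
        }

        // The serialized document shows what was actually repaired.
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```

Comparing the printed markup with the input is usually the most direct way to see which fixes were applied to a given document.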
There is also an OptionFixNestedTags property on HtmlDocument (default value is false) that is capable of fixing LI, TR, TH and TD tags when nesting errors are detected. It means that if it detects a closing TR without all the needed closing TDs, they will be closed automatically. Again, this is exactly what a browser will do with malformed Html.
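To see the effect of that option on your own input, a small sketch like the following loads the same markup with the option off and on and prints both results. The table markup and the class name FixNestedTagsDemo are hypothetical, chosen only for illustration; the exact output depends on the library version, so none is shown here.

```csharp
using System;
using HtmlAgilityPack;

class FixNestedTagsDemo
{
    static void Main()
    {
        // Hypothetical malformed table: the TD elements are never closed.
        const string html = "<table><tr><td>a<td>b</tr></table>";

        foreach (bool fix in new[] { false, true })
        {
            // The option has to be set before the document is loaded.
            var doc = new HtmlDocument { OptionFixNestedTags = fix };
            doc.LoadHtml(html);

            Console.WriteLine($"OptionFixNestedTags={fix}:");
            Console.WriteLine(doc.DocumentNode.OuterHtml);
        }
    }
}
```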