XDocument: is it possible to force the load of a malformed XML file?

Developer https://www.devze.com 2023-02-26 03:12 Source: web
I have a malformed XML file. The root tag <batch> is not closed by a </batch> tag. The final tag is missing.

When I try to load my malformed XML file in C#:

StreamReader sr = new StreamReader(path);
batchFile = XDocument.Load(sr); // Exception

I get an exception "Unexpected end of file has occurred. The following elements are not closed: batch. Line 54, position 1."

Is it possible to ignore the missing closing tag or to force the load? I noticed that all my XML tools (such as XML Notepad) automatically fix or ignore the problem. I cannot fix the XML file myself: it comes from third-party software, and sometimes the file is correct.


You can't do it with XDocument, because this class loads the whole document into memory and parses it completely.
But it is possible to process the document with XmlReader: it lets you read and process everything up to the end of the file, and only at the very end do you get the missing-tag exception.
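
The XmlReader approach could look roughly like the sketch below. It assumes the only defect is the missing root end tag, and the names LenientXmlLoader and LoadUntilError are hypothetical, not part of any library. Each complete child of the root is copied into a new XElement; the XmlException raised at the end of the file is caught and handed back to the caller instead of aborting the load.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class LenientXmlLoader
{
    // Hypothetical helper: returns a root element holding every complete
    // child that was read before the parser failed. Attributes on the root
    // itself are not copied in this sketch. 'error' is null if the input
    // turned out to be well-formed.
    public static XElement LoadUntilError(TextReader input, out XmlException error)
    {
        error = null;
        XElement root = null;
        using (XmlReader reader = XmlReader.Create(input))
        {
            try
            {
                reader.MoveToContent(); // position on the root start tag
                root = new XElement(XName.Get(reader.LocalName, reader.NamespaceURI));

                bool more = reader.Read(); // step into the root's content
                while (more)
                {
                    if (reader.NodeType == XmlNodeType.Element && reader.Depth == 1)
                    {
                        // ReadSubtree confines the load to this one child, so a
                        // missing root end tag cannot make the child load fail.
                        using (XmlReader child = reader.ReadSubtree())
                        {
                            root.Add(XElement.Load(child));
                        }
                    }
                    more = reader.Read(); // throws at EOF if the root is unclosed
                }
            }
            catch (XmlException ex)
            {
                error = ex; // keep what was read so far and report the problem
            }
        }
        return root;
    }
}
```

A call such as LenientXmlLoader.LoadUntilError(new StreamReader(path), out var error) would then give back the batch element with its complete children, plus the exception for logging. A child element that is itself truncated is dropped rather than repaired.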


I suggest using Tidy.NET to clean up messy input.

Tidy.NET has a nice API to get a list of problems (MessageCollection) in your 'XML', and you can use it to fix the text stream in memory. The simplest thing would be to fix one error at a time, though that will not perform too well with many errors. Otherwise, you might fix errors in reverse document order, so that the offsets of the remaining messages stay valid while you apply the fixes.

Here is an example to convert HTML input into XHTML:

Tidy tidy = new Tidy();

/* Set the options you want */
tidy.Options.DocType = DocType.Strict;
tidy.Options.DropFontTags = true;
tidy.Options.LogicalEmphasis = true;
tidy.Options.Xhtml = true;
tidy.Options.XmlOut = true;
tidy.Options.MakeClean = true;
tidy.Options.TidyMark = false;

/* Declare the parameters that are needed */
TidyMessageCollection tmc = new TidyMessageCollection();
MemoryStream input = new MemoryStream();
MemoryStream output = new MemoryStream();

byte[] byteArray = Encoding.UTF8.GetBytes("Put your HTML here...");
input.Write(byteArray, 0, byteArray.Length);
input.Position = 0;
tidy.Parse(input, output, tmc);

string result = Encoding.UTF8.GetString(output.ToArray());
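
The "fix errors in reverse document order" idea can be sketched independently of Tidy.NET. The TextFix and FixApplier types below are hypothetical and only illustrate the ordering; how a Tidy message would be turned into an offset and a replacement is left open, and the fixes are assumed not to overlap.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical description of one repair: replace 'Length' characters
// starting at 'Offset' with 'Replacement'.
public sealed class TextFix
{
    public int Offset { get; }
    public int Length { get; }
    public string Replacement { get; }

    public TextFix(int offset, int length, string replacement)
    {
        Offset = offset;
        Length = length;
        Replacement = replacement;
    }
}

public static class FixApplier
{
    // Applies the fixes from the end of the text towards the start, so an
    // edit never shifts the offsets of the fixes still waiting to be applied.
    public static string ApplyAll(string text, IEnumerable<TextFix> fixes)
    {
        var buffer = new StringBuilder(text);
        foreach (TextFix fix in fixes.OrderByDescending(f => f.Offset))
        {
            buffer.Remove(fix.Offset, fix.Length);
            buffer.Insert(fix.Offset, fix.Replacement);
        }
        return buffer.ToString();
    }
}
```

Applying the same fixes front to back would require adjusting every later offset after each edit, which is what the reverse order avoids.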


What you could do is add the closing tag to the XML in memory and then load it.

So after reading the XML text through the StreamReader, manipulate the data before you do the XML load.
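
A minimal sketch of that idea, assuming the only thing ever missing is the root end tag and that the root name is known in advance ("batch" in the question). BatchLoader and LoadWithMissingRootClose are hypothetical names. Since the file is sometimes correct, the text is parsed unchanged first and patched only if that fails.

```csharp
using System.Xml;
using System.Xml.Linq;

public static class BatchLoader
{
    // Hypothetical helper: parses the text as-is; if that fails, appends the
    // root end tag and tries once more. Any other defect still throws.
    public static XDocument LoadWithMissingRootClose(string xmlText, string rootName)
    {
        try
        {
            return XDocument.Parse(xmlText); // the file is sometimes already correct
        }
        catch (XmlException)
        {
            return XDocument.Parse(xmlText + "</" + rootName + ">");
        }
    }
}
```

It could be called as BatchLoader.LoadWithMissingRootClose(File.ReadAllText(path), "batch"), assuming File.ReadAllText decodes the file correctly (it detects a byte order mark and otherwise uses UTF-8).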
