
Reading potentially malformed XML asynchronously

Developer https://www.devze.com 2023-01-10 07:04 Source: web

It's Friday and my mind already seems to have moved to weekend thinking.

Given this XML structure:

<?xml version="1.0" encoding="utf-8"?>
<results requiredAttribute="somedatahere">
  <entry>
    <!-- Xml structure in here -->
  </entry>
  <entry>
    <!-- Xml structure in here -->
  </entry>
  <entry>
    <!-- Xml structure in here -->
  </entry>
</results>

And this code (cut down to the core), which uses an XmlReader to read the data and return each entry as soon as it has been read:

response = (HttpWebResponse)request.GetResponse();

using (var reader = XmlReader.Create(response.GetResponseStream()))
{
    Logger.Info("Collector: Before attempt to read data for {0}", url);

    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "entry")
        {
            var el = XElement.ReadFrom(reader) as XElement;
            if (el != null)
                yield return el;
        }
    }
}

What is the easiest way to retrieve the value of the requiredAttribute attribute?

A key point to consider is that I never want to read the full XML file in at once, as the file could be very big. Also, the data is coming from an HTTP response stream, so you can't always guarantee that the data is complete, and consequently that the outer results element is well formed. This seems to rule out reading the results element and then iterating through its children.


Stick with a purely XmlReader-based approach: until it hits the malformed part, it will give you parsed content.

Any other approach (XPathDocument, XElement, XmlDocument) parses the whole document up front, so on incomplete input you will just get an XmlException and none of the data.
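To illustrate the difference, here is a minimal sketch that feeds the same truncated payload to an XmlReader and to XDocument.Parse (standing in for the whole-document APIs listed above). The payload, the TruncationDemo class and its method names are hypothetical. The reader should report the two complete entry elements before throwing, while XDocument.Parse throws without giving you anything.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class TruncationDemo
{
    // Hypothetical payload: the stream was cut off in the middle of the third <entry> start tag.
    public const string Truncated =
        "<results requiredAttribute=\"somedatahere\">" +
        "<entry><id>1</id></entry>" +
        "<entry><id>2</id></entry>" +
        "<entr";

    // Counts the <entry> start tags an XmlReader reports before it hits the malformed part.
    public static int CountEntriesBeforeError(string xml, out bool threw)
    {
        int count = 0;
        threw = false;
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            try
            {
                while (reader.Read())
                {
                    if (reader.NodeType == XmlNodeType.Element && reader.Name == "entry")
                        count++;
                }
            }
            catch (XmlException)
            {
                // Everything counted so far was parsed before the truncation point.
                threw = true;
            }
        }
        return count;
    }

    // XDocument needs the whole document, so a truncated input yields no data at all.
    public static bool WholeDocumentLoadFails(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return false;
        }
        catch (XmlException)
        {
            return true;
        }
    }

    static void Main()
    {
        bool threw;
        int seen = CountEntriesBeforeError(Truncated, out threw);
        Console.WriteLine("XmlReader saw {0} entry element(s) before the error (threw: {1})", seen, threw);
        Console.WriteLine("XDocument.Parse failed: {0}", WholeDocumentLoadFails(Truncated));
    }
}
```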


if (reader.NodeType == XmlNodeType.Element)
{
    if (reader.Name == "results")
    {
        if (reader.MoveToAttribute("requiredAttribute") && reader.ReadAttributeValue())
            yield return reader.Value;
    }
    if (reader.Name == "entry")
    {
        ...
    }
}
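Putting the two branches together, a complete iterator might look like the following minimal sketch. It is an illustration under stated assumptions rather than the answer's exact code. The ResultsStreamReader class, the ReadEntries method and its onRequiredAttribute callback are hypothetical names. The attribute is read with XmlReader.GetAttribute, which does not move the reader, instead of MoveToAttribute/ReadAttributeValue. A callback is used because C# iterators cannot have out parameters. Each entry is loaded through ReadSubtree and XElement.Load rather than XElement.ReadFrom, so the outer reader is left on the entry's end tag and only moves on when the loop calls Read() again.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

static class ResultsStreamReader
{
    // Hypothetical helper: streams each <entry> as an XElement and reports the
    // requiredAttribute of <results> through a callback as soon as its start tag is read.
    public static IEnumerable<XElement> ReadEntries(Stream input, Action<string> onRequiredAttribute)
    {
        var settings = new XmlReaderSettings { IgnoreComments = true, IgnoreWhitespace = true };
        using (var reader = XmlReader.Create(input, settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element)
                    continue;

                if (reader.Name == "results")
                {
                    // GetAttribute returns null if the attribute is missing.
                    onRequiredAttribute(reader.GetAttribute("requiredAttribute"));
                }
                else if (reader.Name == "entry")
                {
                    XElement el;
                    // Closing the subtree reader leaves the outer reader on </entry>,
                    // so nothing after this entry has been parsed yet.
                    using (var subtree = reader.ReadSubtree())
                    {
                        el = XElement.Load(subtree);
                    }
                    yield return el;
                }
            }
        }
    }

    static void Main()
    {
        // Sample payload shaped like the question's XML, cut off mid-stream.
        const string xml =
            "<results requiredAttribute=\"somedatahere\">" +
            "<entry><id>1</id></entry><entry><id>2</id></entry><entr";

        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
        {
            try
            {
                foreach (var entry in ReadEntries(stream, v => Console.WriteLine("requiredAttribute = " + v)))
                    Console.WriteLine(entry);
            }
            catch (XmlException ex)
            {
                // Entries before the malformed part have already been handled.
                Console.WriteLine("Stopped: " + ex.Message);
            }
        }
    }
}
```

Because the outer reader never reads past an entry until the loop asks for the next node, an adjacent entry element is not skipped, and a complete entry is handed to the caller before the reader tries to parse whatever follows it. Wrapping the foreach in a try/catch for XmlException, as the test program does, keeps everything read before the malformed part.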

Test Program

using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

class Program
{
    static void Main(string[] args)
    {
        try
        {
            foreach (object value in Read())
                Console.WriteLine(value);
        }
        catch (XmlException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }

    static IEnumerable<object> Read()
    {
        using (var file = File.OpenRead("Test.xml"))
        // Dispose the reader as well as the underlying file stream.
        using (var reader = XmlReader.Create(file, new XmlReaderSettings { IgnoreComments = true }))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    yield return reader.Name;

                    if (reader.Name == "results")
                    {
                        if (reader.MoveToAttribute("requiredAttribute") && reader.ReadAttributeValue())
                            yield return reader.Value;
                    }
                }
            }
        }
    }
}