Java - extract XML documents from String

Developer https://www.devze.com 2023-02-14 15:40 Source: web

Related topics: parsing xml

Given an arbitrary String, how can I extract the XML document(s) from it?

The String might contain no complete document (only an incomplete fragment), exactly one complete document, or several documents.

Is there an existing template or tool for this problem?

Later edit: consider the case where the XML data is received over TCP/IP.

Multiple documents are the real challenge. I'd wrap the String in an additional root element; this at least turns the content into a single well-formed XML document:

 String xml = "<my-own-root-element>" + getString() + "</my-own-root-element>";

This is just a start. Of course, you can forget about XML schemas and DOCTYPE declarations. Different character encodings may be a challenge, and you will have to strip the <?xml ... ?> declarations, because an XML declaration is only allowed at the very beginning of a document.
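A minimal Java sketch of this wrapping idea, assuming the String holds only complete documents without DOCTYPE declarations. The class and method names (WrapRootSplitter, extractDocuments) are made up for illustration. It strips the XML declarations with a regular expression, wraps the rest in the extra root, parses the result with the JDK's DocumentBuilder and returns the wrapper's child elements, one per original document. If any document is incomplete, parse throws a SAXParseException for the whole string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WrapRootSplitter {

    // Wrapper name from the answer; it must not clash with the payload's elements.
    private static final String WRAPPER = "my-own-root-element";

    /** Returns the root element of every document concatenated in the input. */
    static List<Element> extractDocuments(String input) throws Exception {
        // An XML declaration is only legal at the very start of a document,
        // so remove all of them before wrapping.
        String body = input.replaceAll("<\\?xml\\s[^>]*\\?>", "");
        String xml = "<" + WRAPPER + ">" + body + "</" + WRAPPER + ">";

        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Throws SAXParseException if any of the documents is incomplete or malformed.
        Document wrapped = builder.parse(new InputSource(new StringReader(xml)));

        // Each element child of the wrapper is the root of one original document;
        // whitespace text between documents is skipped.
        List<Element> roots = new ArrayList<>();
        NodeList children = wrapped.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                roots.add((Element) child);
            }
        }
        return roots;
    }

    public static void main(String[] args) throws Exception {
        String input = "<?xml version=\"1.0\"?><a>1</a>\n<?xml version=\"1.0\"?><b>2</b>";
        for (Element root : extractDocuments(input)) {
            System.out.println(root.getTagName() + ": " + root.getTextContent());
        }
        // prints "a: 1" then "b: 2"
    }
}
```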

I don't know of any existing solution that can repair broken XML documents automatically. XML is a very strict standard: a conforming parser treats every well-formedness violation as a fatal error and will not hand the application a repaired document. You are on your own.

You could try looking at the source code of XML editors; they have to cope with corrupt documents, but I doubt that any of them can handle problems like missing start tags.
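To see that strictness in practice, here is a small Java sketch (WellFormednessCheck and firstError are hypothetical names) that runs the JDK's SAX parser over a string and reports the first fatal error instead of a document. An incomplete document and two concatenated documents are both rejected, and the parser makes no attempt to repair either. It is only a completeness test, not a recovery mechanism.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormednessCheck {

    /**
     * Returns null if the string is exactly one well-formed document,
     * otherwise a description of the parser's first fatal error.
     */
    static String firstError(String xml) throws Exception {
        try {
            // DefaultHandler ignores content and rethrows fatal errors.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstError("<a><b/></a>")); // one complete document: null
        System.out.println(firstError("<a><b/>"));     // incomplete document: error reported
        System.out.println(firstError("<a/><b/>"));    // two documents: also an error
    }
}
```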

This is my C# version of it; I hope it gives some direction. I'm using it for TCP/IP communication, and T is the type each document is deserialized into.

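using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical class name; the original post showed only the two methods.
public class XmlDocumentParser
{
    // Splits a string into complete documents. Each document is assumed to start
    // with an XML declaration and to end with the closing tag of a root element
    // named after T (XmlSerializer's default root name unless [XmlRoot] overrides it).
    public List<T> ParseMultipleDocumentsByType<T>(string documents)
    {
        var cleanParsedDocuments = new List<T>();
        var endingString = "</" + typeof(T).Name + ">";
        var stringContainsDocuments = true;
        while (stringContainsDocuments)
        {
            var startingPoint = documents.IndexOf("<?xml", StringComparison.Ordinal);
            // Only look for the end tag after the start of the current document.
            var endTagIndex = startingPoint < 0
                ? -1
                : documents.IndexOf(endingString, startingPoint, StringComparison.Ordinal);
            if (endTagIndex >= 0)
            {
                var endingPoint = endTagIndex + endingString.Length;
                var document = documents.Substring(startingPoint, endingPoint - startingPoint);
                var singleDoc = (T)XmlDeserializeFromString(document, typeof(T));
                cleanParsedDocuments.Add(singleDoc);
                documents = documents.Remove(startingPoint, endingPoint - startingPoint);
            }
            else
            {
                // No complete document left: either nothing at all, or an
                // incomplete one that still needs more data.
                stringContainsDocuments = false;
            }
        }

        return cleanParsedDocuments;
    }

    public static object XmlDeserializeFromString(string objectData, Type type)
    {
        var serializer = new XmlSerializer(type);
        using (TextReader reader = new StringReader(objectData))
        {
            return serializer.Deserialize(reader);
        }
    }
}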
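A rough Java counterpart of the C# approach for the TCP/IP case, sketched under the same assumptions: every document starts with an XML declaration, and its root element (rootName) never nests inside itself. XmlStreamSplitter and feed are hypothetical names. Unlike the C# version, it keeps an incomplete trailing document in a buffer so that the next chunk read from the socket can complete it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XmlStreamSplitter {

    private static final String DECLARATION = "<?xml";
    private final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    private final String endTag;
    // Text received so far that has not yet formed a complete document.
    private final StringBuilder buffer = new StringBuilder();

    XmlStreamSplitter(String rootName) {
        this.endTag = "</" + rootName + ">";
    }

    /** Appends newly received text and returns every document it completed. */
    List<Document> feed(String chunk) throws Exception {
        buffer.append(chunk);
        List<Document> documents = new ArrayList<>();
        while (true) {
            int start = buffer.indexOf(DECLARATION);
            if (start < 0) {
                break; // no document start yet
            }
            int endTagAt = buffer.indexOf(endTag, start);
            if (endTagAt < 0) {
                break; // incomplete document: keep it buffered for the next chunk
            }
            int end = endTagAt + endTag.length();
            String xml = buffer.substring(start, end);
            documents.add(factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))));
            buffer.delete(0, end); // also drops any junk before the declaration
        }
        return documents;
    }
}
```

In a receive loop you would wrap the socket's InputStream in an InputStreamReader using the sender's charset (so multi-byte characters split across reads are decoded correctly), read into a char[] and pass each new String(buf, 0, n) to feed. If the sender ever omits the declaration, text accumulates in the buffer indefinitely, so a real implementation would need a size limit.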