SaxParser replacing text while downloading?

Developer https://www.devze.com 2023-03-15 02:29 Source: the web

I have a Java SAX parser that downloads and parses in one step, using parse(new InputSource(conn.getInputStream())). Unfortunately, it sometimes fails on one site's XML with the error "XML or text declaration not at start of entity". Apparently the XML is malformed: the XML declaration has to come first, but this document puts the DOCTYPE before it:

<!DOCTYPE ... stuff here ...>
<?xml  ... stuff here ...?>

Unfortunately, there doesn't seem to be any way to ignore this error. I suppose I could download the entire XML, fix it with a regex or something, and then parse it, but then I'd lose the benefit of parsing while it's downloading. Is there a way to replace the text while it's being parsed?

Easy solution: read the first line from the stream yourself, consuming those bytes, and then pass the same stream to the parser. This works as long as the misplaced DOCTYPE sits alone on the first line.
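A minimal sketch of the easy solution, assuming the misplaced DOCTYPE occupies exactly the first line and has no internal subset spanning several lines. The class and method names (SkipFirstLine, skipFirstLine, parseSkippingFirstLine) are made up for illustration, and a ByteArrayInputStream stands in for conn.getInputStream():

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SkipFirstLine {

    /** Consumes bytes up to and including the first '\n', so the parser never sees them. */
    static void skipFirstLine(InputStream in) throws IOException {
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            // discard the byte
        }
    }

    /** Drops the first line, then lets SAX parse the rest as it arrives. Returns element names in order. */
    static List<String> parseSkippingFirstLine(InputStream in) throws Exception {
        skipFirstLine(in);
        List<String> names = new ArrayList<>();
        SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(in), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                names.add(qName);
            }
        });
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the downloaded document: DOCTYPE wrongly placed before the XML declaration.
        String bad = "<!DOCTYPE feed>\n<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<feed><item/></feed>";
        InputStream in = new ByteArrayInputStream(bad.getBytes(StandardCharsets.UTF_8));
        System.out.println(parseSkippingFirstLine(in)); // prints [feed, item]
    }
}
```

Only the first line is read by hand; everything after it is still parsed while it downloads. Note that the DOCTYPE is thrown away, so this is only suitable if you don't need the DTD.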

Proper Java solution: create an intermediate stream interface that wraps any kind of stream and hands back a stream the SAX parser can consume. Then write a class implementing that interface for your specific case.

That way, you can detect the problematic header before it ever reaches the SAX parser.
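One way the wrapper idea could look, as a sketch rather than a definitive implementation: a FilterInputStream subclass (the name DeclarationFirstInputStream is hypothetical) that silently discards everything in front of the first "<?xml" and then passes the remaining bytes through untouched. It assumes an ASCII-compatible encoding such as UTF-8, and it drops the misplaced DOCTYPE instead of moving it:

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Wraps any InputStream and makes it start at the first "<?xml" it contains. */
class DeclarationFirstInputStream extends FilterInputStream {
    private static final byte[] MARKER = "<?xml".getBytes(StandardCharsets.US_ASCII);
    private boolean aligned = false;

    DeclarationFirstInputStream(InputStream source) {
        // The pushback buffer only needs room for the marker itself.
        super(new PushbackInputStream(source, MARKER.length));
    }

    /** On the first read, skips bytes until the marker has been seen, then pushes the marker back. */
    private void align() throws IOException {
        if (aligned) {
            return;
        }
        aligned = true;
        PushbackInputStream pb = (PushbackInputStream) in;
        int matched = 0;
        int b;
        while (matched < MARKER.length && (b = pb.read()) != -1) {
            if (b == MARKER[matched]) {
                matched++;
            } else {
                // '<' occurs only at index 0 of the marker, so a mismatch restarts at 0 or 1.
                matched = (b == MARKER[0]) ? 1 : 0;
            }
        }
        if (matched == MARKER.length) {
            pb.unread(MARKER);
        }
        // If the marker never appears, the stream is simply exhausted.
    }

    @Override
    public int read() throws IOException {
        align();
        return super.read();
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        align();
        return super.read(buf, off, len);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for conn.getInputStream().
        String bad = "<!DOCTYPE feed>\n<?xml version=\"1.0\"?>\n<feed><item>a</item></feed>";
        InputStream raw = new ByteArrayInputStream(bad.getBytes(StandardCharsets.UTF_8));
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new DeclarationFirstInputStream(raw)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName, String qName, Attributes attrs) {
                        System.out.println("element: " + qName);
                    }
                });
    }
}
```

Because the wrapper only inspects bytes until it finds the declaration, parsing still overlaps with the download. A document that contains no "<?xml" at all would come out empty, so a real version may want a cap on how far it scans.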

Edit: I would just use the Apache commons XML parser, or a DOM parser instead of SAX. Also, unless your XML is really long, there's not much difference in parsing it during or after the download.

Have a look at jsoup. It can deal with malformed XML.