I am trying to parse some HTML using NekoHTML.
The problem is that the code snippet below works fine on the Sun JDK 1.5.0_01 (when I use Eclipse with the Sun JRE), but the same code does not work on the IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Windows XP x86-32 j9vmwi3223ifx-20070323, JIT enabled), which is what I use when developing in IBM RAD.
```java
NodeList tags = doc.getElementsByTagName("td");
for (int i = 0; i < tags.getLength(); i++)
{
    Element elem = (Element) tags.item(i);
    // do something with elem
}
```
By "working fine" I mean that I get a list of "td" elements that I can process further. On J9 I never enter the for loop, i.e. tags.getLength() is 0.
I am using the latest version of NekoHTML (with its bundled Xerces jars). The doc in the code above is of type org.w3c.dom.Document (the runtime class is org.apache.html.dom.HTMLDocumentImpl).
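One way to see which jar actually supplies `org.apache.html.dom.HTMLDocumentImpl` (and the Xerces parser) on each VM is to ask the JVM where it loaded the class from. The sketch below is a minimal, hypothetical diagnostic: the `WhichJar` class and its `locate` method are illustrative names, not part of NekoHTML or Xerces, and it relies only on the JDK's `Class.forName` and `ProtectionDomain.getCodeSource()`. Run it with the same classpath as the application under both the Sun JRE and IBM J9 and compare the output.

```java
import java.security.CodeSource;

public class WhichJar {
    // Describes where the named class was loaded from, or notes that it is
    // not visible on the classpath at all.
    public static String locate(String className) {
        try {
            Class<?> cls = Class.forName(className);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                // Classes loaded by the bootstrap loader (the JRE's own
                // libraries) normally report no code source.
                return className + " -> bootstrap/JRE (no code source)";
            }
            return className + " -> " + src.getLocation();
        } catch (ClassNotFoundException e) {
            return className + " -> not found on classpath";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.html.dom.HTMLDocumentImpl",
            "org.apache.xerces.parsers.DOMParser",
            "org.cyberneko.html.parsers.DOMParser"
        };
        for (String name : names) {
            System.out.println(locate(name));
        }
        // Helps label the output when comparing runs on different VMs.
        System.out.println("java.vm.name = " + System.getProperty("java.vm.name"));
    }
}
```

If the two VMs report different locations for the same class, the runtimes are not using the same DOM/parser code, which would be consistent with a classpath difference rather than a bug in the loop itself.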
The IBM J9 details are as follows:
```text
java version "1.5.0"
Java(TM) 2 Runtime Environment, Standard Edition (build pwi32devifx-20070323 (ifix 117674: SR4 + 116644 + 114941 + 116110 + 114881))
IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Windows XP x86-32 j9vmwi3223ifx-20070323 (JIT enabled)
J9VM - 20070322_12058_lHdSMR
JIT - 20070109_1805ifx3_r8
GC - WASIFIX_2007)
JCL - 20070131
```
Any idea, suggestion or workaround is appreciated. Thanks.
I have two ideas.
- I have just verified that Xerces is part of the JRE installation, so I believe that is where it reaches your application's classpath. Sun and IBM probably ship different versions of Xerces. As a first step, check which version each VM loads, and try replacing the one under the IBM JRE with Sun's. If that helps, you have two options: keep running IBM Java with Sun's Xerces, or keep investigating what is wrong with IBM's Xerces.
- Are there other differences between your development and production environments? Do they run the same operating system? Could you, for example, be developing on Windows and deploying on Unix, while your XML was written on Windows with \r\n line endings? Furthermore, if your XML contains Unicode characters and was written on Windows, it may start with a special invisible prefix (a byte order mark, or BOM) that marks it as Unicode. This prefix may cause the parser to fail.
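To rule out the BOM case, the input can be cleaned before it reaches the parser. This is a minimal sketch, assuming UTF-8 input, that drops a leading UTF-8 byte order mark (the bytes EF BB BF) and leaves every other stream untouched; `BomStripper` and `skipUtf8Bom` are hypothetical names. Whether a BOM actually trips a particular parser version is an assumption to verify, not a certainty.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

public class BomStripper {
    // UTF-8 encodes the byte order mark U+FEFF as EF BB BF.
    private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

    // Wraps the stream and skips a leading UTF-8 BOM if one is present.
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = 0;
        // read() may return fewer bytes than requested, so keep reading
        // until we have three bytes or reach end of stream.
        while (n < head.length) {
            int r = pb.read(head, n, head.length - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        boolean isBom = n == UTF8_BOM.length
                && (head[0] & 0xFF) == UTF8_BOM[0]
                && (head[1] & 0xFF) == UTF8_BOM[1]
                && (head[2] & 0xFF) == UTF8_BOM[2];
        if (!isBom && n > 0) {
            pb.unread(head, 0, n); // not a BOM: push the bytes back
        }
        return pb;
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        InputStream in = skipUtf8Bom(new ByteArrayInputStream(withBom));
        System.out.println((char) in.read()); // first byte after the BOM
    }
}
```

The wrapped stream can then be handed to the parser, for example through `new org.xml.sax.InputSource(stream)`, assuming your code parses from an `InputSource` rather than a file name.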