I am using the JTidy parser to parse a web page. It is working, sort of:
InputStream in=new URL("http://www.medicinenet.com/alopecia_areata/article.htm").openStream();
Document doc= new Tidy().parseDOM(in, null);
String titleText=doc.getElementsByTagName("title").item(0).getFirstChild().getNodeValue();
It works fine for <title>...</title>, but the page at the URL I passed has its title tag in capital letters, <TITLE>...</TITLE>, so it returns null.
How can I read both <TITLE>...</TITLE> and <title>...</title> in one statement in Java? Please help me.
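Check whether the lowercase lookup found an element, then fall back to uppercase. Note that item(0) returns null when nothing matches, so test the node itself before calling getFirstChild() on it:
// Node is org.w3c.dom.Node
Node title = doc.getElementsByTagName("title").item(0);
if (title == null) title = doc.getElementsByTagName("TITLE").item(0);
String titleText = (title != null && title.getFirstChild() != null) ? title.getFirstChild().getNodeValue() : null;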
getElementsByTagName is case sensitive, so this is the simplest option.
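If you would rather not repeat the lookup for every casing (it would also miss mixed case such as <Title>), one alternative is to walk all elements and compare names with equalsIgnoreCase. The following is only a sketch: the class TitleLookup and the helpers firstTextIgnoreCase and parse are hypothetical names, and it assumes the parser's DOM honors the standard "*" wildcard of getElementsByTagName. To stay runnable without JTidy or a network call, the demo builds its document with the JDK's own XML parser instead of Tidy.parseDOM.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class TitleLookup {

    // Hypothetical helper: returns the value of the first child of the first
    // element whose tag name equals `tag` ignoring case, or null if no such
    // element exists or it has no children.
    public static String firstTextIgnoreCase(Document doc, String tag) {
        // "*" is the DOM wildcard that matches every element (assumed to be
        // supported by the DOM implementation in use).
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Node node = all.item(i);
            if (node.getNodeName().equalsIgnoreCase(tag)) {
                Node first = node.getFirstChild();
                return first == null ? null : first.getNodeValue();
            }
        }
        return null;
    }

    // Demo-only helper: parses a small XML string with the JDK parser.
    // With JTidy you would pass the Document from Tidy.parseDOM instead.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse demo document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><head><TITLE>Alopecia</TITLE></head></html>");
        System.out.println(firstTextIgnoreCase(doc, "title")); // prints "Alopecia"
    }
}
```

With the document from the question, the call would be firstTextIgnoreCase(doc, "title"). It scans every element, so it is slower than a direct lookup, but for a single page that rarely matters.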