I am developing software in Java and I want to get some text from a website. The problem is that the text is shown in the browser but is missing when I fetch the page through code.
update: I am reading the page from the website through an InputStreamReader. The comments field is not returned, and it is also not shown in the source code of the page. When I open that page in the browser, the comments field is there and publicly available.
update: The URL is http://www.alarabiya.net/articles/2011/07/20/158410.html
Exactly which comments are you not seeing? The following code gets the comments as far as I can tell:
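import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class FetchComments {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.alarabiya.net/articles/2011/07/20/158410.html");
        HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
        urlConnection.setRequestMethod("GET");
        urlConnection.connect();
        InputStream in = urlConnection.getInputStream();
        byte[] data = new byte[8192];
        int length;
        while ((length = in.read(data)) != -1) {
            // Write the raw bytes so multi-byte characters split across reads stay intact
            System.out.write(data, 0, length);
        }
        System.out.flush();
        in.close();
        urlConnection.disconnect();
    }
}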
Note: the above code isn't production grade--just an example.
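Since the question mentions reading through an InputStreamReader, here is a rough sketch of how the same fetch could decode the page into a String using the charset named in the Content-Type header. The class name PageFetcher, the helper charsetFrom, and the UTF-8 fallback are my own assumptions for illustration, and I have not checked which charset that site actually sends.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper class, not part of the original answer.
public class PageFetcher {

    // Picks the charset from a header such as "text/html; charset=windows-1256".
    // Falls back to UTF-8 (an assumption) when none is given or it is unknown.
    public static Charset charsetFrom(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // unknown or malformed name: use the fallback
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Downloads the page and decodes it with the charset reported by the server.
    public static String fetch(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestMethod("GET");
        try {
            Charset cs = charsetFrom(conn.getContentType());
            StringBuilder html = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), cs))) {
                char[] buf = new char[8192];
                int n;
                while ((n = reader.read(buf)) != -1) {
                    html.append(buf, 0, n);
                }
            }
            return html.toString();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetch("http://www.alarabiya.net/articles/2011/07/20/158410.html"));
    }
}
```

Decoding through a Reader avoids cutting a multi-byte character in half at a buffer boundary, which matters for non-Latin text. Pages that declare their charset only in a meta tag would need extra handling that this sketch leaves out.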
Here is a blog post describing how to get the HTML from a URL using either the Java SDK or Apache Commons HttpClient. Once you have the HTML, there is a lot you can do with it:
- Extract the Text from the Markup
- Extract Links
- Change Links
- Collect Email Addresses
- Collect Images
- Add Syntax Highlighting
- Diff Two Sources
READ HTML WITH JAVA – THEN 7 FUN THINGS TO DO TO IT
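To give a feel for the first two items in that list, here is a minimal sketch of pulling the links and the visible text out of an HTML string with regular expressions. This is not the blog post's code: the class HtmlTools and its method names are made up for illustration, and regexes are only a rough approximation that will trip on unusual markup (a real HTML parser is the safer choice for anything serious).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not taken from the blog post.
public class HtmlTools {

    // Matches the quoted href value of an <a ...> tag; unquoted values are ignored.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the href values in the order they appear in the markup.
    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    // Drops script/style blocks and tags, then collapses whitespace.
    // Character entities such as &amp; are left undecoded.
    public static String extractText(String html) {
        return html
                .replaceAll("(?is)<(script|style)\\b.*?</\\1>", " ")
                .replaceAll("(?s)<[^>]*>", " ")
                .replaceAll("\\s+", " ")
                .trim();
    }

    public static void main(String[] args) {
        String html = "<p>Hi <a href=\"http://example.com/a\">one</a></p>";
        System.out.println(extractLinks(html));
        System.out.println(extractText(html));
    }
}
```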
If you are building a desktop application, you can use XULRunner and inject JavaScript to show the result. I have done a project that worked with malformed web pages. If you use JDOM, you will get plenty of errors, but XULRunner is very good at handling these pages.
An easier way to do the same thing is by using JavaScript Bookmarklets. For example: http://www.mattcutts.com/blog/javascript-bookmarklet-basics/
Embed your JavaScript in the URL and send the result via AJAX to your Java server.
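For the Java side of that, a minimal sketch of a server that accepts the POSTed text could look like the following. It uses the HttpServer that ships with the JDK; the class name ResultReceiver, the /result path, port 8080, and the assumption that the script sends UTF-8 text are all placeholders of mine, so adjust them to whatever your bookmarklet actually sends.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical receiver for text sent by the bookmarklet.
public class ResultReceiver {

    // Reads the whole request body and decodes it as UTF-8 (an assumption).
    public static String readBody(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    static void handle(HttpExchange exchange) throws IOException {
        try {
            // Lets a script running on another site's page read the response.
            exchange.getResponseHeaders().add("Access-Control-Allow-Origin", "*");
            if (!"POST".equals(exchange.getRequestMethod())) {
                exchange.sendResponseHeaders(405, -1); // method not allowed, no body
                return;
            }
            String text = readBody(exchange.getRequestBody());
            System.out.println("Received " + text.length() + " characters");
            exchange.sendResponseHeaders(204, -1); // success, no body
        } finally {
            exchange.close();
        }
    }

    // Starts the server; the path "/result" is a placeholder.
    public static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/result", ResultReceiver::handle);
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(8080);
    }
}
```

The injected script would then POST the text it collected to http://localhost:8080/result. A request with a custom content type or extra headers triggers a CORS preflight (OPTIONS) request, which this sketch answers with 405, so keep the request simple or add handling for it.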