Get page content from URL?

Developer https://www.devze.com 2023-01-26 06:27 Source: the web

I want to get the content of a page from a URL with this code:

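import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public static String getContentResult(URL url) throws IOException {
    // try-with-resources closes the stream even if read() throws
    try (InputStream in = url.openStream()) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        int byteRead;
        while ((byteRead = in.read(buffer)) != -1) {
            out.write(buffer, 0, byteRead);
        }
        // Assumes the page is UTF-8 encoded; casting each byte to char
        // would garble multi-byte characters.
        return out.toString(StandardCharsets.UTF_8.name());
    }
}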

But with this URL: http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE&CFID=114782066&CFTOKEN=85539315 I can't get the abstract: "Database management systems will continue to manage....."

Can you suggest a solution to this problem? Thanks in advance.

Printing the headers of the GET request gives:

HTTP/1.1 302 Moved Temporarily
Connection: close
Date: Thu, 18 Nov 2010 15:35:24 GMT
Server: Microsoft-IIS/6.0
location: http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE
Content-Type: text/html; charset=UTF-8
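A header dump like the one above can be produced with the JDK alone. This is a minimal sketch, not necessarily how the answerer produced it; the class name HeaderDump and its methods are hypothetical. It turns off redirect following so that the 302 response itself is visible:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration.
class HeaderDump {

    // Sends a GET request and returns the response headers without following redirects.
    static Map<String, List<String>> fetchHeaders(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        // HttpURLConnection follows same-protocol redirects by default;
        // turn that off so the 302 response itself is returned.
        conn.setInstanceFollowRedirects(false);
        try {
            conn.getResponseCode(); // forces the request to be sent
            return conn.getHeaderFields();
        } finally {
            conn.disconnect();
        }
    }

    // Prints each header; the status line is stored under the null key.
    static void printHeaders(URL url) throws IOException {
        for (Map.Entry<String, List<String>> e : fetchHeaders(url).entrySet()) {
            String values = String.join(", ", e.getValue());
            System.out.println(e.getKey() == null ? values : e.getKey() + ": " + values);
        }
    }
}
```

Calling HeaderDump.printHeaders with the question's URL prints the status line and headers, though not necessarily in the order listed above, since getHeaderFields() does not guarantee an ordering.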

The 302 status and the location header mean that the server wants you to request the new address instead. So either read that header from the URLConnection yourself and follow the link, or use HttpClient, which follows redirects automatically. The code below is based on HttpClient:

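import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

public class HttpTest {
    public static void main(String... args) throws Exception {

        System.out.println(readPage(new URL("http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE&CFID=114782066&CFTOKEN=85539315")));
    }

    private static String readPage(URL url) throws Exception {

        // Apache HttpClient 4.x: DefaultHttpClient follows redirects for GET requests
        // by default (deprecated since 4.3 in favor of HttpClients.createDefault()).
        DefaultHttpClient client = new DefaultHttpClient();
        HttpGet request = new HttpGet(url.toURI());
        HttpResponse response = client.execute(request);

        // Decode as UTF-8, the charset the server declares in Content-Type;
        // try-with-resources closes the stream even if reading fails.
        try (Reader reader = new InputStreamReader(
                response.getEntity().getContent(), StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            char[] cbuf = new char[1024];
            int read;
            while ((read = reader.read(cbuf)) != -1) {
                sb.append(cbuf, 0, read);
            }
            return sb.toString();
        }
    }
}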
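For the first option mentioned above, reading the location header yourself instead of adding a library, here is a minimal JDK-only sketch. The class name RedirectFollower and the limit of five redirects are hypothetical choices, and the body is decoded as UTF-8 on the assumption that the final page uses the charset this server declares:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration.
class RedirectFollower {
    // Hypothetical limit so that a redirect loop cannot run forever.
    private static final int MAX_REDIRECTS = 5;

    static String readFollowingRedirects(URL url) throws IOException {
        for (int hops = 0; hops <= MAX_REDIRECTS; hops++) {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            // Handle 3xx responses ourselves instead of relying on the built-in logic.
            conn.setInstanceFollowRedirects(false);
            int status = conn.getResponseCode();
            if (status >= 300 && status < 400) {
                String location = conn.getHeaderField("Location");
                conn.disconnect();
                if (location == null) {
                    throw new IOException("Redirect without a Location header");
                }
                // Resolve a relative Location value against the current URL.
                url = new URL(url, location);
                continue;
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[1024];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                // Assumes a UTF-8 response body.
                return out.toString(StandardCharsets.UTF_8.name());
            }
        }
        throw new IOException("Too many redirects");
    }
}
```

Unlike HttpURLConnection's built-in redirect handling, this loop also follows redirects that switch between http and https.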

There's no "Database management..." text at the given URL. Perhaps it's loaded dynamically by JavaScript. You'll need a more sophisticated application to download such content ;)

The content you're looking for is not included in the page at this URL. Open it in your browser and view the source: instead, many JavaScript files are loaded. I think the content is fetched later by AJAX calls, so you would need to find out how it is loaded.

The Firefox plugin Firebug could be helpful for a more detailed analysis.

The URL you should be using is:

http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE

That is because the original URL you posted sends a redirect, as dacwe mentioned.
