
Getting a web page with Sockets

Developer https://www.devze.com 2023-01-29 08:12 Source: web
I am currently learning socket programming and have run into an issue that I need help with. I am trying to write a little Java class that will connect to a web host, download the default page, then disconnect from the host. I know that it is simpler to use URLConnection to do this, but I am trying to learn the Sockets classes. I can connect to a web server successfully, but I am having difficulty pulling in the page. This is what I have working (and not working) so far:

import java.io.*;
import java.net.*;
import java.lang.IllegalArgumentException;
public class SocketsFun{
    public static void main(String[] myArgs){
        // Set some variables
        String theServer = null;
        String theLine = null;
        int thePort = 0;
        Socket theSocket = null;
        boolean exit = false;
        boolean socketCheck = false;
        BufferedReader theInput = null;

        // Grab the server and port number
        try{
            theServer = myArgs[0];
            thePort = Integer.parseInt(myArgs[1]);
            System.out.println("Opening a connection to " + theServer + " on port " + thePort);
        } catch(ArrayIndexOutOfBoundsException aioobe){
            System.out.println("usage: SocketsFun host port");
            exit = true;
        } catch(NumberFormatException nfe) {
            System.out.println("usage: SocketsFun host port");
            exit = true;
        }

        if(!exit){
            // Open the socket
            try{
                theSocket = new Socket(theServer, thePort);
            } catch(UnknownHostException uhe){
                System.out.println("* " + theServer + " does not exist");
            } catch(IOException ioe){
                System.out.println("* " + "Connection Refused");
            } catch(IllegalArgumentException iae){
                System.out.println("* " + thePort + " Not A Valid TCP/UDP Port.");
            }

            // Print out some stuff
            try{
                System.out.println("Connected Socket: " + theSocket.toString());
            } catch(Exception e){
                System.out.println("* " + "No Open Socket");
            }

            try{
                theInput = new BufferedReader(new InputStreamReader(theSocket.getInputStream()));
                while ((theLine = theInput.readLine()) != null){
                    System.out.println(theLine);
                }
                theInput.close();
            } catch(IOException ioe){
                System.out.println("* " + "No Data To Read");
            } catch(NullPointerException npe){
                System.out.println("* " + "No Data To Read");
            }

            // Close the socket
            try{
                socketCheck = theSocket.isConnected();
            } catch(NullPointerException npe){
                System.out.println("* " + "No Socket To Close");
            }
        }
    }
}

All I am wanting is for this class to spit out what might be output from "curl", "lynx -dump", or "wget", etc. Any and all help will be greatly appreciated.


You have the right idea, but you're not submitting an HTTP request. Send:

GET / HTTP/1.1\r\nHost: <hostname>\r\n\r\n

This follows the format

[METHOD] [PATH] HTTP/1.1 [CRLF]
Host: [HOSTNAME] [CRLF]
OTHER: HEADERS [CRLF]
[CRLF]
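As a rough Java sketch of that format, the request can be built as a string and written to the socket's output stream. The class and method names here (`HttpRequestSketch`, `buildGetRequest`, `sendGetRequest`) are made up for illustration, and the `Connection: close` header is my own addition: HTTP/1.1 connections are persistent by default, so this asks the server to close the connection once the response has been sent.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the question's code.
class HttpRequestSketch {

    // Builds a minimal GET request: request line, headers, then an empty line.
    // Every line ends with CRLF ("\r\n"), as the format above requires.
    static String buildGetRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"   // assumption: we want the server to close when done
             + "\r\n";                   // blank line marks the end of the headers
    }

    // Writes the request to an already connected socket and flushes it,
    // so the bytes actually leave the client before we start reading.
    static void sendGetRequest(Socket socket, String host, String path) throws IOException {
        OutputStream out = socket.getOutputStream();
        out.write(buildGetRequest(host, path).getBytes(StandardCharsets.US_ASCII));
        out.flush();
    }
}
```

In the question's code this would mean calling something like `sendGetRequest(theSocket, theServer, "/")` after the socket is opened and before the read loop.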

You should get a response that follows a similar format: a status line and headers, a blank line, and then the data. Read about the HTTP protocol for more info.

EDIT: Perhaps it'd help to get a feel for the HTTP request syntax to start. It's pretty simple, and just a good thing to know generally. Open a terminal and use netcat (preferable) or telnet: run netcat google.com 80 or telnet google.com 80, then type:

GET / HTTP/1.1[ENTER]
Host: google.com[ENTER]
[ENTER]

I get the response (following the second return):

HTTP/1.1 301 Moved Permanently
Location: http://www.google.com/
Content-Type: text/html; charset=UTF-8
Date: Thu, 09 Dec 2010 00:03:39 GMT
Expires: Sat, 08 Jan 2011 00:03:39 GMT
Cache-Control: public, max-age=2592000
Server: gws
Content-Length: 219
X-XSS-Protection: 1; mode=block

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
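A response like the one above can be split into its three parts (status line, headers, body) with a few `readLine()` calls. This is only a sketch: `HttpResponseSketch` is a hypothetical name, it keeps header names exactly as received (HTTP header names are case-insensitive, which this ignores), a repeated header overwrites the earlier value, and it treats the body as text lines rather than honoring `Content-Length`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for a parsed response; not a full HTTP parser.
class HttpResponseSketch {
    final String statusLine;                       // e.g. "HTTP/1.1 301 Moved Permanently"
    final Map<String, String> headers = new LinkedHashMap<>();
    final String body;

    HttpResponseSketch(BufferedReader reader) throws IOException {
        // First line is the status line.
        String first = reader.readLine();
        statusLine = (first == null) ? "" : first;

        // Header lines follow until the first empty line.
        String line;
        while ((line = reader.readLine()) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                headers.put(line.substring(0, colon).trim(),
                            line.substring(colon + 1).trim());
            }
        }

        // Everything after the blank line is the body; read until end of stream.
        StringBuilder rest = new StringBuilder();
        while ((line = reader.readLine()) != null) {
            rest.append(line).append('\n');
        }
        body = rest.toString();
    }

    // Returns the numeric status code, or -1 if the status line has no second field.
    int statusCode() {
        String[] parts = statusLine.split(" ", 3);
        return (parts.length < 2) ? -1 : Integer.parseInt(parts[1]);
    }
}
```

With the question's code, the reader passed in would be the `BufferedReader` wrapped around `theSocket.getInputStream()`.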

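Once you get a feel for the request syntax, just write that to the socket, then read the lines until the server closes, like you're doing. Note that with HTTP/1.1 the server may keep the connection open after the response, so add a "Connection: close" header (or use HTTP/1.0) if you want the read loop to end promptly.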


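You need to write something to the socket's output stream. Web servers wait for a request from the client before sending anything: writing a GET request (for example "GET / HTTP/1.1" followed by a Host header and a blank line) asks the server to return the default page.

Your code doesn't write anything, so the server just waits for a request that never arrives, typically until its own timeout closes the connection.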
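Putting the write before the read, a minimal version of the whole exchange might look like the sketch below. The names (`SocketFetchSketch`, `exchange`, `fetch`) are hypothetical; the `Connection: close` header and the ISO-8859-1 decoding are assumptions of mine (the real charset is whatever the response's Content-Type says), and error handling is left to the caller.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: send a GET request first, then read until the stream ends.
class SocketFetchSketch {

    // Works on plain streams so it can be tried without a network connection.
    static String exchange(InputStream in, OutputStream out, String host, String path)
            throws IOException {
        // 1. Write the request; the server sends nothing until it has received this.
        String request = "GET " + path + " HTTP/1.1\r\n"
                       + "Host: " + host + "\r\n"
                       + "Connection: close\r\n"   // assumption: ask the server to close afterwards
                       + "\r\n";
        out.write(request.getBytes(StandardCharsets.US_ASCII));
        out.flush();

        // 2. Read lines until readLine() returns null, i.e. the server closed its side.
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1));
        StringBuilder response = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            response.append(line).append('\n');
        }
        return response.toString();
    }

    // Opens the socket, runs the exchange, and closes the socket when done.
    static String fetch(String host, int port, String path) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            return exchange(socket.getInputStream(), socket.getOutputStream(), host, path);
        }
    }

    // usage: java SocketFetchSketch host port
    public static void main(String[] args) throws IOException {
        System.out.print(fetch(args[0], Integer.parseInt(args[1]), "/"));
    }
}
```

The try-with-resources block also closes the socket, which the question's code only checks with `isConnected()` and never actually does.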
