How do I search for a word in a webpage?


How do I search for the existence of a word in a webpage, given its URL, say "www.microsoft.com"? Do I need to download this webpage to perform this search?


You just need to make an HTTP request to the web page and grab all its content; after that you can search for the words you need in it. The code below might help you do so.

// Requires: java.net.URL, java.net.HttpURLConnection, java.net.URLEncoder,
//           java.io.OutputStream, java.io.BufferedReader, java.io.InputStreamReader
public static void main(String[] args) {
    try {
        // Build request body (URL-encoded form data)
        String body =
            "fName=" + URLEncoder.encode("Atli", "UTF-8") +
            "&lName=" + URLEncoder.encode("Þór", "UTF-8");
        byte[] bodyBytes = body.getBytes("UTF-8");

        // Create connection
        URL url = new URL("http://www.example.com");
        HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
        urlConnection.setRequestMethod("POST");
        urlConnection.setDoInput(true);
        urlConnection.setDoOutput(true);
        urlConnection.setUseCaches(false);
        urlConnection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        // Content-Length must be the byte length, not the character count
        urlConnection.setRequestProperty("Content-Length", String.valueOf(bodyBytes.length));

        // Send request
        OutputStream outStream = urlConnection.getOutputStream();
        outStream.write(bodyBytes);
        outStream.flush();
        outStream.close();

        // Get response
        // - For debugging purposes only!
        BufferedReader inStream = new BufferedReader(
            new InputStreamReader(urlConnection.getInputStream(), "UTF-8"));
        String buffer;
        while ((buffer = inStream.readLine()) != null) {
            System.out.println(buffer);
        }

        // Close the response stream
        inStream.close();
    }
    catch (Exception ex) {
        System.out.println("Exception caught:\n" + ex.toString());
    }
}
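
The code above is a generic POST example; for the question here, a plain GET plus a substring check is enough. Below is a minimal sketch along those lines; the search word "Windows" is only a placeholder.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class WordSearch {
    public static void main(String[] args) throws Exception {
        String pageUrl = "https://www.microsoft.com";  // URL from the question
        String word = "Windows";                       // placeholder search term

        // Fetch the page with a plain GET request
        HttpURLConnection connection =
            (HttpURLConnection) new URL(pageUrl).openConnection();
        connection.setRequestMethod("GET");

        // Read the whole response body into one string
        StringBuilder content = new StringBuilder();
        BufferedReader reader = new BufferedReader(
            new InputStreamReader(connection.getInputStream(), "UTF-8"));
        String line;
        while ((line = reader.readLine()) != null) {
            content.append(line).append('\n');
        }
        reader.close();

        // Simple case-insensitive substring check
        boolean found = content.toString().toLowerCase().contains(word.toLowerCase());
        System.out.println("\"" + word + "\" found: " + found);
    }
}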


I know how I would do this in theory: use cURL or some other application to download the page, store the contents in a variable, then parse it for whatever you need.
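
In Java rather than cURL, that download-then-search idea might look like the sketch below, using the java.net.http client that ships with Java 11+; the URL and the word "Windows" are placeholders.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FetchAndSearch {
    public static void main(String[] args) throws Exception {
        // Download the page into a variable, following redirects like a browser would
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create("https://www.microsoft.com"))
                .GET()
                .build();
        String html = client.send(request, HttpResponse.BodyHandlers.ofString()).body();

        // Then search the stored contents for whatever you need
        System.out.println("Word found: " + html.contains("Windows"));
    }
}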


Yes, you need to download the page content and search inside it for what you want. And if it happens that you want to search the whole microsoft.com website, then you should either write your own web crawler, use an existing crawler, or use a search engine API like Google's.
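
For the whole-site case, a very small crawler could be sketched as below. It assumes the jsoup library (not named in the answer) for fetching and parsing; the start URL, the word and the page limit are placeholders.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TinyCrawler {
    public static void main(String[] args) throws Exception {
        String startUrl = "https://www.microsoft.com";  // placeholder start page
        String word = "Windows";                        // placeholder search term
        int maxPages = 20;                              // keep the crawl small

        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(startUrl);
        String host = new java.net.URL(startUrl).getHost();

        while (!queue.isEmpty() && visited.size() < maxPages) {
            String url = queue.poll();
            if (!visited.add(url)) {
                continue;  // already crawled
            }
            try {
                Document doc = Jsoup.connect(url).get();
                if (doc.text().toLowerCase().contains(word.toLowerCase())) {
                    System.out.println("Found \"" + word + "\" on " + url);
                }
                // Queue links that stay on the same host
                for (Element link : doc.select("a[href]")) {
                    String next = link.absUrl("href");
                    if (next.contains(host) && !visited.contains(next)) {
                        queue.add(next);
                    }
                }
            } catch (Exception e) {
                // Skip pages that fail to download or parse
            }
        }
    }
}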


Yes, you'll have to download the page, and, to make sure you get the complete content, you'll want to execute scripts and include dynamic content, just like a browser does.

We can't "search" something on a remote resource that we don't control, and no web server offers a "scan my content" method by default.

Most probably you'll want to load the page with a browser engine (WebKit or something else) and perform the search on the internal DOM structure of that engine.
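
One way to do that from Java is a browser-automation tool such as Selenium WebDriver; this is an assumption, since the answer only says "webkit or something else". A rough sketch that loads the page headlessly, lets scripts run, and searches the rendered text:

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

public class RenderedSearch {
    public static void main(String[] args) {
        // Run Chrome headless so no window is opened
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless");
        WebDriver driver = new ChromeDriver(options);
        try {
            driver.get("https://www.microsoft.com");  // URL from the question

            // Text of the rendered DOM, i.e. after scripts have run
            String visibleText = driver.findElement(By.tagName("body")).getText();
            System.out.println(visibleText.contains("Windows"));  // placeholder word
        } finally {
            driver.quit();
        }
    }
}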


If you want to do the search yourself, then obviously you have to download the page. If you're planning on this approach, I recommend Lucene (unless you want a simple substring search); see the sketch below.

Or you could have a web service do it for you: request the service to grep the URL's content and post back the results.
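
A rough idea of what the Lucene route could look like, assuming the page text has already been downloaded into a string; the field name "content" and the word "Windows" are placeholders.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class LuceneWordSearch {
    public static void main(String[] args) throws Exception {
        String pageText = "...";  // the downloaded page content goes here
        String word = "Windows";  // placeholder search term

        // Index the page text in memory
        StandardAnalyzer analyzer = new StandardAnalyzer();
        Directory index = new ByteBuffersDirectory();
        IndexWriter writer = new IndexWriter(index, new IndexWriterConfig(analyzer));
        Document doc = new Document();
        doc.add(new TextField("content", pageText, Field.Store.NO));
        writer.addDocument(doc);
        writer.close();

        // Search the index for the word
        IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(index));
        Query query = new QueryParser("content", analyzer).parse(word);
        boolean found = searcher.search(query, 1).scoreDocs.length > 0;
        System.out.println("\"" + word + "\" found: " + found);
    }
}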


You could use a search engine's API. I believe Google and Bing (http://msdn.microsoft.com/en-us/library/dd251056.aspx) have APIs you can use.
