
Can Jsoup simulate a button press?

Source: https://www.devze.com, 2023-04-06 09:09 (reposted from the web)
Can you use Jsoup to submit a search to Google, but instead of sending your request via "Google Search" use "I'm Feeling Lucky"? I would like to capture the name of the site that would be returned.

I see lots of examples of submitting forms, but never a way to specify a specific button to perform the search or form submission.

If Jsoup won't work, what would?


According to the HTML source of http://google.com, the "I'm Feeling Lucky" button has the name btnI:

<input value="I'm Feeling Lucky" name="btnI" type="submit" onclick="..." />

So simply adding the btnI parameter to the query string should suffice (its value doesn't matter):

http://www.google.com/search?hl=en&btnI=1&q=your+search+term
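If the search term comes from user input, it needs to be URL-encoded before it goes into the query string. Here is a minimal sketch using only the JDK; the class name LuckyUrl and the method build are made up for illustration, and the URLEncoder overload that takes a Charset assumes Java 10 or newer:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Jsoup or any Google API).
final class LuckyUrl {

    // Base URL and parameters as given in the answer above.
    private static final String BASE = "http://www.google.com/search?hl=en&btnI=1&q=";

    // Builds the "I'm Feeling Lucky" URL for an arbitrary search term.
    // URLEncoder applies form encoding, so spaces become '+'.
    static String build(String searchTerm) {
        return BASE + URLEncoder.encode(searchTerm, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // prints http://www.google.com/search?hl=en&btnI=1&q=your+search+term
        System.out.println(build("your search term"));
    }
}
```

The resulting string can be passed to Jsoup.connect(...) in place of a hard-coded URL.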

So this Jsoup code should do:

String url = "http://www.google.com/search?hl=en&btnI=1&q=balusc";
Document document = Jsoup.connect(url).get();
System.out.println(document.title());

However, this gave a 403 (Forbidden) error.

Exception in thread "main" java.io.IOException: 403 error loading URL http://www.google.com/search?hl=en&btnI=1&q=balusc
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:387)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:364)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:143)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:132)
    at test.Test.main(Test.java:17)

Perhaps Google was sniffing the user agent and discovering it to be Java. So, I changed it:

String url = "http://www.google.com/search?hl=en&btnI=1&q=balusc";
Document document = Jsoup.connect(url).userAgent("Mozilla").get();
System.out.println(document.title());

This yields (as expected):

The BalusC Code

The 403 does, however, indicate that Google isn't happy with bots like this. You might get (temporarily) IP-banned if you do this too often.
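If "the name of the site" means the host rather than the page title, the final URL can be reduced to its host name. Jsoup exposes the URL a document was served from via document.location(); the sketch below only covers the host extraction, using the JDK. The class name SiteName and the method hostOf are hypothetical, and stripping a leading "www." is just one possible convention:

```java
import java.net.URI;

// Hypothetical helper for turning a landing-page URL into a site name.
final class SiteName {

    // Returns the host of the given URL without a leading "www.".
    // Throws IllegalArgumentException if the URL is malformed or has no host.
    static String hostOf(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("No host in URL: " + url);
        }
        return host.startsWith("www.") ? host.substring(4) : host;
    }

    public static void main(String[] args) {
        // In the Jsoup example above, the argument would be document.location().
        System.out.println(hostOf("http://www.example.com/some/page?x=1"));
    }
}
```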


I'd try HtmlUnit for navigating through a site, and Jsoup for scraping.


Yes, it can, if you are able to figure out how Google search queries are made. But Google does not allow this, even if you succeed. You should use their official API to make automated search queries.

http://code.google.com/intl/en-US/apis/customsearch/v1/overview.html
