Can you use Jsoup to submit a search to Google, but instead of sending your request via "Google Search" use "I'm Feeling Lucky"? I would like to capture the name of the site that would be returned.
I see lots of examples of submitting forms, but never a way to specify a specific button to perform the search or form submission.
If Jsoup won't work, what would?
According to the HTML source of http://google.com, the "I'm Feeling Lucky" button has the name btnI:
<input value="I'm Feeling Lucky" name="btnI" type="submit" onclick="..." />
So just adding the btnI parameter to the query string should do (the value doesn't matter):
http://www.google.com/search?hl=en&btnI=1&q=your+search+term
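If the search term comes from user input, it has to be URL-encoded before it goes into the query string. A minimal sketch, assuming the URL layout shown above; the class and method names (LuckyUrl, build) are made up for illustration:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of Jsoup or any Google API.
class LuckyUrl {

    // Builds the "I'm Feeling Lucky" URL for a raw search term.
    static String build(String term) {
        try {
            // URLEncoder applies form encoding, so spaces become '+'.
            String q = URLEncoder.encode(term, "UTF-8");
            return "http://www.google.com/search?hl=en&btnI=1&q=" + q;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints http://www.google.com/search?hl=en&btnI=1&q=your+search+term
        System.out.println(build("your search term"));
    }
}
```

The resulting string can be passed to Jsoup.connect(...) unchanged.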
So this Jsoup code should do:
String url = "http://www.google.com/search?hl=en&btnI=1&q=balusc";
Document document = Jsoup.connect(url).get();
System.out.println(document.title());
However, this gave a 403 (Forbidden) error.
Exception in thread "main" java.io.IOException: 403 error loading URL http://www.google.com/search?hl=en&btnI=1&q=balusc
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:387)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:364)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:143)
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:132)
at test.Test.main(Test.java:17)
Perhaps Google was sniffing the user agent and discovering it to be Java. So, I changed it:
String url = "http://www.google.com/search?hl=en&btnI=1&q=balusc";
Document document = Jsoup.connect(url).userAgent("Mozilla").get();
System.out.println(document.title());
This yields (as expected):
The BalusC Code
The 403 is, however, an indication that Google isn't necessarily happy with bots like that. You might get (temporarily) IP-banned if you do this too often.
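The question asked for the name of the site rather than the page title. As far as I know, document.location() in Jsoup returns the URL the document was actually loaded from, so after the redirect it should hold the target site's address. A rough sketch of pulling the host name out of such a URL with plain java.net.URI; the class name SiteName, the method hostOf, and the example.com URL are placeholders for illustration, not real search output:

```java
import java.net.URI;

// Hypothetical helper; the input would typically be document.location().
class SiteName {

    // Returns the host part of a URL, without a leading "www.".
    static String hostOf(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("No host in URL: " + url);
        }
        return host.startsWith("www.") ? host.substring(4) : host;
    }

    public static void main(String[] args) {
        // prints example.com
        System.out.println(hostOf("http://www.example.com/some/page"));
    }
}
```

Stripping "www." is a design choice for display only; drop that line if you need the exact host.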
I'd try HtmlUnit for navigating through a site, and Jsoup for scraping.
Yes, it can, if you are able to figure out how Google search queries are made. But Google does not allow this, even if you succeed. You should use their official API to make automated search queries:
http://code.google.com/intl/en-US/apis/customsearch/v1/overview.html