I'm wondering whether YouTube can be searched with HtmlUnit. Here is the code I've started writing:
import java.io.IOException;
import java.net.MalformedURLException;
import java.util.List;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
public class HtmlUnitExampleTestBase {
    private static final String YOUTUBE = "http://www.youtube.com";
    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
        WebClient webClient = new WebClient();
        webClient.setThrowExceptionOnScriptError(false);
        // This is equivalent to typing youtube.com into the browser's address bar
        HtmlPage currentPage = webClient.getPage(YOUTUBE);
        // Get the form where the search field and submit button are located
        HtmlForm searchForm = (HtmlForm) currentPage.getElementById("masthead-search");
        // Print the form's text
        System.out.println(searchForm.asText());
        // Collect the result thumbnail links; this only matches on a results page
        // (e.g. the page returned after submitting the search)
        final List<HtmlAnchor> listLinks = (List<HtmlAnchor>) currentPage.getByXPath("//a[@class='ux-thumb-wrap result-item-thumb']");
        for (HtmlAnchor link : listLinks) {
            System.out.println(YOUTUBE + link.getAttribute("href"));
        }
    }
}
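Gluing YOUTUBE + href together only works when every href is root-relative. A slightly sturdier option is to resolve each href against the page's base URL with java.net.URI, which also leaves already-absolute links alone. This is a minimal sketch; the base URL is assumed to be the page the links were scraped from, and hrefs containing characters that are illegal in a URI would make resolve throw IllegalArgumentException.

```java
import java.net.URI;

public class ResolveLinks {
    // Base URL of the page the links were scraped from (assumed).
    private static final URI BASE = URI.create("http://www.youtube.com/");

    // Turns an href attribute value into an absolute URL, whether it is
    // root-relative ("/watch?v=..."), relative, or already absolute.
    static String toAbsolute(String href) {
        return BASE.resolve(href).toString();
    }

    public static void main(String[] args) {
        String[] hrefs = { "/watch?v=abc123", "http://example.com/x" };
        for (String href : hrefs) {
            // Prints http://www.youtube.com/watch?v=abc123, then http://example.com/x
            System.out.println(toAbsolute(href));
        }
    }
}
```

Inside HtmlUnit itself, HtmlPage.getFullyQualifiedUrl(String) should do the same resolution against the page's own URL.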
Now I don't know how to type some text into the search field and press the Search button.
I've seen HtmlUnit tutorials, but they use a method named getElementByName,
and the search button on YouTube doesn't have a name, just an id. Could someone help me?
EDIT: I've edited the code above and I'm now getting YouTube links from the first page. But before grabbing the links, I need to sort the results by upload date. Can someone help me with the sorting?
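If the upload date can be scraped next to each link, the links already collected can at least be ordered locally. This is only a sketch under that assumption: VideoResult is a hypothetical holder class, the dates in main are made-up sample values, and sorting one scraped page is not the same as having YouTube sort its full result set (that would still require triggering the site's own sort option).

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortByUploadDate {
    // HYPOTHETICAL holder for one scraped result; the upload date would have
    // to be read from the results page alongside the link.
    static final class VideoResult {
        final String url;
        final LocalDate uploaded;

        VideoResult(String url, LocalDate uploaded) {
            this.url = url;
            this.uploaded = uploaded;
        }
    }

    // Returns a copy of the results ordered by upload date, newest first.
    static List<VideoResult> newestFirst(List<VideoResult> results) {
        List<VideoResult> sorted = new ArrayList<>(results);
        sorted.sort(Comparator.comparing((VideoResult r) -> r.uploaded).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        List<VideoResult> results = new ArrayList<>();
        results.add(new VideoResult("http://www.youtube.com/watch?v=a", LocalDate.of(2011, 4, 5)));
        results.add(new VideoResult("http://www.youtube.com/watch?v=b", LocalDate.of(2011, 6, 1)));
        results.add(new VideoResult("http://www.youtube.com/watch?v=c", LocalDate.of(2010, 12, 24)));
        for (VideoResult r : newestFirst(results)) {
            System.out.println(r.uploaded + " " + r.url);
        }
    }
}
```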
I'm no HtmlUnit expert, but there is a workaround. You can add your own button to the form and use it to submit the form.
Here's a code sample with comments:
import java.io.IOException;
import java.net.MalformedURLException;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlButton;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;
public class HtmlUnitExampleTestBase {
    public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
        WebClient webClient = new WebClient();
        // Don't abort on JavaScript errors in the page
        webClient.setThrowExceptionOnScriptError(false);
        // This is equivalent to typing youtube.com into the browser's address bar
        HtmlPage currentPage = webClient.getPage("http://www.youtube.com");
        // Get the search form by its id
        HtmlForm searchForm = (HtmlForm) currentPage.getElementById("masthead-search");
        // Get the input field by its id
        HtmlTextInput searchInput = (HtmlTextInput) currentPage.getElementById("masthead-search-term");
        // Insert the search term
        searchInput.setText("Nyan Cat");
        // Workaround: create a 'fake' submit button and add it to the form
        HtmlButton submitButton = (HtmlButton) currentPage.createElement("button");
        submitButton.setAttribute("type", "submit");
        searchForm.appendChild(submitButton);
        // Workaround: click the new button to submit the form
        HtmlPage newPage = submitButton.click();
        System.out.println(newPage.asText());
    }
}
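The workaround above ultimately just submits the search form. A form sent with the GET method URL-encodes its named fields into the query string, so another option is to build the results URL directly and pass it to webClient.getPage(...). This sketch assumes YouTube's search form submits via GET to /results with the search box named search_query; both are assumptions about YouTube's markup that should be checked against the live page.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SearchUrl {
    // ASSUMPTION: the search form submits via GET to /results and the
    // search box is named "search_query". Verify against the live markup.
    private static final String RESULTS_URL = "http://www.youtube.com/results";
    private static final String QUERY_FIELD = "search_query";

    // Builds the URL a browser would request when the form is submitted:
    // the field name and its URL-encoded value appended as the query string.
    static String buildSearchUrl(String term) {
        try {
            return RESULTS_URL + "?" + QUERY_FIELD + "=" + URLEncoder.encode(term, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always supported by the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints http://www.youtube.com/results?search_query=Nyan+Cat
        System.out.println(buildSearchUrl("Nyan Cat"));
    }
}
```

With HtmlUnit, the result would then be fetched with something like HtmlPage resultsPage = webClient.getPage(SearchUrl.buildSearchUrl("Nyan Cat"));, skipping the fake button entirely.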
HtmlUnit is OK, but I vastly prefer Watir or Selenium for web automation.
One of HtmlUnit's weaknesses is its lack of selector methods for getting at DOM elements in a jQuery-like way. Check out the css-selector project, which adds CSS selector support on top of HtmlUnit and makes this kind of lookup easy. There's an intro at Gooder Code.
Once you have that working, the selector for the YouTube search form would be ".search-term" and the submit button's selector would be ".search-button".
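Without an extra library, a CSS class selector such as ".search-term" can also be expressed in plain XPath 1.0, which HtmlUnit already accepts through getByXPath. Note that an XPath test like @class='ux-thumb-wrap result-item-thumb' only matches when the whole attribute is exactly that string; the usual idiom below matches a single class token instead. This is a self-contained sketch using the JDK's javax.xml.xpath on a small made-up XML fragment; on a real page the expression from classSelector would be passed to currentPage.getByXPath(...).

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ClassTokenXPath {
    // XPath 1.0 equivalent of the CSS selector ".name": matches elements whose
    // class attribute contains "name" as a whole, space-separated token.
    static String classSelector(String className) {
        return "//*[contains(concat(' ', normalize-space(@class), ' '), ' " + className + " ')]";
    }

    // Counts the elements in a well-formed XML string that carry the class.
    static int countMatches(String xml, String className) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(classSelector(className),
                    new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // Made-up fragment; "search-terms" must NOT match ".search-term".
        String xml = "<form>"
                + "<input class='search-term wide'/>"
                + "<button class='search-button'/>"
                + "<span class='search-terms'/>"
                + "</form>";
        System.out.println(countMatches(xml, "search-term"));   // 1
        System.out.println(countMatches(xml, "search-button")); // 1
    }
}
```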