I'm writing a web scraper in Java, but I'm behind a proxy server, which is making things very difficult.
This is the connection code:
public void scrape(String url, String filename) throws Exception {
    this.url = url;
    this.filename = filename;
    System.out.println("Scraping " + url);
    System.out.println("Saving to \"" + this.filename + "\"");
    try {
        makeConnection();
        createStream();
        writeToFile();
        System.out.println("Scrape was successful");
    } catch (Exception e) {
        System.err.println("Error: " + e.getMessage());
    }
}
private void makeConnection() throws Exception {
    // Set proxy info
    System.setProperty("java.net.useSystemProxies", "true");
    URL address = new URL(url);
    connection = address.openConnection();
}
This is the output:
Scraping http://feeds.bbci.co.uk/news/northern_ireland/rss.xml
Saving to "../rss/northern_ireland.xml"
Error: Connection timed out
Is there a better way of setting the proxy settings?
You can use the java.net.Proxy class, introduced in Java 1.5: http://download.oracle.com/javase/1.5.0/docs/api/java/net/Proxy.html
A brief writeup of how it is used can be found here: http://download.oracle.com/javase/6/docs/technotes/guides/net/proxies.html
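As a rough sketch of what that could look like for the scraper in the question: the class and method names below are made up for illustration, and proxy.example.com:8080 is a placeholder that would need to be replaced with your real proxy host and port. The timeouts are an optional extra so that a wrong proxy fails quickly instead of hanging.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper class, not part of the original scraper.
class ProxyConnectionSketch {

    // Describes an HTTP proxy at the given host and port.
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(host, port));
    }

    // Creates a connection object that will be routed through the proxy.
    // Nothing is sent over the network until the connection is actually used.
    static URLConnection open(String url, Proxy proxy) throws IOException {
        URLConnection connection = new URL(url).openConnection(proxy);
        connection.setConnectTimeout(10_000); // milliseconds
        connection.setReadTimeout(10_000);    // milliseconds
        return connection;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder proxy address: substitute your own.
        Proxy proxy = httpProxy("proxy.example.com", 8080);
        URLConnection connection =
                open("http://feeds.bbci.co.uk/news/northern_ireland/rss.xml", proxy);
        System.out.println("Will fetch " + connection.getURL() + " via " + proxy);
    }
}
```

In the question's code, this would amount to replacing address.openConnection() with address.openConnection(proxy) in makeConnection(). The proxy then applies only to that connection rather than to the whole JVM.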
Maybe the system's proxy settings aren't configured as you'd expected. Try explicitly setting the JVM system properties http.proxyPort, http.proxyHost, and http.nonProxyHosts.
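A minimal sketch of setting those three properties from code follows. The class name and the values passed in are assumptions for illustration; use your own proxy host, port, and exclusion list. It may also be relevant that, according to the Oracle proxies guide linked in the other answer, java.net.useSystemProxies is only checked once at startup, so setting it inside makeConnection() may simply be too late to have any effect.

```java
// Hypothetical helper class, not part of the original scraper.
class ProxyPropertiesSketch {

    // Sets the JVM-wide HTTP proxy properties.
    // Call this before the first connection is opened.
    static void configure(String host, int port, String nonProxyHosts) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", String.valueOf(port));
        // Hosts that should bypass the proxy, separated by '|'; '*' is a wildcard.
        System.setProperty("http.nonProxyHosts", nonProxyHosts);
    }

    public static void main(String[] args) {
        // Placeholder values: substitute your own.
        configure("proxy.example.com", 8080, "localhost|127.*");
        System.out.println("http.proxyHost = " + System.getProperty("http.proxyHost"));
    }
}
```

The same properties can be passed on the command line instead, for example java -Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=8080 followed by your main class, which avoids any question about when they are read.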