I am writing a Ruby script which automatically crawls websites for data analysis, and now I have a requirement which is fairly complicated: I have to be able to simulate access from a variety of countries, about 20 different ones. The website will contain different information depending on the IP location, so the only way to get it done is to request it from a ser开发者_Go百科ver which is actually in that country.
Since I don't want to buy servers in each of those 20 countries, I chose to give Tor a try - as many of you will know, by editing the torrc configuration file it is possible to specify the exit node and hence the country from which the actual request will originate.
When I do this manually, e.g. by editing the torrc file to use an Argentinian server, then disconnecting Tor using Vidalia, reconnecting Vidalia, and then rerunning the request, it works fine. However, I want to automate this process entirely, and do it as efficiently as possible. Tor is written in C, and I'd like to avoid taking apart its entire source code for this. Any idea of what's the easiest way to automate the whole process using only Ruby?
Also, if I'm missing something and there's a simpler alternative to this whole ordeal, let me know.
Thanks!
Please take a look at Tor control protocol. You can control circuits using telnet. http://thesprawl.org/memdump/?entry=8
To switch to a new circuit wich switches to a new endpoint:
require 'net/telnet'
def switch_endpoint
localhost = Net::Telnet::new("Host" => "localhost", "Port" => "9051", "Timeout" => 10, "Prompt" => /250 OK\n/)
localhost.cmd('AUTHENTICATE ""') { |c| print c; throw "Cannot authenticate to Tor" if c != "250 OK\n" }
localhost.cmd('signal NEWNYM') { |c| print c; throw "Cannot switch Tor to new route" if c != "250 OK\n" }
localhost.close
end
Be aware of the delay to make a new circuit, may take couple seconds, so you'd better add a delay in the code, or check if your address has changed by calling some remote IP detection site.
精彩评论