I have a situation where I need to visit 100-odd websites to collect contact information and then enter it into my own site. What I want to know is whether it's possible to write a program, or a crawler if I'm putting it correctly, to gather all this information. I'm guessing the information will be available as unstructured HTML, and I'll then have to parse it to make it structured. Has anyone had similar experience doing this? I would also like opinions on which language to use.
You're looking for a Web Scraper. A few Google searches should turn up various free and commercial products that would solve your problem. You probably don't need to write one yourself if the data you're collecting is fairly simple and well structured.
Try Ruby with the Mechanize library:
http://mechanize.rubyforge.org/mechanize/GUIDE_rdoc.html
For example, after require 'mechanize':
agent = Mechanize.new
agent.get('http://someurl.com/').search(".//p[@class='posted']")
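Once a page has been fetched, the "make it structured" step from the question could look roughly like the sketch below. It uses only Ruby's standard library and works on a raw HTML string, so it does not depend on Mechanize. The helper name extract_contacts, the two regular expressions, and the sample markup are all assumptions made for illustration. Real sites will likely need per-site XPath or CSS selectors and stricter patterns.

```ruby
# Hypothetical patterns: deliberately loose, meant as a starting point only.
EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/
PHONE_RE = /\+?\d[\d\s().-]{6,}\d/

# Hypothetical helper: turns raw HTML into a small structured hash.
def extract_contacts(html)
  # Crudely replace tags with spaces so attribute values
  # (e.g. mailto: links) are not matched on their own.
  text = html.gsub(/<[^>]+>/, ' ')
  {
    emails: text.scan(EMAIL_RE).uniq,
    phones: text.scan(PHONE_RE).map(&:strip).uniq
  }
end

# Made-up sample markup standing in for a fetched page.
sample = '<p class="posted">Contact: <a href="mailto:jane@example.com">' \
         'jane@example.com</a>, tel. +1 555 010 9999</p>'
p extract_contacts(sample)
```

With Mechanize, the string passed in could be the body of the fetched page, and the resulting hashes could be collected per site before being entered into your own database.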