How to get a company's contact page URL

Developer (https://www.devze.com) 2022-12-12 12:10, source: web
Hi, I have a CSV file which contains a list of company URLs like this: www.google.com,www.ibm.com.....

For each URL in the CSV file, I want to get the "contact us" or "about us" page URL (for example http://www.google.com/contact). One idea I have is to check the links on each page for the following patterns: contact us, about us, about, locations.

If none of those is found, flag the URL and write it to a log file. If a pattern is found, just print the address (it is used by another process).
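A minimal sketch of that approach using only the Python standard library (`csv`, `html.parser`, `urllib`). The names `find_contact_url`, `process_csv` and `fetch_html` are hypothetical; the patterns are matched case-insensitively against both the link text and the href, and the page fetcher is passed in as an argument so the parsing logic can be tried without network access.

```python
import csv
import io
import re
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

# Patterns from the question, in priority order. The trailing bare "contact"
# is an added assumption so that links like http://www.google.com/contact match.
PATTERNS = ("contact us", "about us", "about", "locations", "contact")


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace in the visible link text.
            self.links.append((self._href, " ".join("".join(self._text).split())))
            self._href = None


def find_contact_url(html, base_url, patterns=PATTERNS):
    """Return the absolute URL of the best-matching link, or None."""
    parser = LinkCollector()
    parser.feed(html)
    for pattern in patterns:
        for href, text in parser.links:
            # Search the link text plus the href with punctuation turned into spaces.
            haystack = text.lower() + " " + re.sub(r"[^a-z]+", " ", href.lower())
            if pattern in haystack:
                return urljoin(base_url, href)
    return None


def fetch_html(url):
    """Hypothetical fetcher: download a page as text (no retries, no robots.txt)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def process_csv(csv_text, fetch, log):
    """Print the contact/about URL of each site; log sites where none is found."""
    found = {}
    for row in csv.reader(io.StringIO(csv_text)):
        for site in (cell.strip() for cell in row):
            if not site:
                continue
            base = site if site.startswith(("http://", "https://")) else "http://" + site
            try:
                url = find_contact_url(fetch(base), base)
            except OSError as exc:  # urllib network errors are OSError subclasses
                log.warning("could not fetch %s: %s", site, exc)
                continue
            if url:
                print(url)
                found[site] = url
            else:
                log.warning("no contact/about page found: %s", site)
    return found
```

To run it on a real file, something like `logging.basicConfig(filename="missing.log")` followed by `process_csv(open("companies.csv").read(), fetch_html, logging.getLogger(__name__))` would print the found URLs and write the misses to the log; both file names are placeholders.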


I'd suggest using Beautiful Soup to parse the page. Another alternative would be to set up a HIT on Mechanical Turk.
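With Beautiful Soup (the `bs4` package), the link check might look like the sketch below. `contact_links` is a hypothetical helper; the single regex `contact|about|locations` covers the question's phrases (contact us, about us, about, locations) and also bare paths such as `/contact`. Downloading the page is left out.

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# One case-insensitive regex standing in for the question's pattern list.
PATTERN = re.compile(r"contact|about|locations", re.IGNORECASE)


def contact_links(html, base_url):
    """Return absolute URLs of <a> tags whose text or href matches PATTERN."""
    soup = BeautifulSoup(html, "html.parser")
    matches = []
    for a in soup.find_all("a", href=True):
        text = a.get_text(" ", strip=True)
        if PATTERN.search(text) or PATTERN.search(a["href"]):
            matches.append(urljoin(base_url, a["href"]))
    return matches
```

An empty list means no candidate page was found, which is the case the question wants written to a log file.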


Scrapy is the best tool for this. The best thing about Scrapy is that it is open source. See the Scrapy documentation.
