How to get all the URLs of a website
Suppose I want to crawl data that is spread across many similar pages of a website. How can I get a list of all the URLs so that I can visit each of those pages?
For example, on a mobile phone website I want to get all the phones of one brand, which are listed at different URLs on the site. I noticed that the div tag's class is the brand name for every phone:
<div class="Nokia"> .... I want the URLs of the pages on the site that have a div with class "Nokia".
You could use an HTML parser such as Html Agility Pack to extract all URLs from anchors, forms, and so on. If a URL does not appear in the HTML you are parsing, you have no way (other than guessing) of knowing which subdomains and URLs exist for a given domain.
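As a rough illustration of that idea, here is a minimal sketch in Python using only the standard library's `html.parser` instead of Html Agility Pack (which is a .NET library). It follows links found in anchors and forms, stays on the starting host, and reports the pages that contain a div with the wanted class. The names `PageScanner`, `crawl`, `fetch` and `FAKE_SITE` are made up for this example, and the in-memory "site" stands in for real HTTP requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageScanner(HTMLParser):
    """Collects link targets and notes whether a <div> with the wanted class exists."""

    def __init__(self, base_url, wanted_class):
        super().__init__()
        self.base_url = base_url
        self.wanted_class = wanted_class.lower()
        self.links = []
        self.has_wanted_div = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative hrefs against the page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "form" and attrs.get("action"):
            self.links.append(urljoin(self.base_url, attrs["action"]))
        elif tag == "div":
            # Compare class names case-insensitively (an assumption for this sketch).
            classes = (attrs.get("class") or "").lower().split()
            if self.wanted_class in classes:
                self.has_wanted_div = True


def crawl(start_url, wanted_class, fetch, max_pages=50):
    """Breadth-first crawl of one host.

    `fetch` is a hypothetical callable mapping a URL to its HTML string,
    or None if the page cannot be retrieved.
    Returns the URLs whose HTML contains a div with `wanted_class`.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    matches = []
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        html = fetch(url)
        if html is None:
            continue
        scanner = PageScanner(url, wanted_class)
        scanner.feed(html)
        if scanner.has_wanted_div:
            matches.append(url)
        for link in scanner.links:
            link = link.split("#", 1)[0]  # ignore fragments
            # Only URLs that actually appear in the HTML can be discovered.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return matches


# A made-up in-memory site so the sketch runs without network access.
FAKE_SITE = {
    "http://example.com/": '<a href="/nokia-1">N1</a> <a href="/other">O</a>',
    "http://example.com/nokia-1": '<div class="Nokia">Phone</div> <a href="/">home</a>',
    "http://example.com/other": '<div class="Acme">Phone</div>',
}

print(crawl("http://example.com/", "nokia", FAKE_SITE.get))
```

For a real site, `fetch` would need to download each page (for example by wrapping `urllib.request.urlopen`) and should respect the site's robots.txt and rate limits. As noted above, pages that are never linked from the HTML being parsed will not be found this way.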