I have a crawler that fetches URLs and parses the HTML, and I've run into an unusual error. For a specific set of URLs from one site, fetching them with HttpWebRequest and HttpWebResponse gives this error:
> **The remote server returned an error: (404) Not Found**
This is unusual, since the same URLs load fine when I paste them into my browser. Any ideas appreciated. I'm not sure whether I need to post code, but let me know if so.
The site could be blocking your user-agent, or it could require cookies.
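Both suspects can be addressed on the `HttpWebRequest` itself. Below is a minimal sketch, assuming a browser-like User-Agent string is acceptable to the site. `CrawlerSession` and its members are hypothetical names. It reuses one `CookieContainer` across all requests, so cookies the site sets on one response are sent back on later requests, and it sends a `User-Agent` header:

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper: create one instance per crawl so cookies persist between requests.
public class CrawlerSession
{
    // Cookies the server sets (e.g. a session cookie) are stored here and
    // sent back automatically on later requests to the same domain.
    public CookieContainer Cookies { get; } = new CookieContainer();

    // Assumed browser-like User-Agent string; any realistic value could be used.
    public string UserAgent { get; set; } =
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";

    public HttpWebRequest CreateRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = UserAgent;       // HttpWebRequest sends no User-Agent unless set
        request.CookieContainer = Cookies;   // null by default, which disables cookie handling
        request.Accept = "text/html,application/xhtml+xml,*/*";
        return request;
    }

    public string FetchHtml(string url)
    {
        var request = CreateRequest(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Keeping a single session for the whole crawl matters: creating a fresh `CookieContainer` per request would discard any cookie the site set on an earlier page.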
Could the remote server be serving different pages depending on the User-Agent, with no page for the User-Agent value that your HttpWebRequest instance sends (empty by default)? Just a thought, since you say the page is found when you navigate to its address in the browser but not through code.
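One way to test this theory is to request the same URL twice, once with no User-Agent and once with a browser-like one, and compare the status codes. Below is a sketch of that approach; `UserAgentCheck` and its methods are hypothetical names. A 404 makes `GetResponse()` throw a `WebException`, but the exception's `Response` property still holds the `HttpWebResponse`, so its status code can be read:

```csharp
using System;
using System.Net;

// Hypothetical diagnostic: fetch the same URL with and without a User-Agent
// and compare the status codes the server returns.
public static class UserAgentCheck
{
    public static HttpWebRequest CreateRequest(string url, string userAgent)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        if (userAgent != null)
        {
            request.UserAgent = userAgent;   // otherwise no User-Agent header is sent
        }
        return request;
    }

    public static HttpStatusCode GetStatus(string url, string userAgent)
    {
        try
        {
            using (var response = (HttpWebResponse)CreateRequest(url, userAgent).GetResponse())
            {
                return response.StatusCode;
            }
        }
        catch (WebException ex) when (ex.Response is HttpWebResponse)
        {
            // Error statuses such as 404 surface as a WebException,
            // but the attached response still carries the status code.
            using (var errorResponse = (HttpWebResponse)ex.Response)
            {
                return errorResponse.StatusCode;
            }
        }
    }
}
```

For example, if `UserAgentCheck.GetStatus(url, null)` returns `NotFound` while `UserAgentCheck.GetStatus(url, "Mozilla/5.0")` returns `OK`, the server is likely keying its response on the User-Agent.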