I'm working on a web scraping application and was testing it with eBay. The application should follow the "Next" link (the one at the bottom of the page that goes to the next page of results), but it seems to stay on the same page (yeah, I'm actually not sure about that). If you open eBay, search for any term that returns multiple pages of results, and then either copy the "Next" link and paste it into a new window, or right-click it and choose to open it in a new tab/window, it stays on the same page. I tested this in Chrome and IE8. So my question is: what are these browsers doing when they actually follow the link (when I just click on it), so that I can do the same in my scraping application? (Oh, and by the way, I'm working in C#.)
In the case of eBay it is just a normal link (at least on http://www.ebay.com; look at page 2 of a search for TVs), so the problem is probably in your code (are you storing cookies, for instance?). From your description, however, it sounds like an AJAX request, which happens "under the hood": the browser gets XML from the server, and JavaScript renders it on the client side.
Traditionally, AJAX requests are hard to follow. In the case of eBay, however, I'd suggest using the interface eBay provides for querying its data. If you are building a general-purpose web crawler, stay away from AJAX requests; most of the time Google doesn't bother with them either.
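If the "Next" button really is a plain link, a scraper can roughly mimic what the browser does on a click: take the link's href, HTML-decode it, resolve it against the current page's URL, and send a GET that carries any cookies set by earlier responses. The following is a minimal C# sketch under those assumptions. NextPageScraper, FindNextUrl and GetPageAsync are hypothetical names, and matching an anchor whose text is exactly "Next" with a regex is an assumption about the page markup, not something confirmed for eBay.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of any eBay or .NET API.
public static class NextPageScraper
{
    // One CookieContainer shared by every request, so cookies set by earlier
    // responses (e.g. a session cookie) are sent back, as a browser would do.
    private static readonly CookieContainer Cookies = new CookieContainer();

    private static readonly HttpClient Client = new HttpClient(
        new HttpClientHandler { CookieContainer = Cookies, UseCookies = true });

    // Assumption: the "Next" link is an <a> element whose visible text is "Next".
    // A regex is only a rough stand-in for a real HTML parser.
    private static readonly Regex NextAnchor = new Regex(
        @"<a\b[^>]*\bhref\s*=\s*[""']([^""']+)[""'][^>]*>\s*Next\s*</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the absolute URL of the "Next" link, or null if none is found.
    public static Uri FindNextUrl(string html, Uri pageUrl)
    {
        Match m = NextAnchor.Match(html);
        if (!m.Success) return null;

        // href values in HTML are encoded ("&amp;" stands for "&"), so decode
        // before requesting; otherwise the query string sent is wrong.
        string href = WebUtility.HtmlDecode(m.Groups[1].Value);

        // Relative links are resolved against the page they were found on.
        return new Uri(pageUrl, href);
    }

    // Fetches a page through the shared client, so stored cookies are included.
    public static Task<string> GetPageAsync(Uri url) => Client.GetStringAsync(url);
}
```

In use, you would fetch the first results page with GetPageAsync, pass its HTML and URL to FindNextUrl, and fetch the returned URL through the same client so the cookies go along. For real-world pages, an HTML parser is more robust than a regex.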
I called element.InvokeMember("click") (where element is an HtmlElement) and it worked. I'm not sure why, though. I'll take a look at that HTTP GET thing anyway.