I am asking a pretty high-level question here in order to hopefully get to know some of the pitfalls before setting out. I am planning an application that will visit specific web sites to collect, process and format tabular data. It must then take certain web browser actions (follow a link, post a form, click a button, etc.) in response to the data that has been collected, giving feedback if something breaks in the process. A central requirement is that it must be easily adaptable to different pages: the data and menu options on the web pages are largely the same, but formatted differently. The format of a page can change without notice, so error detection and handling must be good.
I was thinking of going with C# and simply using the WebBrowser class in .NET, seeing as it at least has good facilities for manipulating the DOM and running JavaScript without any additional configuration. However, I am reasonably language agnostic. The major thing I am worried about is that WebBrowser doesn't seem to be as well developed for actually performing actions (mouse clicks, etc.). I am wondering if this is going to bite me in the ass. Also, it is a plus if the program behaves indistinguishably from a human user when seen from the server side.
Has anyone here worked with these kinds of tasks? I have to emphasize that I am not doing testing of web applications here; this is more of a robot. Are there any libraries/frameworks out there that are better suited than the .NET standard library with regard to flexibility and ease of use? Are there any major pitfalls to look out for?
I suggest you look at mechanize in combination with BeautifulSoup. It's Perl or Python, but it's exactly what you need.
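To give a rough idea of the "format can change without notice" part, here is a minimal Python sketch of table extraction that fails loudly when the layout drifts. It deliberately uses only the standard library's `html.parser` so it runs without installing anything; with BeautifulSoup the parsing half would likely be shorter. The names `TableExtractor`, `LayoutChanged` and `extract_records` are made up for this example, and it assumes the first table row holds the column headers and that tables are not nested.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects every <tr> on a page as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, None outside <tr>
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row, self._cell = [], None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row, self._cell = None, None


class LayoutChanged(Exception):
    """Raised when the page no longer looks the way the robot expects."""


def extract_records(html, expected_headers):
    """Return one dict per data row, keyed by the expected column names."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        raise LayoutChanged("no table rows found")
    headers, *body = parser.rows  # assumption: first row is the header row
    missing = [h for h in expected_headers if h not in headers]
    if missing:
        raise LayoutChanged(f"missing columns: {missing}")
    # Look columns up by name so a reordered table still parses.
    index = {h: headers.index(h) for h in expected_headers}
    records = []
    for row in body:
        if len(row) != len(headers):
            raise LayoutChanged(
                f"row has {len(row)} cells, expected {len(headers)}"
            )
        records.append({h: row[i] for h, i in index.items()})
    return records
```

You would call something like `extract_records(page_html, ["Name", "Price"])` per site and catch `LayoutChanged` to report that a page needs attention. Keeping the expected headers as data rather than code is one way to make the robot adaptable to pages that show the same information in a different layout.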