Does anyone know of a small, fast JavaScript emulator with DOM layer support, written in either C or C++?
The problem: I need rudimentary support for JavaScript in a crawler application, and I am wondering whether there are any options other than:
a) Integrating WebKit (headless), which slows down crawling tremendously.
b) Integrating SpiderMonkey and writing the DOM layer myself (not looking forward to this option, and not sure if it's even worth it, speed-wise).
Any other options?
Thanks!
Throw in my vote for WebKit (or some other existing code). Why bother reinventing the wheel, especially when the wheel is really fancy and complicated and has spent years in development?
If you really wanted, you could write some code that checks for JavaScript first, so you only pass off the jobs that need it. Then, write filters to ignore common ad networks and analytics packages. If it were me though, I'd rather be consistent with how I am crawling.
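To make the pre-filter idea concrete, here is a rough C++ sketch. Everything in it is an assumption for illustration: the function name `NeedsJavaScript`, the `kIgnoredHosts` list and its two example entries are hypothetical, and the check is a crude substring scan rather than a real HTML parser. It will miss things like inline event handlers, so treat it as a starting point only.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical blocklist: hosts whose scripts the crawler chooses to ignore.
// The entries are illustrative only, not a vetted list.
static const std::vector<std::string> kIgnoredHosts = {
    "google-analytics.com",
    "doubleclick.net",
};

// Lowercase a copy of the input so tag matching is case-insensitive.
static std::string ToLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Returns true if the page contains at least one <script> opening tag that
// does not mention an ignored host. Inline scripts (no src) count as needed.
bool NeedsJavaScript(const std::string& html) {
    const std::string page = ToLower(html);
    std::size_t pos = 0;
    while ((pos = page.find("<script", pos)) != std::string::npos) {
        const std::size_t tag_end = page.find('>', pos);
        if (tag_end == std::string::npos) {
            return true;  // Malformed tag: be conservative and run the engine.
        }
        const std::string tag = page.substr(pos, tag_end - pos + 1);
        const bool ignored = std::any_of(
            kIgnoredHosts.begin(), kIgnoredHosts.end(),
            [&tag](const std::string& host) {
                return tag.find(host) != std::string::npos;
            });
        if (!ignored) {
            return true;
        }
        pos = tag_end + 1;
    }
    return false;
}
```

Pages for which `NeedsJavaScript` returns false could go through the plain, fast fetch path, and only the rest would be handed to the heavier WebKit-based renderer.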
Also, don't think that you only need rudimentary support, as there are some really funky websites out there that do a ton of DOM altering. If you expect your crawling to be reliable, be prepared to support what browsers support. The easiest way to do that is to use the same code that the browsers are using.
Correction: V8 does not support the DOM, just JavaScript, so it is not what you were looking for.
V8:
- http://code.google.com/apis/v8/intro.html