For my application, I need to record user-selected text on a web page, then revisit the page later with a web crawler to see whether the text the user highlighted has changed. The text that I'm tracking is product information.
Getting the text that the user has selected is easy enough with JavaScript, but finding a reliable way to record where in the document the crawler should look for it is not.
One of the main problems is that the JavaScript DOM representation of the document is not the same as the raw HTML. I have tried creating regular expressions based on the text surrounding the user-selected text, but for this reason the approach is unreliable.
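For reference, the surrounding-text idea can be sketched roughly as below. This is only a minimal illustration, not the exact code in use: the function names (`escapeRegExp`, `contextToPattern`, `buildAnchorRegex`, `extractTrackedText`) are hypothetical, and it assumes a short prefix and suffix are stored alongside the selection and that whitespace differences should be tolerated.

```javascript
// Hypothetical helpers for illustration; none of these come from a library.

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Turn literal context text into a pattern fragment that accepts any run
// of whitespace wherever the recorded text contained whitespace.
function contextToPattern(text) {
  return text.trim().split(/\s+/).map(escapeRegExp).join("\\s+");
}

// Build a regex that captures whatever sits between the recorded prefix
// and suffix (the text before and after the user's selection).
function buildAnchorRegex(prefix, suffix) {
  return new RegExp(
    contextToPattern(prefix) + "\\s*([\\s\\S]*?)\\s*" + contextToPattern(suffix)
  );
}

// Return the text currently found between the anchors, or null when the
// anchors themselves no longer match the page.
function extractTrackedText(pageText, prefix, suffix) {
  const match = buildAnchorRegex(prefix, suffix).exec(pageText);
  return match ? match[1] : null;
}
```

Run against the raw HTML rather than the rendered text, the captured value includes any markup that sits between the anchors (for example `<b>$19.99</b>` instead of `$19.99`), and the anchors fail to match entirely when a tag falls inside the prefix or suffix. That is the mismatch described above.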
XPath is another option, but it is likely to suffer from the same problem.
If anyone could suggest a technique I could use here, it would be very much appreciated.