I need to parse a list of bookmarks exported from a browser such as Chrome, Firefox, or IE. Maybe even from Google, etc.
I played around and did something like this: reMatchNoCase("(<h3)(.*?)(</dl>)",myfile1), and then loop over the matches. Then I use reMatchNoCase("(<dt[>])(.*?)(</a>)",i) within the h3 ... /dl tags, followed by a lot of cleanup, but it's really not reliable.
The thing is that the categories are h3 tags surrounded by dl tags, with the bookmarks inside them. I can't just parse all the URLs, since I want to keep the categories as they are in the browser.
Thanks.
If it is XHTML, use XPath.
If it is not, it won't be easy. Search https://stackoverflow.com/search?q=parse+html
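For the non-XHTML case, one option that keeps the categories is to walk the tags in document order and track which folders are currently open, instead of trying to match a whole h3 ... /dl block in one regex. The following is only a rough sketch, not a tested implementation. It assumes the usual Netscape-style export (an h3 naming a folder, followed by the dl that holds its bookmarks), double-quoted href attributes, and a CFML engine that supports for-in over arrays in cfscript. The function name parseBookmarks and the struct keys folder, title, and url are made up for illustration.

```cfml
<cfscript>
// Hypothetical helper: returns an array of structs with folder, title and url keys.
function parseBookmarks(required string html) {
    var bookmarks = [];
    var folders = [];   // stack of folder names, one entry per currently open <dl>
    var pending = "";   // name from the most recent <h3>, claimed by the next <dl>
    var path = [];
    var name = "";
    var token = "";
    // Pull out only the four tag shapes that matter, in document order.
    var tokens = reMatchNoCase("<h3[^>]*>[^<]*</h3>|<dl[^>]*>|</dl>|<a\s[^>]*>[^<]*</a>", arguments.html);

    for (token in tokens) {
        if (reFindNoCase("^<h3", token)) {
            // Folder heading: remember its text until the matching <dl> opens.
            pending = trim(reReplace(token, "<[^>]*>", "", "all"));
        } else if (reFindNoCase("^<dl", token)) {
            // The outermost <dl> has no <h3>, so it pushes an empty name.
            arrayAppend(folders, pending);
            pending = "";
        } else if (reFindNoCase("^</dl", token)) {
            // Folder closed: pop it, guarding against unbalanced markup.
            if (arrayLen(folders)) {
                arrayDeleteAt(folders, arrayLen(folders));
            }
        } else {
            // Bookmark link: build the folder path from the non-empty stack entries.
            path = [];
            for (name in folders) {
                if (len(name)) {
                    arrayAppend(path, name);
                }
            }
            arrayAppend(bookmarks, {
                folder = arrayToList(path, "/"),
                title = trim(reReplace(token, "<[^>]*>", "", "all")),
                // If no double-quoted href is found, this leaves the whole tag in url.
                url = reReplaceNoCase(token, '^<a\s[^>]*?href="([^"]*)"[\s\S]*$', "\1")
            });
        }
    }
    return bookmarks;
}
</cfscript>
```

Because each entry carries its own folder path, grouping by category afterwards is a plain loop over the returned array. Folder names that themselves contain "/" would need a different separator, and exports that deviate from the assumed layout would still need extra handling.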
Could you consider a hybrid approach: parse with jQuery on the client side first, then post the result to CF?