I'm looking for a general-purpose API, web service, or tool that can convert a given HTML page into an RDF graph that is as specific as possible (most probably using a backbone ontology and/or a mapper).
Have you tried GRDDL?
GRDDL is a technique for obtaining RDF data from XML documents and in particular XHTML pages.
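To make the idea a bit more concrete: a GRDDL-enabled XHTML page declares the GRDDL profile on its `head` element and points to one or more transformations (usually XSLT) with `link rel="transformation"`. The snippet below is only a rough sketch of the discovery step, using Python's standard library. The function name, the sample page, and the `example.org` stylesheet URL are made up for illustration, it only looks at `link` elements in the `head`, and it does not run the transformation itself.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


def find_grddl_transformations(xhtml_text):
    """Return the transformation URLs a GRDDL-enabled XHTML page declares.

    Hypothetical helper: it only inspects <link> elements inside <head>.
    """
    root = ET.fromstring(xhtml_text)
    head = root.find(XHTML_NS + "head")
    if head is None:
        return []
    # The page opts in to GRDDL by listing the profile URI on <head>.
    if GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    return [
        link.get("href")
        for link in head.iter(XHTML_NS + "link")
        if "transformation" in link.get("rel", "").split() and link.get("href")
    ]


# Illustrative page; the stylesheet URL is an assumption, not a real resource.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Example</title>
    <link rel="transformation" href="http://example.org/to-rdf.xsl" />
  </head>
  <body><p>Hello</p></body>
</html>"""

if __name__ == "__main__":
    print(find_grddl_transformations(SAMPLE))
```

A GRDDL-aware agent would then fetch each listed stylesheet and apply it to the page to obtain RDF/XML. Note that this only works when the page is well-formed XHTML and its author has published a transformation.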
I used XQuery to extract the data from a given set of web pages, and I had to write custom queries for those pages. I think this is the most straightforward approach for a specific set of HTML files. However, it obviously does not work for the general case: a different set of web pages needs its own custom queries.
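The same per-site idea can be sketched without XQuery. The example below is not the queries I actually used; it is a minimal illustration in Python's standard library, assuming well-formed XHTML input. The rule table, the `page_to_ntriples` function, the sample markup, and the choice of Dublin Core properties are all assumptions for the sake of the example. Each rule pairs a path expression with an RDF predicate, and the output is a list of N-Triples lines.

```python
import xml.etree.ElementTree as ET

NS = {"h": "http://www.w3.org/1999/xhtml"}

# Hypothetical per-site rules: (ElementTree path, RDF predicate URI).
# A different site would need a different table.
RULES = [
    (".//h:h1", "http://purl.org/dc/elements/1.1/title"),
    (".//h:span[@class='author']", "http://purl.org/dc/elements/1.1/creator"),
]


def escape_literal(text):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def page_to_ntriples(xhtml_text, page_uri, rules=RULES):
    """Apply each rule to the page and emit one triple per non-empty match."""
    root = ET.fromstring(xhtml_text)
    lines = []
    for path, predicate in rules:
        for element in root.findall(path, NS):
            value = "".join(element.itertext()).strip()
            if value:
                lines.append(
                    '<%s> <%s> "%s" .' % (page_uri, predicate, escape_literal(value))
                )
    return lines


# Illustrative input; real pages would first need tidying into XHTML.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <h1>My Page</h1>
    <span class="author">Jane Doe</span>
  </body>
</html>"""

if __name__ == "__main__":
    for line in page_to_ntriples(PAGE, "http://example.org/page"):
        print(line)
```

The limitation is the same as with the XQuery approach: the rule table encodes knowledge about one particular page layout, so it has to be rewritten for every new site.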
I used jsoup to scrape data from HTML. It queries the HTML DOM in a jQuery-like style, which I was already familiar with, so it was a really simple tool for me to use. I also found it quite robust, but I only needed it to scrape three data sources, so I don't have much experience with it yet.