Extracting semantic data from webpages

Developer https://www.devze.com 2023-02-11 01:53 Source: Internet
I'm interested in extracting semantic data (simple template stuff) from webpages and other sources that aren't currently semantically aware. I've written crawlers and manual parsers before in a bunch of different languages, but there always seems to be a lot of boilerplate and page-specific code. Do you know of any platforms or frameworks that simplify the process (open source only, please)?

I'll be writing one if I can't find one, so links to similar systems or framework suggestions would also be appreciated.


The field is known as "automatic wrapper extraction" and is an active area of research, but I haven't seen a good open source toolkit for it. A company called Lixto makes a commercial tool that may be of interest to you. I'd love to see an open source project that tackles this problem.
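To make the idea a bit more concrete, here is a minimal Python sketch of one of the simplest classic techniques from this line of research: an LR ("left-right") wrapper learned from labelled examples. For each field it infers the delimiter strings that surround the value on every training page, then reuses those delimiters on unseen pages generated from the same template. This is an illustration under stated assumptions, not Lixto's approach or any library's API. The function names `learn_lr_wrapper` and `extract`, the toy country-code pages, and the assumptions that fields always appear in the same order and that the learned delimiters occur nowhere else on the page are all hypothetical choices made for this sketch.

```python
# Hypothetical helper names: a sketch of LR wrapper induction, not a library API.
from os.path import commonprefix  # character-wise longest common prefix of strings


def learn_lr_wrapper(examples, fields):
    """Learn one (left, right) delimiter pair per field from labelled pages.

    examples: list of (page_text, records), where records is a list of dicts
    mapping each field name to its exact string value, in document order.
    """
    lefts = {f: [] for f in fields}   # text between previous value and this one
    rights = {f: [] for f in fields}  # text between this value and the next one
    for page, records in examples:
        # Locate every labelled value, scanning left to right.
        spans, pos = [], 0
        for record in records:
            for field in fields:
                start = page.find(record[field], pos)
                if start == -1:
                    raise ValueError(f"{record[field]!r} not found in page")
                pos = start + len(record[field])
                spans.append((field, start, pos))
        # Record the separator text on each side of every value.
        for i, (field, start, end) in enumerate(spans):
            prev_end = spans[i - 1][2] if i > 0 else 0
            next_start = spans[i + 1][1] if i + 1 < len(spans) else len(page)
            lefts[field].append(page[prev_end:start])
            rights[field].append(page[end:next_start])
    wrapper = []
    for field in fields:
        # Left delimiter: longest common suffix of the preceding separators.
        left = commonprefix([s[::-1] for s in lefts[field]])[::-1]
        # Right delimiter: longest common prefix of the following separators.
        right = commonprefix(rights[field])
        if not left or not right:
            raise ValueError(f"no common delimiter for field {field!r}")
        wrapper.append((field, left, right))
    return wrapper


def extract(page, wrapper):
    """Apply a learned wrapper, returning one dict per complete record."""
    records, pos = [], 0
    while True:
        record = {}
        for field, left, right in wrapper:
            start = page.find(left, pos)
            if start == -1:
                return records  # no further records (a partial one is dropped)
            start += len(left)
            end = page.find(right, start)
            if end == -1:
                return records
            record[field] = page[start:end]
            # Resume at the right delimiter, not after it: with short separators
            # the next field's left delimiter may overlap this right delimiter.
            pos = end
        records.append(record)


# Toy training data (hypothetical): two labelled pages listing country codes.
TRAINING = [
    ("<html><body><ul><li><b>Congo</b> <i>242</i></li>"
     "<li><b>Egypt</b> <i>20</i></li></ul></body></html>",
     [{"country": "Congo", "code": "242"}, {"country": "Egypt", "code": "20"}]),
    ("<html><body><ul><li><b>Belize</b> <i>501</i></li></ul></body></html>",
     [{"country": "Belize", "code": "501"}]),
]
NEW_PAGE = ("<html><body><ul><li><b>Spain</b> <i>34</i></li>"
            "<li><b>Japan</b> <i>81</i></li></ul></body></html>")

wrapper = learn_lr_wrapper(TRAINING, ["country", "code"])
print(extract(NEW_PAGE, wrapper))
```

Running it prints `[{'country': 'Spain', 'code': '34'}, {'country': 'Japan', 'code': '81'}]`. The page-specific knowledge (the delimiters) is learned from examples instead of being hand-coded, which is the point of wrapper induction. A plain LR wrapper breaks as soon as the first field's left delimiter also appears in the page header, or the layout varies between records. Kushmerick's HLRT variant adds head and tail delimiters for the first problem; real sites usually call for more robust, DOM-aware methods.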
