How do I get tags/keywords from a webpage/feed?

Developer https://www.devze.com 2023-01-04 17:00 Source: web
I have to build a tag cloud out of a webpage/feed. Once you have the word frequency table of tags, it's easy to build the tag cloud. My question is: how do I retrieve the tags/keywords from the webpage/feed in the first place?

This is what I'm doing now:

Get the content -> strip HTML -> split on \s, \n, \t (space, newline, tab) -> keyword list
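For reference, that pipeline can be sketched roughly as follows. This is only an illustration in Python using the standard library; the names `TextExtractor`, `strip_html` and `keyword_counts` are made up for this example, and fetching the page is left out.

```python
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_html(html):
    """Return the visible text of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def keyword_counts(html):
    """Split the stripped text on whitespace and count each word."""
    # str.split() with no argument splits on any run of spaces, newlines or tabs.
    words = strip_html(html).lower().split()
    return Counter(words)
```

Every whitespace-separated token counts as a keyword here, including punctuation stuck to words and filler words like "the".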

But this does not work very well.

Is there a better way?


What you have is a rough first-order approximation. I think if you then go back through the data and count the frequency of 2-word phrases, then 3-word phrases, and so on up to the maximum number of words you are willing to treat as a single tag, you'll get a better representation of keyword frequency.

You can refine this rough search pattern by specifying certain words that can be contained as part of a phrase (pronouns, etc.).
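A minimal sketch of the phrase-counting idea, in Python, under a few assumptions of mine: the text has already been stripped of HTML, a tag is at most `max_words` words long, and the words in `STOP_WORDS` (a short illustrative list, not a complete one) may appear inside a phrase but not at its start or end. The names `tokenize` and `phrase_counts` are hypothetical.

```python
import re
from collections import Counter

# Illustrative list only; a real one would be much longer.
STOP_WORDS = {
    "a", "an", "the", "of", "and", "to", "in", "is", "it",
    "i", "you", "he", "she", "we", "they",
}


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def phrase_counts(text, max_words=3):
    """Count 1-word, 2-word, ... max_words-word phrases in the text."""
    tokens = tokenize(text)
    counts = Counter()
    for n in range(1, max_words + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Stop words are allowed inside a phrase, but a phrase that
            # starts or ends with one is not counted as a tag candidate.
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            counts[" ".join(gram)] += 1
    return counts
```

Something like `phrase_counts(text).most_common(20)` then gives candidate tags for the cloud. Note that this sketch ignores sentence boundaries, so a phrase can span two sentences.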
