I am working on keyword extraction. The system takes a URL as input, and the output should be keywords describing the contents of the page at that URL. For now we are considering only the textual parts. I would like to know which methods I can use to extract keywords from URLs and how they compare with each other. Suggestions and pointers are welcome.
I think you can use this method:
Read the site with urllib (http://docs.python.org/library/urllib2.html?highlight=urllib2#module-urllib2), then remove the tags to produce a plain-text version of the site.
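A minimal sketch of this first step, assuming Python 3, where the old `urllib2` module was folded into `urllib.request`. It uses the standard-library `html.parser` to strip tags and also skips `<script>` and `<style>` contents, since those are not visible text. The names `TextExtractor`, `html_to_text` and `fetch_text` are hypothetical, and the UTF-8 fallback for pages that declare no charset is an assumption.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping <script>/<style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; -> &
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Strip tags and return the page's visible text as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_text(url):
    """Download a page and return its plain text (assumes UTF-8 if no charset)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return html_to_text(html)
```

For example, `html_to_text("<p>Hello <b>world</b></p>")` gives `"Hello world"`. Joining chunks with a space keeps words in adjacent elements from running together, at the cost of occasionally splitting a word that was broken across inline tags.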
Then check which words are used most often, and output the top ten (or however many you need).
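The counting step can be sketched with `collections.Counter`. One addition not in the answer above: raw frequencies are dominated by words like "the" and "and", so this sketch drops a small stop-word list and very short tokens first. `STOP_WORDS`, `top_keywords` and the `min_length` parameter are hypothetical choices, and the ASCII-only tokenizer is a simplifying assumption.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system needs a fuller one.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
    "in", "is", "it", "of", "on", "or", "that", "the", "this", "to", "with",
}


def top_keywords(text, n=10, min_length=3):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Lowercase and split on anything that is not an ASCII letter.
    words = re.findall(r"[a-z]+", text.lower())
    candidates = (
        w for w in words if len(w) >= min_length and w not in STOP_WORDS
    )
    return Counter(candidates).most_common(n)
```

Passing the plain text of a page, e.g. `top_keywords(text, 10)`, returns the ten most frequent remaining words with their counts. Plain frequency is the simplest baseline; it does not account for words that are common across all pages.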