Is there a freely available list of the most common English words to remove from text when creating a search index?
Wikipedia gives the 100 most frequent lemmas: http://en.wikipedia.org/wiki/Most_common_words_in_English
That might be good for a start; the article provides some good references.
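To show how such a list is typically used, here is a minimal sketch in Python. `STOP_WORDS` is a short, hand-picked sample of common words (an assumption for illustration, not a copy of the Wikipedia list), and `tokenize`, `index_terms` and `build_index` are hypothetical helper names. The sketch drops stop words before adding terms to a simple inverted index.

```python
import re

# Hypothetical sample of very common English words; in practice you would
# load a fuller list, e.g. one built from the Wikipedia frequency list.
STOP_WORDS = frozenset({
    "the", "be", "to", "of", "and", "a", "in", "that", "have", "i",
    "it", "for", "not", "on", "with", "he", "as", "you", "do", "at",
})


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def index_terms(text, stop_words=STOP_WORDS):
    """Return the tokens worth indexing, with stop words removed."""
    return [tok for tok in tokenize(text) if tok not in stop_words]


def build_index(docs, stop_words=STOP_WORDS):
    """Map each non-stop-word term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in index_terms(text, stop_words):
            index.setdefault(term, set()).add(doc_id)
    return index


if __name__ == "__main__":
    print(index_terms("The cat sat on the mat"))  # ['cat', 'sat', 'mat']
```

Using a `frozenset` keeps each stop-word lookup constant-time. The same `index_terms` filter should also be applied to search queries, so that queries and indexed documents are reduced to terms in the same way.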
Here are the words (plus single characters) from the SQL Server 2005 noise-word list. I assume the SQL Server 2008 stop words are similar.
And here is the MSDN documentation on it.
Hope this helps.