I have calculated the tf-idf values for the terms in document 1 and document 2, but I don't know how to use them. Basically, I want to find the similarity between two documents (in my case, web pages). Can anybody tell me how to implement cosine similarity and the Jaccard coefficient to measure that similarity? C# code would be appreciated. Thanks.
I recommend a visit to Apache Mahout. It provides a complete kit of tools for this. Even if you don't want to use them, you can get the answers to these questions by looking at existing implementations.
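To make the two measures concrete, here is a minimal sketch. It is in Python rather than C#, but the logic should carry over to a `Dictionary<string, double>` almost line for line. It assumes each document is already a map from term to tf-idf weight. The function names and sample weights are made up for illustration and are not taken from Mahout. Jaccard is computed here on the sets of terms only, ignoring the weights, which is the plain set-based definition; weighted variants exist too.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse tf-idf vectors (term -> weight)."""
    # Only terms present in both documents contribute to the dot product.
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # An empty or all-zero document has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_coefficient(a, b):
    """Size of the shared term set divided by the size of the combined term set."""
    terms_a, terms_b = set(a), set(b)
    union = terms_a | terms_b
    if not union:
        return 0.0
    return len(terms_a & terms_b) / len(union)


# Hypothetical tf-idf weights, for illustration only.
doc1 = {"apple": 1.0, "banana": 2.0}
doc2 = {"banana": 2.0, "cherry": 1.0}

print(cosine_similarity(doc1, doc2))    # about 0.8
print(jaccard_coefficient(doc1, doc2))  # 1 shared term out of 3, about 0.333
```

Both scores lie between 0 and 1 for non-negative weights, with 1 meaning the documents are identical under that measure. Cosine uses the tf-idf weights, so it is usually the better fit when you have already computed them.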