I need to build that matrix, but I can't find a way to compute the normalized tf-idf for each cell.
The normalization I would perform is cosine normalization, that is, multiply each tf-idf value (computed using DefaultSimilarity) by 1/sqrt(sum of the squared tf-idf values in the column).
Does anyone know a way to perform that?
Thanks in advance
Antonio
One way, not using Lucene, is described in Sujit Pal's blog. Alternatively, you can build a Lucene index that stores term vectors per field, iterate over the terms to get each term's idf, then iterate over each term's documents to get the tf.
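To make the arithmetic concrete, here is a minimal sketch in plain Java that does not touch the Lucene API at all. It assumes the documents are already tokenized, that columns of the matrix are documents and rows are terms, and that tf and idf follow the formulas DefaultSimilarity uses (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))). The class name `TfIdfMatrix` and its methods are made up for illustration; with a real index you would read the frequencies from the term vectors instead of counting tokens.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
final class TfIdfMatrix {

    // Same formula as DefaultSimilarity.tf: square root of the raw frequency.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Same formula as DefaultSimilarity.idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Returns matrix[term][doc]; every document column is scaled to unit length.
    static double[][] build(List<List<String>> docs, List<String> vocabulary) {
        int numDocs = docs.size();
        int numTerms = vocabulary.size();

        Map<String, Integer> row = new HashMap<>();
        for (int t = 0; t < numTerms; t++) {
            row.put(vocabulary.get(t), t);
        }

        // Count raw term frequencies and document frequencies.
        int[][] freq = new int[numTerms][numDocs];
        int[] docFreq = new int[numTerms];
        for (int d = 0; d < numDocs; d++) {
            for (String token : docs.get(d)) {
                Integer t = row.get(token);
                if (t == null) {
                    continue; // token is outside the chosen vocabulary
                }
                if (freq[t][d] == 0) {
                    docFreq[t]++;
                }
                freq[t][d]++;
            }
        }

        double[][] matrix = new double[numTerms][numDocs];
        for (int d = 0; d < numDocs; d++) {
            double sumOfSquares = 0.0;
            for (int t = 0; t < numTerms; t++) {
                if (freq[t][d] > 0) {
                    matrix[t][d] = tf(freq[t][d]) * idf(docFreq[t], numDocs);
                    sumOfSquares += matrix[t][d] * matrix[t][d];
                }
            }
            // Cosine normalization: multiply the column by 1 / sqrt(sum of squares).
            double norm = Math.sqrt(sumOfSquares);
            if (norm > 0.0) {
                for (int t = 0; t < numTerms; t++) {
                    matrix[t][d] /= norm;
                }
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("a", "b", "a"),
                Arrays.asList("b", "c"));
        double[][] m = build(docs, Arrays.asList("a", "b", "c"));
        System.out.println(Arrays.deepToString(m));
    }
}
```

After normalization the squared values in each document column sum to 1, so the dot product of two columns is their cosine similarity. A column for a document with no vocabulary terms is left as all zeros rather than divided by zero.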