开发者

Lucene. How to build a term-doc matrix

开发者 https://www.devze.com 2023-02-05 17:26 出处:网络
I need to build that matrix but I can\'t find a way to compute normalized tf-idf for each cell. The norma开发者_Python百科lization I would perform is cosine-normalization that is divide tf-idf (comp

I need to build that matrix but I can't find a way to compute normalized tf-idf for each cell. The norma开发者_Python百科lization I would perform is cosine-normalization that is divide tf-idf (computed using DefaultSimilarity ) per 1/sqrt(sumOfSquaredtf-idf in the column).

Does anyone know a way to perform that?

Thanks in advance

Antonio


One way, not using Lucene, is described in Sujit Pal's blog. Alternatively, you can build a Lucene index that has term vectors per field, iterate over terms to get idf, then iterate over term's documents to get tf.

0

精彩评论

暂无评论...
验证码 换一张
取 消