I want to use Latent Semantic Analysis for a small app I'm building, but I don't want to build up the matrices myself. (Partly because the documents I have wouldn't make a very good training collection, because they're kinda short and heterogeneous, and partly because I just got a new computer and I'm finding it a bitch to install the linear algebra and such libraries I would need.)
Are there any "default"/pre-built LSA implementations available? For example, things I'm looking for include:
- Default U,S,V matrices (i.e., if D is a term-document matrix from some training set, then D = U S V^T is the singular value decomposition), so that given any query vector q, I can use these matrices to compute the LSA projection of q myself.
- Some black-box LSA algorithm that, given a query vector q, r开发者_JAVA技巧eturns the LSA projection of q.
You'd probably be interested in the Gensim framework for Python; notably, it has an example on building the appropriate matrices from English Wikipedia.
精彩评论