Ever since large web applications came into existence, searching data quickly and accurately has been one of their most important problems. For a while I've worked with Lucene.NET, which is a C# port of the Lucene project.
I also work in PHP with Zend Framework's Lucene API, which brings me to my question. To index well, we usually need to run some NLP steps first, such as tokenizing, lemmatizing, and more. So the question is:
Do you know of any good NLP programming framework/toolset using PHP?
PS: I'm well aware of the Zend API for Lucene, but indexing data properly is not just a matter of storing it and relying on Lucene; you need to perform some extra tasks, like those above.
I would suggest that you look at Solr, a search server built on top of Lucene. Solr exposes an HTTP-based, REST-style API and has a very good PHP client. This lets you leverage the power of Lucene without doing any of the low-level programming yourself to get the NLP features you want. You would also probably want to grab the trunk version of Solr, as NLP development is very active right now and new capabilities are being added all the time.
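To make the HTTP-based API a little more concrete, here is a minimal sketch of how a search request to Solr's `/select` handler can be put together. It is written in Python purely for illustration; the same URL can be built from PHP or sent through a Solr client library. The host, port, core name (`articles`), and field name (`title`) are assumptions, not values from this thread, and depending on your Solr version and setup the core segment may not be part of the path.

```python
from urllib.parse import urlencode


def build_solr_select_url(base_url, core, query, rows=10):
    """Build a URL for Solr's /select search handler.

    base_url and core are hypothetical values for a local install;
    q, rows, and wt are standard Solr query parameters.
    """
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


# Hypothetical local Solr instance with a core named "articles".
url = build_solr_select_url("http://localhost:8983/solr", "articles", "title:lucene")
print(url)
```

Fetching that URL with any HTTP client returns the matching documents as JSON. The tokenizing and stemming happen on the Solr side, according to the analyzers configured for each field, so the calling code stays small.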
Zend has a full port of Lucene to PHP. See docs here.
- Lucene has tokenizers
- Lucene has a Porter stemmer
- Lucene has Snowball stemmers
- Lucene can tie in with WordNet
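For readers unfamiliar with what the tokenizing and stemming steps in this list actually do to text before it is indexed, here is a rough sketch in Python. It is not Lucene's code and not the Porter or Snowball algorithm: the suffix list and the function names (`tokenize`, `naive_stem`, `analyze`) are made up for illustration, and a real stemmer applies far more careful rules.

```python
import re

# Illustrative suffix list only; real stemmers (Porter, Snowball) use
# multi-step rules rather than a flat list like this one.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Tokenize, then stem each token, as an analyzer chain would."""
    return [naive_stem(token) for token in tokenize(text)]


print(analyze("Indexing documents quickly"))
```

The point of the pipeline is that the same analysis runs at index time and at query time, so a search for "document" can match text that contains "documents".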
It seems you are looking for the same things I searched for a few months back :D. I'm running a PHP/Zend-based project with Solr (via the php-solr-client library), and so far I haven't found anything in PHP for advanced NLP. For the basics, as everyone mentions, you can get away with Solr (stemming, tag clouds and phrase tag clouds, tokenizing, etc.), and there are a few basic but useful text-processing PHP libraries out there (nothing fancy, really; you are better off relying on Solr itself). But if you are looking for more algorithmic, semantic, or sentiment-oriented NLP analysis, I suggest you move away from PHP a bit and get into Java, as there are more libraries that can help you in this area (such as OpenNLP). If the advanced stuff is what you are after, you might also want to take a look at Mahout:
http://www.lucidimagination.com/blog/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/