
Java Parser for Natural Language

Developer https://www.devze.com 2023-01-18 01:07 Source: the web

I am looking for a parser (or a generated parser) in Java that meets the following requirements:

  1. I will provide sentences that are already part-of-speech tagged. I will use my own tag set.
  2. I don't have any statistical data. So if the parser is statistical, I want to be able to use it without this feature.
  3. Easily adaptable to other languages, with a low learning curve.


The Stanford Parser (which was listed on that other SO question) will do everything you list.

You can provide your own POS tags, but you will need to translate them to the Penn Treebank tag set if they are not already in that format. Parsers are either statistical or they're not. If they're not, you need a set of grammar rules. No parsers are really built this way anymore, except as toys, because they are really Bad™. So you can rely on the statistical data the Stanford Parser already uses, with no additional work on your part. This does mean, however, that statistics about your own tags (if they don't map directly to Penn Treebank tags) will be ignored. But since you don't have statistics for your tags anyway, that is to be expected.
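To make the tag translation step concrete, here is a minimal sketch in plain Java. The custom tag names (`ARTICLE`, `NOUN_SG`, and so on), the `word/TAG` input format, and the `TagTranslator` class are all hypothetical, made up for illustration; only the Penn Treebank tags on the right-hand side are real. It deliberately does not call the Stanford Parser itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TagTranslator {
    // Hypothetical custom tag set mapped to real Penn Treebank tags.
    // Replace the keys with whatever your own tagger emits.
    private static final Map<String, String> CUSTOM_TO_PENN = new HashMap<>();
    static {
        CUSTOM_TO_PENN.put("ARTICLE", "DT");
        CUSTOM_TO_PENN.put("NOUN_SG", "NN");
        CUSTOM_TO_PENN.put("NOUN_PL", "NNS");
        CUSTOM_TO_PENN.put("VERB_3SG", "VBZ");
        CUSTOM_TO_PENN.put("ADJ", "JJ");
    }

    // Looks up one custom tag; fails loudly instead of guessing a mapping.
    static String toPenn(String customTag) {
        String penn = CUSTOM_TO_PENN.get(customTag);
        if (penn == null) {
            throw new IllegalArgumentException("No Penn Treebank mapping for tag: " + customTag);
        }
        return penn;
    }

    // Rewrites "word/CUSTOM_TAG" tokens as "word/PENN_TAG" tokens.
    static List<String> translate(List<String> taggedTokens) {
        List<String> result = new ArrayList<>();
        for (String token : taggedTokens) {
            // Split on the last slash so words containing '/' survive.
            int slash = token.lastIndexOf('/');
            if (slash <= 0 || slash == token.length() - 1) {
                throw new IllegalArgumentException("Expected word/TAG but got: " + token);
            }
            String word = token.substring(0, slash);
            String tag = token.substring(slash + 1);
            result.add(word + "/" + toPenn(tag));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("The/ARTICLE", "dog/NOUN_SG", "barks/VERB_3SG");
        System.out.println(translate(input)); // prints [The/DT, dog/NN, barks/VBZ]
    }
}
```

The translated tokens can then be handed to the parser as pre-tagged input, for instance as `TaggedWord` objects. Check the Stanford Parser documentation for the exact entry point in the version you use. Throwing on an unmapped tag is a design choice: a silent fallback would hide gaps in the mapping.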

The Stanford group also has parsers trained for several other languages, but you will need your own tagged data if you want to work with a language they don't cover. There's no getting around that, no matter which parser you use.

If you know Java (and I assume you do), the Stanford Parser is very straightforward and easy to get going. Also their mailing list is a great resource and is fairly active.


I'm not very clear on what you'd want, but the first thing I thought of was Mallet:

http://mallet.cs.umass.edu/index.php
