Is there a way to get the subject of a sentence using OpenNLP? I'm trying to identify the most important part of a user's sentence. Generally, users will be submitting sentences to our "engine" and we want to know exactly what the core topic of that sentence is.
Currently we are using OpenNLP to:
- Chunk the sentence
- Identify the noun phrases, verbs, etc. of the sentence
- Identify all "topics" of the sentence
- (NOT YET DONE!) Identify the "core topic" of the sentence
Please let me know if you have any bright ideas.
Dependency Parser
If you're interested in extracting grammatical relations such as what word or phrase is the subject of a sentence, you should really use a dependency parser. While OpenNLP does support phrase structure parsing, I don't think it does dependency parsing yet.
Open Source Software
Packages written in Java that support dependency parsing include:
- MaltParser
- MSTParser
- Stanford Parser (demo, see typed dependencies section)
- RelEx
Of these, the Stanford Parser is the most accurate. However, some configurations of the MaltParser can be insanely fast (Cer et al. 2010).
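Once you have dependency output, picking out the subject is mostly a filtering job. Here is a minimal sketch that assumes the parser's output is already available as text lines in the `rel(head-index, dependent-index)` style used for Stanford typed dependencies, and that subjects are marked with relations starting with `nsubj` (for example `nsubj` and `nsubjpass`). The class name `SubjectFromDependencies` and the method `subjects` are made up for illustration; they are not part of any parser's API, and other parsers may use different relation labels.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Picks subject words out of typed-dependency lines such as "nsubj(barked-3, dog-2)". */
final class SubjectFromDependencies {

    // Matches rel(head-index, dependent-index).
    // Group 1 is the relation name, group 4 is the dependent word.
    private static final Pattern DEP =
        Pattern.compile("^\\s*([A-Za-z:_]+)\\((.+)-(\\d+),\\s*(.+)-(\\d+)\\)\\s*$");

    /** Returns the dependent word of every nsubj* relation, in input order. */
    static List<String> subjects(List<String> dependencyLines) {
        List<String> result = new ArrayList<>();
        for (String line : dependencyLines) {
            Matcher m = DEP.matcher(line);
            if (!m.matches()) {
                continue; // skip anything that is not a dependency line
            }
            // Assumption: subject relations all start with "nsubj".
            if (m.group(1).startsWith("nsubj")) {
                result.add(m.group(4));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> deps = List.of(
            "det(dog-2, The-1)",
            "nsubj(barked-3, dog-2)",
            "root(ROOT-0, barked-3)");
        System.out.println(subjects(deps));
    }
}
```

Note that this only gives you the head word of the subject. To recover the whole phrase you would also need to collect the words that depend on that head (determiners, adjectives, and so on).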
For the grammatical subject you'd need to rely on configurational information in the tree. If the parse looks something like (TOP (S (NP ----) (VP ----))) then you can take the NP as the subject; often, though not at all always, that will be the case. However, only some sentences will have this configuration; one can easily imagine structures with subjects that are not in that position -- passive constructions, for example.
You're probably better off using MaltParser though.
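If you do want to try the configurational heuristic on phrase structure output first, it can be sketched roughly as follows. This assumes you already have the parse as a Penn Treebank style bracketed string (the kind of string a phrase structure parser can print), and it deliberately does not call any OpenNLP classes. The names `SubjectFromParse`, `findSubject`, and `words` are hypothetical helpers invented for this example. It only looks at an S directly under the root and returns the first NP child that is followed by a VP sibling, so it returns nothing for sentences that do not fit that shape.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Applies the (TOP (S (NP ...) (VP ...))) heuristic to a bracketed parse string. */
final class SubjectFromParse {

    /** A tree node: either a labelled constituent or a leaf word. */
    static final class Node {
        final String label;                          // null for leaf words
        final String word;                           // null for constituents
        final List<Node> children = new ArrayList<>();
        Node(String label, String word) { this.label = label; this.word = word; }
    }

    private final String text;
    private int pos = 0;

    private SubjectFromParse(String text) { this.text = text; }

    /** Parses a string such as "(TOP (S (NP (DT The) (NN dog)) (VP (VBD barked))))". */
    static Node parse(String bracketed) {
        return new SubjectFromParse(bracketed).readNode();
    }

    private void skipSpaces() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
    }

    // Reads characters up to the next whitespace or parenthesis.
    private String readToken() {
        int start = pos;
        while (pos < text.length()) {
            char c = text.charAt(pos);
            if (Character.isWhitespace(c) || c == '(' || c == ')') break;
            pos++;
        }
        return text.substring(start, pos);
    }

    private Node readNode() {
        skipSpaces();
        if (pos >= text.length() || text.charAt(pos) != '(') {
            throw new IllegalArgumentException("Expected '(' at position " + pos);
        }
        pos++; // consume '('
        Node node = new Node(readToken(), null);
        skipSpaces();
        while (pos < text.length() && text.charAt(pos) != ')') {
            if (text.charAt(pos) == '(') {
                node.children.add(readNode());
            } else {
                node.children.add(new Node(null, readToken()));
            }
            skipSpaces();
        }
        if (pos >= text.length()) {
            throw new IllegalArgumentException("Unbalanced parentheses");
        }
        pos++; // consume ')'
        return node;
    }

    /** Joins the leaf words under a node with single spaces. */
    static String words(Node node) {
        if (node.word != null) return node.word;
        StringBuilder sb = new StringBuilder();
        for (Node child : node.children) {
            String w = words(child);
            if (w.isEmpty()) continue;
            if (sb.length() > 0) sb.append(' ');
            sb.append(w);
        }
        return sb.toString();
    }

    /** Returns the first NP under a top-level S that is followed by a VP sibling, if any. */
    static Optional<String> findSubject(String bracketed) {
        Node root = parse(bracketed);
        for (Node clause : root.children) {
            if (!"S".equals(clause.label)) continue;
            Node np = null;
            for (Node child : clause.children) {
                if (np == null && "NP".equals(child.label)) {
                    np = child;
                } else if (np != null && "VP".equals(child.label)) {
                    return Optional.of(words(np));
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String parse = "(TOP (S (NP (DT The) (NN dog)) (VP (VBD barked))))";
        System.out.println(findSubject(parse).orElse("(no subject found)"));
    }
}
```

Returning an `Optional` makes the "only some sentences will have this configuration" caveat explicit: an imperative like `(TOP (S (VP (VB Run))))` simply yields an empty result, and the caller has to decide what to do next.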