My main problem is that I don't know how to extract nodes from a GrammaticalStructure. I am using englishPCFG.ser in Java (NetBeans). My goal is to find out what a sentence says about the quality of the screen, for example:
The screen of iphone 4 is great.
I want to extract "screen" and "great". How can I extract the NN (screen) and the VP (great)?
The code I wrote is:
LexicalizedParser lp = new LexicalizedParser("C:\\englishPCFG.ser");
lp.setOptionFlags(new String[]{"-maxLength", "80", "-retainTmpSubcategories"});
String sent ="the screen is very good.";
Tree parse = (Tree) lp.apply(Arrays.asList(sent));
parse.pennPrint();
System.out.println();
TreebankLanguagePack tlp = new PennTreebankLanguagePack();
GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
Collection<TypedDependency> tdl = gs.typedDependenciesCollapsed();
The collection tdl is a list of typed dependencies. For the sentence "The screen of iphone 4 is great.", it contains:
det(screen-2, the-1)
nsubj(great-7, screen-2)
amod(4-5, iphone-4)
prep_of(screen-2, 4-5)
cop(great-7, is-6)
(as you can see by trying it out online).
So, the dependency you want, nsubj(great-7, screen-2), is right there in that list. nsubj means that "screen" is the subject of "great".
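To make the filtering step concrete, here is a minimal sketch that picks the nsubj pair out of the list above. It is an illustration under stated assumptions, not the parser's own API: it works on the printed form of each dependency using only the Java standard library, and the class name NsubjPairs and method nsubjPairs are hypothetical. With the real objects you would loop over tdl and read the relation, governor, and dependent from each TypedDependency instead of matching text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the Stanford parser.
class NsubjPairs {
    // Matches the printed form "reln(governor-index, dependent-index)".
    private static final Pattern DEP =
        Pattern.compile("(\\w+)\\((.+)-(\\d+), (.+)-(\\d+)\\)");

    /** Returns {subject, predicate} word pairs for every nsubj dependency. */
    static List<String[]> nsubjPairs(List<String> printedDeps) {
        List<String[]> pairs = new ArrayList<>();
        for (String line : printedDeps) {
            Matcher m = DEP.matcher(line.trim());
            if (m.matches() && m.group(1).equals("nsubj")) {
                // group(4) is the dependent (the subject),
                // group(2) is the governor (what is said about it).
                pairs.add(new String[] { m.group(4), m.group(2) });
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> deps = Arrays.asList(
            "det(screen-2, the-1)",
            "nsubj(great-7, screen-2)",
            "amod(4-5, iphone-4)",
            "prep_of(screen-2, 4-5)",
            "cop(great-7, is-6)");
        for (String[] p : nsubjPairs(deps)) {
            System.out.println(p[0] + " -> " + p[1]); // prints "screen -> great"
        }
    }
}
```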
The collection of dependencies is just a Collection (List). For more sophisticated processing, people commonly turn the dependencies into a graph structure that can be searched and traversed. There are various ways of doing that. We often use the [jgrapht](http://www.jgrapht.org/) library. But that's then code you are writing yourself.
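As a rough sketch of the graph idea, the dependencies can be stored in a plain adjacency map and traversed from there. This deliberately uses only the Java standard library instead of jgrapht, and the class DepGraph with its methods is a hypothetical name invented for this example. Nodes are the word-index strings from the list above, such as "great-7".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a graph library; for illustration only.
class DepGraph {
    // governor node -> outgoing edges, each edge = {relation, dependent node}
    private final Map<String, List<String[]>> edges = new LinkedHashMap<>();

    void addDependency(String reln, String gov, String dep) {
        edges.computeIfAbsent(gov, k -> new ArrayList<>())
             .add(new String[] { reln, dep });
    }

    /** Dependents attached directly to gov by the given relation. */
    List<String> dependents(String gov, String reln) {
        List<String> out = new ArrayList<>();
        for (String[] e : edges.getOrDefault(gov, Collections.emptyList())) {
            if (e[0].equals(reln)) {
                out.add(e[1]);
            }
        }
        return out;
    }

    /** Nodes reachable from start via governor -> dependent edges, breadth-first. */
    List<String> descendants(String start) {
        List<String> seen = new ArrayList<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            String node = queue.remove();
            for (String[] e : edges.getOrDefault(node, Collections.emptyList())) {
                if (!seen.contains(e[1])) {
                    seen.add(e[1]);
                    queue.add(e[1]);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        DepGraph g = new DepGraph();
        g.addDependency("det", "screen-2", "the-1");
        g.addDependency("nsubj", "great-7", "screen-2");
        g.addDependency("amod", "4-5", "iphone-4");
        g.addDependency("prep_of", "screen-2", "4-5");
        g.addDependency("cop", "great-7", "is-6");
        System.out.println(g.dependents("great-7", "nsubj"));
        System.out.println(g.descendants("great-7"));
    }
}
```

Here the first call finds the subject of "great", and the second walks everything that hangs below it, which is the kind of search a graph library would give you with less hand-written code.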