I would like to use the Stanford parser in another language not already implemented.
I looked on the website but found nothing that could help me with that.
I guess what I have to do is "just" create a new languagePCFG.ser, but how do I do that?
Also, does anyone know whether French and Spanish models are supposed to be released?
Several things are needed:
- You need a treebank (set of hand-parsed trees) from which the probabilities used in the parser are calculated
- You need language-specific files (like xLanguagePack and xTreebankParserParams), which specify things about the language, treebank encoding, and parsing options
- You then train the parser on the treebank to produce the grammar file (see makeSerialized.csh in the distribution)
- You might need a language-specific tokenizer to divide text into tokens
- If you want Stanford Dependencies output, then there is also a rule-based layer that defines the dependencies
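
To make the first and third points more concrete, here is a rough Python sketch of what "calculating probabilities from a treebank" means in the simplest case: counting how often each rewrite rule occurs in the hand-parsed trees and dividing by the count of its left-hand side. This is only an illustration of plain relative-frequency PCFG estimation, not what the Stanford Parser actually does when it trains (its training involves considerably more, and is driven by the language-specific classes mentioned above). The function names and the two-sentence French toy treebank are made up for the example.

```python
from collections import Counter

def parse_tree(s):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())      # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read()

def count_rules(tree, counts):
    """Count every rule LHS -> RHS used in one tree."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if isinstance(child, tuple):
            count_rules(child, counts)

def estimate_pcfg(treebank):
    """Relative-frequency estimate: P(LHS -> RHS) = count(rule) / count(LHS)."""
    counts = Counter()
    for line in treebank:
        count_rules(parse_tree(line), counts)
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

# Hypothetical toy treebank; a real one has thousands of trees.
TOY_TREEBANK = [
    "(S (NP (D le) (N chat)) (VP (V dort)))",
    "(S (NP (D le) (N chien)) (VP (V mange) (NP (D la) (N pomme))))",
]

grammar = estimate_pcfg(TOY_TREEBANK)

if __name__ == "__main__":
    for (lhs, rhs), p in sorted(grammar.items()):
        print(f"{lhs} -> {' '.join(rhs)}  {p:.3f}")
```

With only two trees the estimates are obviously useless for parsing, but the same counting idea is why the size and annotation quality of the treebank matter so much for a new language.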
We started distributing a French model with the Stanford Parser in 2011, and a Spanish model in 2015.