Add a language in the Stanford parser

开发者 https://www.devze.com 2023-04-09 02:00 Source: the web

I would like to use the Stanford parser in another language not already implemented.

I looked on the website but found nothing that could help me with that.

I guess what I have to do is "just" create a new languagePCFG.ser, but how do I do that?

Also, does anyone know whether French and Spanish models are going to be released?

Several things are needed:

You need a treebank (set of hand-parsed trees) from which the probabilities used in the parser are calculated
You need language-specific files (like xLanguagePack and xTreebankParserParams), which specify things about the language, treebank encoding, and parsing options
You then train the parser on the treebank to produce the grammar file (see makeSerialized.csh in the distribution)
You might need a language-specific tokenizer to divide text into tokens
If you want Stanford Dependencies output, then there is also a rule-based layer that defines the dependencies
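
To make the first point concrete, here is a minimal sketch of how rule probabilities can be read off a treebank by relative frequency. It is a toy illustration of the idea only, not the Stanford parser's training code (real training goes through the parser itself, as in makeSerialized.csh, and does considerably more than plain counting). The two French trees, the tag names, and the function names parse_tree, count_rules and estimate_pcfg are all made up for this example.

```python
from collections import Counter


def parse_tree(text):
    """Parse a Penn-style bracketed tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())       # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read()


def count_rules(tree, counts):
    """Count every rule LHS -> RHS used in the tree, lexical rules included."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if not isinstance(child, str):
            count_rules(child, counts)


def estimate_pcfg(treebank):
    """Relative-frequency estimate: P(LHS -> RHS) = count(rule) / count(LHS)."""
    counts = Counter()
    for text in treebank:
        count_rules(parse_tree(text), counts)
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


# Hypothetical two-sentence "treebank"; a real one has thousands of trees.
TOY_TREEBANK = [
    "(S (NP (D le) (N chat)) (VP (V dort)))",
    "(S (NP (N Marie)) (VP (V voit) (NP (D le) (N chien))))",
]

if __name__ == "__main__":
    for (lhs, rhs), p in sorted(estimate_pcfg(TOY_TREEBANK).items()):
        print(f"{lhs} -> {' '.join(rhs)}  {p:.2f}")
```

With only two trees the estimates are very coarse, which is why a sizeable hand-parsed treebank is the first requirement in the list above.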

Starting in 2011, we began distributing a French model with the Stanford Parser, and starting in 2015, a Spanish model.