Add a language in the Stanford parser

I would like to use the Stanford parser in another language not already implemented.

I looked on the website but found nothing that could help me with that.

I guess what I have to do is "just" create a new languagePCFG.ser, but how do I do that?

Also, does anyone know whether French and Spanish models are supposed to be released?


Several things are needed:

  • You need a treebank (set of hand-parsed trees) from which the probabilities used in the parser are calculated
  • You need language-specific files (like xLanguagePack and xTreebankParserParams), which specify things about the language, treebank encoding, and parsing options
  • You then train the parser on the treebank to produce the grammar file (see makeSerialized.csh in the distribution)
  • You might need a language-specific tokenizer to divide text into tokens
  • If you want Stanford Dependencies output, then there is also a rule-based layer that defines the dependencies
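
To make the first point concrete, here is a minimal sketch of how rule probabilities can be read off a treebank. It uses the plain textbook relative-frequency estimate, count(A -> β) / count(A), on a two-tree toy treebank in Penn-style bracketing. This is not the Stanford Parser's training code, which does a good deal more (tree annotation, smoothing, unknown-word handling); the function names and the toy data are made up for illustration.

```python
from collections import Counter


def parse_tree(text):
    """Parse one Penn-style bracketed tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "each subtree must start with '('"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())       # nested subtree
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return read()


def count_rules(tree, counts):
    """Count every rule (parent label -> sequence of child labels or words)."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if isinstance(child, tuple):
            count_rules(child, counts)


def estimate_pcfg(treebank):
    """Relative-frequency estimate: P(A -> rhs) = count(A -> rhs) / count(A)."""
    counts = Counter()
    for line in treebank:
        count_rules(parse_tree(line), counts)
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


# Hypothetical two-sentence treebank, for illustration only.
TOY_TREEBANK = [
    "(S (NP (D the) (N dog)) (VP (V barks)))",
    "(S (NP (N dogs)) (VP (V bark)))",
]

if __name__ == "__main__":
    for (lhs, rhs), p in sorted(estimate_pcfg(TOY_TREEBANK).items()):
        print(f"{lhs} -> {' '.join(rhs)}  {p:.2f}")
```

With only two trees the estimates are very coarse, which is why a real treebank needs thousands of hand-parsed sentences before the resulting grammar is useful.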

As for French and Spanish: we started distributing a French model with the Stanford Parser in 2011, and a Spanish model in 2015.
