Can someone recommend an open source POS tagger for Korean, Indonesian, Thai and Vietnamese?
I need one that I can use to tag the corpus data that I currently have (something like the stanford-postagger).
If you are a developer and are willing to share your POS tagger so I can test it, that would be welcome too.
With some modifications of the output, I've POS-tagged the Vietnamese data with jvntextpro.
But I'd still like more input on Korean, Indonesian and Thai POS tagging.
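For anyone wondering what "modifications of the output" can look like in practice, here is a minimal sketch. It assumes the tagger writes one sentence per line as space-separated `word/TAG` tokens; that format, the tag names in the sample, and the function names are assumptions for illustration, not the confirmed jvntextpro output.

```python
def split_token(token, sep="/"):
    """Split 'word/TAG' on the LAST separator, so words that contain
    the separator themselves (e.g. '1/2/M') keep their full form."""
    word, found, tag = token.rpartition(sep)
    if not found:
        # No separator at all: keep the token and mark the tag as unknown.
        return token, "UNK"
    return word, tag


def convert_line(line, sep="/"):
    """Turn one tagged sentence into a list of (word, tag) pairs."""
    return [split_token(tok, sep) for tok in line.split()]


def to_tsv(lines, sep="/"):
    """Convert tagged sentences to 'word<TAB>tag' rows,
    with a blank line after each sentence."""
    out = []
    for line in lines:
        pairs = convert_line(line, sep)
        if not pairs:
            continue  # skip empty input lines
        out.extend(f"{word}\t{tag}" for word, tag in pairs)
        out.append("")  # sentence boundary
    return "\n".join(out)


if __name__ == "__main__":
    # Hypothetical sample line; the tags are made up for the example.
    sample = ["Tôi/P là/V sinh_viên/N"]
    print(to_tsv(sample))
```

Splitting on the last separator rather than the first is the one design choice worth noting, since tokens such as dates or fractions may contain a slash themselves.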
From the ACL wiki: Korean morphological analyzer and part-of-speech tagger.
I would start by looking at the websites of NLP research departments in Korea, Thailand, and Indonesia. On this page, you will find links to the research departments.
Good luck!
UPDATE: OpenNLP has a Thai POS model. The models for the OpenNLP POS tagger are here: http://opennlp.sourceforge.net/models/thai/
You might want to try RDRPOSTagger: a robust, easy-to-use and language-independent toolkit for POS and morphological tagging.
(Programming language: Python & Java)
RDRPOSTagger is fast in both training and tagging, and its accuracy is very competitive with state-of-the-art results. See the experimental results, including speed and tagging accuracy, in this paper.
RDRPOSTagger now supports pre-trained POS and morphological tagging models for 13 languages, including Thai and Vietnamese.