Natural Language Processing - Truecaser classifier

Developer https://www.devze.com 2023-01-26 15:51 Source: Internet
Please suggest a good machine learning classifier for truecasing a dataset. Also, is it possible to specify our own rules/features for truecasing in such a classifier? Thanks for all your suggestions.

Thanks


I implemented a version of a truecaser in Python. It can be trained for any language when you provide enough data (i.e. correctly cased sentences).

For English, it achieves an accuracy of 98.38% on sample sentences from Wikipedia. A pre-trained model for English is provided.

You can find it here: https://github.com/nreimers/truecaser
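To make the idea of training on correctly cased sentences concrete, here is a minimal sketch of a frequency-based baseline. It is not the code from the repository above; the names `train`, `truecase`, and `overrides` are made up for illustration, and whitespace tokenization is a simplifying assumption. It also shows one simple way to plug in your own rules: a lookup table that takes priority over the learned counts.

```python
from collections import Counter, defaultdict


def train(sentences):
    """Count how often each cased form appears for every lowercased token.

    Illustrative baseline only (hypothetical helper, not the linked project's API).
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        # Skip the first token: it is capitalized because of its position,
        # so it says little about the word's usual casing.
        for token in tokens[1:]:
            counts[token.lower()][token] += 1
    return counts


def truecase(text, counts, overrides=None):
    """Restore casing token by token.

    Priority: user-supplied overrides, then the most frequent cased form
    seen in training, then plain lowercase for unknown words.
    """
    overrides = overrides or {}
    result = []
    for i, token in enumerate(text.split()):
        key = token.lower()
        if key in overrides:
            cased = overrides[key]
        elif key in counts:
            cased = counts[key].most_common(1)[0][0]
        else:
            cased = key
        if i == 0:
            # Hand-written rule: always capitalize the sentence start.
            cased = cased[:1].upper() + cased[1:]
        result.append(cased)
    return " ".join(result)


corpus = [
    "We visited Paris in May",
    "They say Paris is lovely",
    "The iPhone was sold in Paris",
]
counts = train(corpus)
print(truecase("the iphone arrived in paris", counts))
# The iPhone arrived in Paris
print(truecase("nasa launched in may", counts, overrides={"nasa": "NASA"}))
# NASA launched in May
```

A per-token lookup like this ignores context, so it cannot tell "may" the verb from "May" the month. A sequence model or a classifier with context features would be the natural next step, and the override table can stay in front of either one.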


Please take a look at this paper:

http://www.cs.cmu.edu/~llita/papers/lita.truecasing-acl2003.pdf

They report 98% accuracy.
