
Is there a good open-source or freely available Chinese segmentation algorithm? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.

Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.

Closed 8 years ago.


As the title says, I'm looking for a free and/or open-source text-segmentation algorithm for Chinese. I understand this is a very difficult task, as there are many ambiguities involved. I know there's Google's API, but it is rather a black box, i.e. it exposes very little information about what it is doing.
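
To make the ambiguity concrete, here is a minimal sketch that enumerates every way a string can be split into dictionary words. The function name `all_segmentations`, the toy vocabulary, and the example string are all assumptions chosen for illustration; none of them come from the libraries discussed in this thread.

```python
def all_segmentations(text, vocab):
    """Return every split of `text` whose pieces all appear in `vocab`."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in vocab:
            # Keep this prefix and segment the remainder recursively.
            for rest in all_segmentations(text[end:], vocab):
                results.append([head] + rest)
    return results


# Hypothetical toy vocabulary, chosen only to make the ambiguity visible.
AMBIGUOUS_VOCAB = {"研究", "研究生", "生命", "命", "起源"}

for words in all_segmentations("研究生命起源", AMBIGUOUS_VOCAB):
    print(" / ".join(words))
```

With this vocabulary the same six characters split two ways, 研究 / 生命 / 起源 and 研究生 / 命 / 起源, and a dictionary alone cannot tell which one is intended. Real segmenters add statistics or a trained model to choose between such candidates.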


The Chinese search keyword for "Chinese text segmentation" is 中文分词.

Good and active open-source text-segmentation projects:

  1. 盘古分词(Pan Gu Segment) : C#, Snapshot
  2. ik-analyzer : Java
  3. ICTCLAS : C/C++, Java, C#, Demo
  4. NlpBamboo : C, PHP, PostgreSQL
  5. HTTPCWS : based on ICTCLAS, Demo
  6. mmseg4j : Java
  7. fudannlp : Java, Demo
  8. smallseg : Python, Java, Demo
  9. nseg : NodeJS
  10. mini-segmenter : Python

Other

  1. Google Code : http://code.google.com/query/#q=中文分词
  2. OSChina (Open Source China)

Sample

  1. Google Chrome (Chromium) : src, cc_cedict.txt (73,145 Chinese words/phrases)

    • In a text field or textarea of Google Chrome containing Chinese sentences, press Ctrl+← or Ctrl+→

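    • Double-click on 中文分词指的是将一个汉字序列切分成一个一个单独的词 ("Chinese word segmentation means splitting a sequence of Chinese characters into individual words")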
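
For a rough idea of what a word list such as cc_cedict.txt makes possible, here is a minimal sketch of forward maximum matching, a simple dictionary-based technique. It is not a description of what Chrome actually does; the function name `forward_max_match`, the `max_len` parameter, and the tiny vocabulary are assumptions made up for this example.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy left-to-right segmentation: always take the longest dictionary word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan keeps moving.
            if size == 1 or candidate in vocab:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy vocabulary; a real one would be loaded from a word list.
TOY_VOCAB = {"中文", "分词", "指的是", "一个", "汉字", "序列", "切分", "单独"}

sentence = "中文分词指的是将一个汉字序列切分成一个一个单独的词"
print(" / ".join(forward_max_match(sentence, TOY_VOCAB)))
```

Characters not covered by the vocabulary (将, 成, 的, 词 here) fall out as single-character words. Because the method is greedy, it picks one reading and never reconsiders it, which is why most of the projects listed above go beyond plain dictionary lookup.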


The Stanford Word Segmenter uses a CRF (conditional random field) algorithm.

It is released under the GPL.

Project page: http://nlp.stanford.edu/software/segmenter.shtml
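
A common way to pose segmentation for a sequence model such as a CRF is per-character tagging: each character gets a label saying where it sits inside a word, and the model predicts the labels. The sketch below only shows that encoding and its inverse; it trains nothing. The B/M/E/S tag set (begin, middle, end, single) and the function names are assumptions for illustration and may differ from what the Stanford segmenter uses internally.

```python
def words_to_tags(words):
    """Encode a segmented sentence as one B/M/E/S tag per character."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # First character B, last character E, anything between M.
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and their predicted tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # these tags close the current word
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


segmented = ["中文", "分词", "指的是", "将"]
char_tags = words_to_tags(segmented)
print(char_tags)
print(tags_to_words("".join(segmented), char_tags))
```

A trained tagger would produce the tag sequence from raw characters, and `tags_to_words` would then turn its output back into words.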


The ICU User Guide has details on general-purpose text segmentation (boundary analysis): http://userguide.icu-project.org/boundaryanalysis


A cursory Google search for "text segmentation chinese open source" turns up this library, which may or may not be what you're looking for:

http://sourceforge.net/projects/ktdictseg/

The results also hint at a few alternative avenues for finding an open-source library:

  • Searching for an open-source search implementation that might work with Chinese.
  • Searching for an open-source plagiarism detection implementation that might work with Chinese.