Break a string on the full stop for (Chinese, Arabic, Japanese, Russian, Korean, Dutch, Hindi, Greek, Urdu) using JavaScript

开发者 (Developer) https://www.devze.com 2023-01-21 19:09 Source: Internet
I am working on a language segmentation project. For English, I segment text with a regular expression that breaks the string at . ("full stop"). Now I want to support the following languages as well: Chinese, Arabic, Japanese, Russian, Korean, Dutch, Hindi, Greek, and Urdu. I want to break strings in these languages at the full stop too.

For example, the Chinese full stop is 。 (Unicode U+3002). Input string:

以有效應對各種事態」。他還表示,希望以符合21世紀的方式切實深化美日同盟關係。

Expected Result

Segment 1 :- 以有效應對各種事態」。
Segment 2 :- 他還表示,希望以符合21世紀的方式切實深化美日同盟關係。

I have to apply the same logic to the other languages (Arabic, Japanese, Russian, Korean, Dutch, Hindi, Greek, Urdu).


See String.split. You can use /([。])/ as the regular-expression separator and add the other punctuation characters inside the square brackets. The parentheses capture the delimiters, so split returns them as separate array elements instead of discarding them.
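A minimal sketch of that approach, in case it helps. The function name splitSentences and the constant TERMINATORS are made up for illustration, and the list of terminators is an assumption on my part rather than a complete inventory: 。 (U+3002) for Chinese and Japanese, । (U+0964, the Devanagari danda) for Hindi, ۔ (U+06D4) for Urdu, and the ordinary . for Russian, Korean, Dutch, Greek and Arabic. Because split returns the captured delimiters as separate elements, the sketch glues each delimiter back onto the text before it.

```javascript
// Assumed, non-exhaustive set of sentence terminators:
//   U+3002 ideographic full stop (Chinese, Japanese)
//   U+FF0E fullwidth full stop
//   U+0964 Devanagari danda (Hindi)
//   U+06D4 Arabic full stop (Urdu)
//   "."    ASCII full stop (Russian, Korean, Dutch, Greek, Arabic)
const TERMINATORS = /([\u3002\uFF0E\u0964\u06D4.])/;

// Hypothetical helper: splits text into segments, keeping each terminator
// attached to the segment it ends.
function splitSentences(text) {
  // With a capturing group, split() yields
  // [text, delimiter, text, delimiter, ..., tail].
  const parts = text.split(TERMINATORS);
  const segments = [];
  for (let i = 0; i < parts.length; i += 2) {
    // Re-attach the delimiter (if any) that followed this piece of text.
    const segment = (parts[i] + (parts[i + 1] || "")).trim();
    if (segment) segments.push(segment);
  }
  return segments;
}

const sample = "以有效應對各種事態」。他還表示,希望以符合21世紀的方式切實深化美日同盟關係。";
console.log(splitSentences(sample));
```

For the string in the question this gives the two expected segments. Note that including the plain . also splits decimals such as 3.14 and abbreviations, so for the Latin, Cyrillic and Greek scripts you would probably need extra rules on top of this.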


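In PHP you might use preg_split( REGEX , $yourString );

Replace the word REGEX with your regular expression, possibly the one @janmoesen mentioned. Since the delimiters are multibyte characters, the pattern will likely need the u (UTF-8) modifier.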
