Split text into sentences in C# [closed]

Developer https://www.devze.com 2023-02-09 13:21 Source: web
Closed. This question needs to be more focused. It is not currently accepting answers.

Want to improve this question? Update the question so it focuses on one problem only by editing this post.

Closed 1 year ago.

I want to divide a text into sentences. A sentence ends with a . (dot), ? or !, followed by one or more whitespace characters, and the next sentence starts with an uppercase letter.

For example:

First sentence. Second sentence!

How can I do that?


You can split on a regular expression that matches white space, with a lookbehind that looks for the sentence terminators:

string[] sentences = Regex.Split(input, @"(?<=[\.!\?])\s+");

This splits on the whitespace only. Because the lookbehind is zero-width, the terminators are not consumed and stay at the end of each sentence.

Example:

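using System;
using System.Text.RegularExpressions;

string input = "First sentence. Second sentence! Third sentence? Yes.";
// Split on whitespace that follows '.', '!' or '?'; the lookbehind keeps the terminators.
string[] sentences = Regex.Split(input, @"(?<=[\.!\?])\s+");

foreach (string sentence in sentences) {
  Console.WriteLine(sentence);
}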

Output:

First sentence.
Second sentence!
Third sentence?
Yes.
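The question also requires the next sentence to start with an uppercase letter, which the pattern above does not check. A minimal sketch that adds a lookahead for that, assuming "uppercase" means any Unicode uppercase letter (`\p{Lu}`); the class name `UppercaseSentenceSplitter` is hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class UppercaseSentenceSplitter
{
    // Split on whitespace that follows '.', '!' or '?' AND is followed by an
    // uppercase letter (\p{Lu}). Both lookarounds are zero-width, so only the
    // whitespace is removed; terminators and the capital letter are kept.
    private static readonly Regex Boundary =
        new Regex(@"(?<=[.!?])\s+(?=\p{Lu})");

    public static string[] Split(string input)
    {
        return Boundary.Split(input);
    }
}
```

With this pattern, "It costs 3.5 dollars. see? Yes." gives "It costs 3.5 dollars. see?" and "Yes.", because "3.5" has no whitespace after the dot and the space after "dollars." is followed by a lowercase letter.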


Which languages do you want to support? In Thai, for example, there are no spaces between words, and spaces are used to separate sentences. So in general this task is quite complex. Also consider the useful comment by Fredrik Mörk.

So first you need to define a set of rules for what a "sentence" is. Then you can use one of the suggested solutions.
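One way to turn such rules into code is to find candidate boundaries with a regular expression and then reject the ones that come right after a known abbreviation. A minimal sketch, assuming a small, hypothetical abbreviation list (a real list depends on the language and the texts being processed); `RuleBasedSentenceSplitter` is also a hypothetical name:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RuleBasedSentenceSplitter
{
    // Hypothetical rule set: tokens that end with '.' but do not end a sentence.
    private static readonly HashSet<string> Abbreviations =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "Dr.", "Mr.", "Mrs.", "e.g.", "i.e.", "etc." };

    // Candidate boundary: a terminator, then whitespace, then an uppercase letter.
    private static readonly Regex Candidate = new Regex(@"(?<=[.!?])\s+(?=\p{Lu})");

    public static List<string> Split(string input)
    {
        var sentences = new List<string>();
        int start = 0;
        foreach (Match m in Candidate.Matches(input))
        {
            // Walk back from the terminator to find the last token, e.g. "Dr.".
            int i = m.Index - 1;
            while (i >= start && !char.IsWhiteSpace(input[i])) i--;
            string lastToken = input.Substring(i + 1, m.Index - (i + 1));

            // Rule: a known abbreviation does not end the sentence.
            if (Abbreviations.Contains(lastToken)) continue;

            sentences.Add(input.Substring(start, m.Index - start));
            start = m.Index + m.Length; // skip the whitespace between sentences
        }
        sentences.Add(input.Substring(start));
        return sentences;
    }
}
```

For "Dr. Smith arrived. He was late." this returns "Dr. Smith arrived." and "He was late.". The trade-off is that a sentence that really ends with one of the listed abbreviations is not split.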


Have you tried String.Split()? See its documentation.
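String.Split splits on single characters and removes them, so on its own it cannot check the "whitespace, then uppercase letter" condition from the question. If you want to avoid regular expressions, a small hand-written scanner is one alternative. This is a sketch, and the name `SentenceScanner` is hypothetical:

```csharp
using System.Collections.Generic;

public static class SentenceScanner
{
    // Splits where the question's rule holds: '.', '!' or '?', then one or more
    // whitespace characters, then an uppercase letter. The terminator stays with
    // its sentence; the whitespace between sentences is dropped.
    public static List<string> Split(string text)
    {
        var sentences = new List<string>();
        int start = 0;
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c != '.' && c != '!' && c != '?') continue;

            // Skip the whitespace run after the terminator.
            int j = i + 1;
            while (j < text.Length && char.IsWhiteSpace(text[j])) j++;

            // Require at least one whitespace character and an uppercase letter next.
            if (j > i + 1 && j < text.Length && char.IsUpper(text[j]))
            {
                sentences.Add(text.Substring(start, i + 1 - start));
                start = j;
                i = j - 1; // the loop's i++ resumes at the uppercase letter
            }
        }
        if (start < text.Length)
            sentences.Add(text.Substring(start));
        return sentences;
    }
}
```

An ellipsis such as "Wait... What?" stays intact, because only the last dot is followed by whitespace.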


Try this (MSDN)

// Split at every '!', '.' or '?' character.
char[] separators = new char[] {'!', '.', '?'};
string[] sentences1 = "First sentence. Second sentence!".Split(separators);
// Or pass the separators directly to the params char[] overload:
string[] sentences2 = "First sentence. Second sentence!".Split('!', '.', '?');
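For "First sentence. Second sentence!" both calls above return three items: "First sentence", " Second sentence" (with a leading space) and an empty string after the final '!'. The terminators themselves are gone. A sketch that cleans up the leading space and the empty item, assuming .NET 5 or later for `StringSplitOptions.TrimEntries`; `CharSplitExample` is a hypothetical name:

```csharp
using System;

public static class CharSplitExample
{
    private static readonly char[] Separators = { '!', '.', '?' };

    // TrimEntries removes the whitespace around each piece, and
    // RemoveEmptyEntries drops the empty string left after the final '!'.
    // The '!', '.' and '?' characters are still removed from the results.
    public static string[] Split(string text)
    {
        return text.Split(Separators,
            StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
    }
}
```

Note that this still splits inside numbers such as "3.5", since every separator character is treated as a sentence end.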