
C# Regex.Replace Multiple Newlines

Developer https://www.devze.com 2023-01-20 16:53 Source: web

I have a text file that contains more or less paragraphs. The text is not actually words; it's comma-delimited data, but that's not really important. The text file is divided into sections, and sections can contain subsections. Sections are separated by more than one newline (that is, at least one blank line), and subsections by a single newline.

Here is some sample data:

This is the, start of a, section
908690,246246246,246246
246246,246,246246

This is, the next, section,
sfhklj,sfhjk,4626246
4yw2,fdhds5juj,53ujj

So the above data contains two sections, each with three subsections. Sometimes, however, there is more than one empty line between sections. When this occurs, I want to convert the multiple newline characters, say \n\n\n\n, to just \n\n; I think a regex is probably the way to do this. I may also need to handle different newline conventions, Unix \n and Windows \r\n; I think the files probably contain a mix of line endings.

Here is the regex that I've come up with; it's nothing special:

Regex.Replace(input, @"([\r\n|\n]{2,})", Environment.NewLine + Environment.NewLine);

Firstly, is this a good regex solution? I'm not that good with regex.

Secondly, I then want to split the text so that each section becomes an element of a string array:

Regex.Split(input, Environment.NewLine + Environment.NewLine);

Is there a way to combine these steps?


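[\r\n|\n] is wrong. That's a character class that matches one of the characters \r, \n, or |. With {2,} it matches any run of two or more of those characters, so a single Windows line break (\r\n) already counts as two, and a pair of pipes in the data matches as well.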
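A quick way to see the problem is to run the question's replacement on a few tiny inputs. This is a minimal sketch written as C# 9 top-level statements; the helper name CollapseAsInQuestion and the sample strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// The question's replacement, unchanged. [\r\n|\n] is a character class, so it
// matches a single '\r', '\n' or '|', and {2,} accepts any run of two or more.
static string CollapseAsInQuestion(string input) =>
    Regex.Replace(input, @"([\r\n|\n]{2,})", Environment.NewLine + Environment.NewLine);

string nl2 = Environment.NewLine + Environment.NewLine;

// A single Unix newline is one character, so it is left alone...
Console.WriteLine(CollapseAsInQuestion("row1\nrow2") == "row1\nrow2");            // True

// ...but a single Windows CRLF is two characters from the class, so one
// subsection break is turned into a blank line, i.e. a section break.
Console.WriteLine(CollapseAsInQuestion("row1\r\nrow2") == "row1" + nl2 + "row2"); // True

// Pipes are part of the class as well, so "||" inside the data also matches.
Console.WriteLine(CollapseAsInQuestion("a||b") == "a" + nl2 + "b");               // True
```

The inconsistency between the first two cases is what makes the pattern unreliable on files with mixed line endings.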

Common idioms for matching a generic line separator are (?:\r\n|[\r\n]) or (?:\n|\r\n?). These will match \r\n (DOS/Windows), \r (older Macintosh), or \n (Unix/Linux/Mac OS X).
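To check that each idiom treats every kind of line break as exactly one separator, one can count matches on a string that contains all three. A small sketch as C# 9 top-level statements; the sample string and the helper name NormalizeNewlines are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// The two idioms from the answer. Both try the two-character \r\n before a
// lone \r, so a Windows line break is consumed as one separator, not two.
string[] idioms = { @"(?:\r\n|[\r\n])", @"(?:\n|\r\n?)" };

// Made-up sample with one Windows, one classic-Mac and one Unix separator.
string mixed = "a\r\nb\rc\nd";

foreach (string pattern in idioms)
{
    // Each idiom finds exactly three separators in the sample.
    Console.WriteLine($"{pattern} -> {Regex.Matches(mixed, pattern).Count} matches");
}

// Replacing every match with "\n" gives uniform Unix-style line endings.
static string NormalizeNewlines(string s) => Regex.Replace(s, @"(?:\r\n|[\r\n])", "\n");

Console.WriteLine(NormalizeNewlines(mixed) == "a\nb\nc\nd"); // True
```

Note that "\n\r" is still read as two separators (a Unix break followed by a classic-Mac one), which is the intended behavior of both idioms.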

I would normalize all line separators to \n, then split on two or more of those:

Regex.Split(Regex.Replace(source, @"(?:\r\n|[\r\n])", "\n"), @"\n{2,}")
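The one-liner combines both of the question's steps. Below it is wrapped in a helper and run on the question's sample data, given mixed endings and an extra blank line to mimic the files described. This is a sketch as C# 9 top-level statements: SplitSections is a hypothetical name, and trimming newlines at both ends is an addition of mine, not part of the answer. The last line illustrates why normalizing first helps: if the idiom is quantified directly, as in (?:\r\n|[\r\n]){2,}, the backtracking engine can reread a lone \r\n as \r followed by \n and treat it as two separators.

```csharp
using System;
using System.Text.RegularExpressions;

// The answer's two steps in one helper (SplitSections is a hypothetical name).
static string[] SplitSections(string source)
{
    // Step 1: normalize \r\n, \r and \n to a single "\n".
    string normalized = Regex.Replace(source, @"(?:\r\n|[\r\n])", "\n");

    // Assumption beyond the answer: trim newlines at both ends so that a
    // leading or trailing line break does not leave an empty or ragged element.
    normalized = normalized.Trim('\n');

    // Step 2: split on two or more consecutive "\n", i.e. on blank lines.
    return Regex.Split(normalized, @"\n{2,}");
}

// The question's sample data, with mixed line endings and two blank lines
// between the sections.
string input =
    "This is the, start of a, section\r\n" +
    "908690,246246246,246246\r\n" +
    "246246,246,246246\r\n" +
    "\r\n" +
    "\n" +
    "This is, the next, section,\n" +
    "sfhklj,sfhjk,4626246\n" +
    "4yw2,fdhds5juj,53ujj\n";

string[] sections = SplitSections(input);
Console.WriteLine(sections.Length); // 2

// Within each section, subsections remain separated by a single "\n".
foreach (string section in sections)
{
    Console.WriteLine(string.Join(" | ", section.Split('\n')));
}

// Why normalize first: quantifying the idiom directly lets the engine
// backtrack and read a lone "\r\n" as "\r" followed by "\n".
Console.WriteLine(Regex.IsMatch("row1\r\nrow2", @"(?:\r\n|[\r\n]){2,}")); // True
```

An alternative that keeps a single regex would be an atomic group, (?>\r\n|[\r\n]){2,}, which stops the engine from splitting a matched \r\n; the normalize-then-split version is arguably easier to read.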


I would just use String.Split: first split the text into sections using a double newline as the delimiter, then split each section into subsections using a single newline as the delimiter. You will then end up with the array you wanted. You can use a List<string> as the container and add each array returned by Split to it with AddRange.
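The String.Split approach can be sketched as follows, as C# 9 top-level statements. SplitSubsections is a hypothetical name, the sample rows are made up, and normalizing line endings up front is my own assumption, added because the question mentions mixed \r\n and \n while String.Split only matches exact separator strings.

```csharp
using System;
using System.Collections.Generic;

// Sketch of the String.Split approach (SplitSubsections is a hypothetical name).
static List<string> SplitSubsections(string text)
{
    // Assumption beyond the answer: normalize \r\n and \r to "\n" first.
    string normalized = text.Replace("\r\n", "\n").Replace('\r', '\n');

    // First split into sections on a double newline. RemoveEmptyEntries drops
    // the empty pieces produced when sections are separated by "\n\n\n\n".
    string[] sections = normalized.Split(new[] { "\n\n" }, StringSplitOptions.RemoveEmptyEntries);

    var container = new List<string>();
    foreach (string section in sections)
    {
        // Then split each section on a single newline and append the pieces
        // with AddRange. RemoveEmptyEntries also discards the stray "\n" left
        // at the start of a section when the separator run has an odd length.
        container.AddRange(section.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries));
    }
    return container;
}

// Made-up rows: a Windows-style section, two blank lines, then a Unix-style section.
string input = "r1a,r1b\r\nr2a,r2b\r\n\r\n\r\nr3a,r3b\nr4a,r4b\n";

List<string> subsections = SplitSubsections(input);
Console.WriteLine(subsections.Count);              // 4
Console.WriteLine(string.Join(" / ", subsections)); // r1a,r1b / r2a,r2b / r3a,r3b / r4a,r4b
```

A flat List<string> no longer records where one section ends and the next begins; if that grouping matters, collecting one string[] per section (for example in a List<string[]>) would keep it.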
