How can I add a carriage return in a text using regex?

Developer https://www.devze.com 2022-12-10 04:59 Source: web
I have a text file with multiple lines. I'm trying to write a pattern that adds a carriage return to some lines of the text. The lines look like this:

lorem ipsum.

dolor sit amet, consectetur adipiscing elit [FIS] Donec feugiat

The pattern is a line followed by another line that has some characters and a '[' character too. If '[' is not present, the pattern fails and the carriage return is not added.

How can I do this using regular expressions?

I'm using C#, both as the programming language and as the regex engine.


If you want to add a line break after a . then you just replace it with itself and a line break. To make sure the dot is the last character of a word, use a lookahead to check that it is followed by whitespace, i.e. (?=\s)


So, to replace with a newline character (recommended for most situations):

Regex.Replace(input, @"\.(?=\s)", ".\n")  // C#; the dot needs no escaping in the replacement string


If you must use carriage return (and there are very few places that require it, even on Windows), you can simply add one:

Regex.Replace(input, @"\.(?=\s)", ".\r\n")  // C#; carriage return + newline


If you want to ensure that a . is always followed by two line breaks, without adding extra line breaks where they are already there, then it gets a little more complex and requires a negative lookahead, but looks like this:

Regex.Replace(input, @"\.(?!\S)(?:\r?\n){0,2}", ".\r\n\r\n")  // C#

Because quantifiers are greedy by default, the {0,2} will try to match two line breaks, then one, then none. The negative lookahead for a non-space, checked right after the dot, makes sure the dot is actually at the end of a word.

(If you might have more than two newlines and want to reduce to two, you can just use {0,} instead, which has * as a shortcut notation.)
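
As a rough illustration of the point above, the following C# sketch wraps that pattern in a helper and applies it twice. The class name LineBreakSketch and the method name EnsureTwoBreaks are made up for this example; only the pattern and the replacement string come from the answer. If the pattern behaves as described, a second pass should leave the text unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

static class LineBreakSketch
{
    // Hypothetical helper: a dot at the end of a word, plus up to two
    // existing line breaks, becomes the dot plus exactly two CRLF breaks.
    public static string EnsureTwoBreaks(string input) =>
        Regex.Replace(input, @"\.(?!\S)(?:\r?\n){0,2}", ".\r\n\r\n");

    static void Main()
    {
        string once = EnsureTwoBreaks("lorem ipsum.\r\ndolor sit amet [FIS] Donec");
        string twice = EnsureTwoBreaks(once);

        // The second pass finds the two breaks already in place and rewrites
        // them to the same text, so nothing is added.
        Console.WriteLine(once == twice); // prints True
    }
}
```

A dot inside a token such as 3.14 is left alone, because the negative lookahead rejects a dot that is followed by a non-whitespace character.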


It's probably worth pointing out that none of the above will consume any spaces or tabs. If that is desired, you can either change the lookahead from (?=\s) to \s+, or do a second replace of \n[ \t]+ with \n to remove any leading spaces/tabs, or something similar, depending on exactly what you're trying to do.
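
The two-pass variant mentioned above could be sketched in C# roughly as follows. The names TrimAfterBreakSketch and BreakAndTrim are invented for this example, and the sample input is the question's text joined onto one line, which is an assumption about what the data looks like.

```csharp
using System;
using System.Text.RegularExpressions;

static class TrimAfterBreakSketch
{
    // Hypothetical helper combining the two replaces described above.
    public static string BreakAndTrim(string input)
    {
        // Pass 1: a dot followed by whitespace becomes dot + newline.
        // The lookahead consumes nothing, so the whitespace stays in place.
        string broken = Regex.Replace(input, @"\.(?=\s)", ".\n");

        // Pass 2: remove spaces/tabs that now sit at the start of a line.
        return Regex.Replace(broken, @"\n[ \t]+", "\n");
    }

    static void Main()
    {
        Console.WriteLine(BreakAndTrim("lorem ipsum. dolor sit amet [FIS] Donec"));
    }
}
```

Keeping the second replace separate means the first pattern stays the same whether or not trimming is wanted.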


I believe you can use \r for a carriage return and \n for a new line.


What flavor? Here it's done for C#:

string yourString = @"el tiempo.
campo vectorial vector field. [FIS] Campo ";
string newString = Regex.Replace(yourString, @"el tiempo\.", "$0\r\n");  // dot escaped so it matches a literal '.'; just \n may be sufficient though

EDIT: the above is an answer to the original question. After the excellent answer by Peter Boughton, I don't need to add much. Perhaps just this: a small regex without look-around assertions that replaces every dot followed by one or more line-break characters with the dot and two line breaks.

string newString = Regex.Replace(yourString, @"\.(\r|\n)+", ".\r\n\r\n");
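
To see what this look-around-free pattern does and does not touch, here is a small C# sketch. The class and method names (CollapseSketch, CollapseBreaks) are hypothetical; the pattern and replacement are the ones from the line above.

```csharp
using System;
using System.Text.RegularExpressions;

static class CollapseSketch
{
    // Hypothetical helper: a dot followed by any run of \r / \n characters
    // becomes the dot plus exactly two CRLF line breaks.
    public static string CollapseBreaks(string input) =>
        Regex.Replace(input, @"\.(\r|\n)+", ".\r\n\r\n");

    static void Main()
    {
        string text = "el tiempo.\ncampo vectorial vector field. [FIS] Campo ";
        string result = CollapseBreaks(text);

        // Only the dot that is directly followed by a line break is rewritten;
        // "field. [FIS]" is unchanged because a space follows that dot.
        Console.WriteLine(result.Contains("el tiempo.\r\n\r\ncampo")); // prints True
        Console.WriteLine(result.Contains("field. [FIS]"));           // prints True
    }
}
```

Unlike the lookahead versions, this one never inserts a break after a dot that has no line break after it.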