I have a text file with multiple lines. I'm trying to write a pattern that adds a carriage return to some lines of the text. The lines look like this:
lorem ipsum.
dolor sit amet, consectetur adipiscing elit [FIS] Donec feugiat
Well, the pattern is a line followed by another line that contains some characters and a '[' character. If '[' is not present, the pattern fails and the carriage return is not added.
How can I do it using regular expressions?
I'm using C# as the programming language, so the regex engine is the .NET one.
If you want to add a line break after a . then you just replace it with itself and a line break. To make sure it is the last character, use a lookahead to check it is followed by whitespace, i.e. (?=\s)
So, to replace with a newline character (recommended for most situations):
replace( input , '\.(?=\s)' , '.\n' )
If you must use carriage return (and there are very few places that require it, even on Windows), you can simply add one:
replace( input , '\.(?=\s)' , '.\r\n' )
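Since the question is about C#, the two replacements above could be sketched roughly as follows. The class and method names (LineBreakSketch, BreakAfterDots) and the newline parameter are made up for illustration; only Regex.Replace comes from .NET. The dot in the replacement string is left unescaped because .NET treats a backslash in a replacement as a literal character.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineBreakSketch
{
    // Hypothetical helper: appends a line break after every '.' that is
    // immediately followed by whitespace. Pass "\r\n" if a carriage
    // return is really required.
    public static string BreakAfterDots(string input, string newline = "\n")
    {
        // (?=\s) is a lookahead: it checks for whitespace without consuming it,
        // so the whitespace itself stays in the output.
        return Regex.Replace(input, @"\.(?=\s)", "." + newline);
    }
}
```

Because the lookahead consumes nothing, a dot that is already followed by a line break ends up followed by two, and a dot followed by a space keeps that space at the start of the new line.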
If you want to ensure that a . is always followed by two line breaks, without adding extra line breaks when they are already present, it gets a little more complex and requires a negative lookahead, but it looks like this:
replace( input , '\.(?!\S)(?:\r?\n){0,2}' , '.\r\n\r\n' )
Because regex quantifiers are greedy by default, the {0,2} will try to match twice, then once, then zero times, while the negative lookahead for a non-space character makes sure the dot is actually at the end of a word.
(If you might have more than two newlines and want to reduce them to two, you can just use {0,} instead, which has * as a shortcut notation.)
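A rough C# rendering of the two-line-break variant, as a sketch only: the names ParagraphBreakSketch, EnsureTwoBreaks and CollapseToTwoBreaks are hypothetical, and the patterns are the ones quoted above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ParagraphBreakSketch
{
    // Hypothetical helper: makes a word-final '.' be followed by exactly two
    // CRLF line breaks, reusing up to two line breaks that are already there.
    public static string EnsureTwoBreaks(string input)
    {
        // (?!\S) rejects dots followed by a non-space character (e.g. "3.14").
        // (?:\r?\n){0,2} greedily swallows up to two existing line breaks.
        return Regex.Replace(input, @"\.(?!\S)(?:\r?\n){0,2}", ".\r\n\r\n");
    }

    // Same idea with * (that is, {0,}), so longer runs of line breaks
    // are reduced to two as well.
    public static string CollapseToTwoBreaks(string input)
    {
        return Regex.Replace(input, @"\.(?!\S)(?:\r?\n)*", ".\r\n\r\n");
    }
}
```

Note that (?!\S) also succeeds at the very end of the input, so a dot that ends the text gets two line breaks appended as well.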
It's probably worth pointing out that none of the above will consume any spaces/tabs. If that is desired, you can either change the lookaheads from (?=\s) to \s+, or do a second replace of \n[ \t]+ with \n to remove any leading spaces/tabs, or something similar, depending on exactly what you're trying to do.
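The second-replace option could look something like this in C#. It is a minimal sketch under the assumption that a plain \n is the desired line break; IndentTrimSketch, StripLeadingIndent and BreakAndTrim are illustrative names, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IndentTrimSketch
{
    // Hypothetical helper: removes spaces/tabs that directly follow a newline.
    public static string StripLeadingIndent(string input)
    {
        return Regex.Replace(input, @"\n[ \t]+", "\n");
    }

    // Two passes: first break after each word-final dot, then drop the
    // whitespace that the lookahead left at the start of the new line.
    public static string BreakAndTrim(string input)
    {
        string broken = Regex.Replace(input, @"\.(?=\s)", ".\n");
        return StripLeadingIndent(broken);
    }
}
```

Doing it in two passes keeps each pattern simple, at the cost of scanning the text twice.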
I believe you can use \r for a carriage return and \n for a new line.
What flavor? Here it's done for C#:
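// requires: using System.Text.RegularExpressions;
string yourString = @"el tiempo.
campo vectorial vector field. [FIS] Campo ";
// The dot is escaped so it only matches a literal '.'; $0 reinserts the whole match.
string newString = Regex.Replace(yourString, @"el tiempo\.", "$0\r\n"); // just \n may be sufficient though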
EDIT: the above is an answer to the original question. After the excellent answer by Peter Boughton, I don't need to add much. Perhaps just this: a small regex without look-around assertions that replaces every dot followed by one or more line-break characters with the dot and two line breaks.
string newString = Regex.Replace(yourString, @"\.(\r|\n)+", ".\r\n\r\n");