Replacing multiple blank lines with one blank line using RegEx search and replace

I have a file that I need to reformat and remove "extra" blank lines.

I am using the Perl syntax regular expression search and replace functionality of UltraEdit and need the regular expression to put in the "Find What:" field.

Here is a sample of the file I need to re-format.

All current text

REPLACE with all the following:


Winter 2011 Class Schedule 

Winter 2011 Class Registration Dates:  Dec. 6, 2010 – Jan. 1, 2011
Winter 2011 Class Session Dates:  Jan. 5 – Feb. 12, 2011

DANCE

Adventures in Ballet & Tap      
3 – 6 years Instructor:  Ann Newby
Tots ages 3 – 6 years old develop a greater sense of rhythm, flexibility and coordination as they explore the basic elements of movement.
Saturdays   9 - 10 a.m.     Jan. 8 – Feb. 12        Six-week fees:   $30 


African Storytelling
3 – 6 years Instructor:  Ann Newby
Tots ages 3 – 6 years old explore storytelling and fables through spoken word, music, movement and visual arts experiences.
Saturdays   10 – 11 a.m.    Jan. 8 – Feb. 12        Six-week fee:   $30


African Dance / Children

You'll notice that some of the double blank lines have spaces or tabs or both in them.

After the search and replace has been run I should have a file that looks like this.

All current text

REPLACE with all the following:

Winter 2011 Class Schedule 

Winter 2011 Class Registration Dates:  Dec. 6, 2010 – Jan. 1, 2011
Winter 2011 Class Session Dates:  Jan. 5 – Feb. 12, 2011

DANCE

Adventures in Ballet & Tap      
3 – 6 years Instructor:  Ann Newby
Tots ages 3 – 6 years old develop a greater sense of rhythm, flexibility and coordination as they explore the basic elements of movement.
Saturdays   9 - 10 a.m.     Jan. 8 – Feb. 12        Six-week fees:   $30 

African Storytelling
3 – 6 years Instructor:  Ann Newby
Tots ages 3 – 6 years old explore storytelling and fables through spoken word, music, movement and visual arts experiences.
Saturdays   10 – 11 a.m.    Jan. 8 – Feb. 12        Six-week fee:   $30

African Dance / Children

Replacing

^(\s*\r\n){2,}

With

\r\n

Is what I ended up with.

This matches only runs of two or more blank lines and replaces each run with a single blank line.
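As a quick check outside UltraEdit, the same find/replace pair can be tried with Python's `re` module. This is only a sketch under assumptions: the helper name `squeeze_crlf_blank_lines` is made up for illustration, `re.MULTILINE` is assumed to stand in for the way UltraEdit anchors `^` at each line start, and Python's engine may not behave exactly like UltraEdit's.

```python
import re

# Same search string as above; MULTILINE lets ^ match at the start of every line.
BLANK_RUN = re.compile(r"^(\s*\r\n){2,}", re.MULTILINE)


def squeeze_crlf_blank_lines(text: str) -> str:
    """Replace each run of two or more blank CRLF lines with one empty line."""
    return BLANK_RUN.sub("\r\n", text)


# A single blank line stays as it is; a run of two (one holding a space and a tab) is squeezed.
sample = "DANCE\r\n\r\nAdventures\r\n \t\r\n\r\nAfrican Storytelling\r\n"
print(repr(squeeze_crlf_blank_lines(sample)))
```

Because every repetition of the group has to end in `\r\n`, the greedy `\s*` cannot swallow the indentation of the next non-blank line.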

It depends what the line endings are. Assuming \n, replace this:

([ \t]*\n){3,}

with \n\n.
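The reason for `{3,}` rather than `{2,}` is that the first `\n` matched is the end of the preceding text line, so three line ends in a row mean at least two blank lines. A minimal Python sketch, assuming LF-only line endings as the answer does (the helper name `squeeze_lf_blank_lines` is hypothetical):

```python
import re

# Three or more line ends in a row, each optionally preceded by spaces or tabs.
EXTRA_BLANKS = re.compile(r"([ \t]*\n){3,}")


def squeeze_lf_blank_lines(text: str) -> str:
    """Collapse two or more blank lines after a text line into one empty line."""
    return EXTRA_BLANKS.sub("\n\n", text)


print(repr(squeeze_lf_blank_lines("DANCE  \n \t\n\nAdventures\n")))
```

When the pattern matches, trailing spaces or tabs on the text line before the blank lines are removed as well, because the first repetition of the group starts there.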

Try this Perl one-liner: perl -00pe0. If you want in-place editing, just add the -i option.

Replacing

\n\s*\n\s*

with

\n\n

should do the trick
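One thing to watch for with this pair, at least when it is tried in Python's `re` module (UltraEdit may behave differently): the trailing `\s*` also consumes any indentation at the start of the next non-blank line. The sketch below assumes LF-only line endings, and the helper name `squeeze_with_s` is made up for illustration.

```python
import re

# Same search string as above; \s matches spaces, tabs and line breaks alike.
GREEDY_BLANKS = re.compile(r"\n\s*\n\s*")


def squeeze_with_s(text: str) -> str:
    """Replace a line end, any blank lines and following whitespace with one empty line."""
    return GREEDY_BLANKS.sub("\n\n", text)


# The blank lines are squeezed as intended.
print(repr(squeeze_with_s("DANCE\n \t\n\nAdventures")))
# The four leading spaces of the next line are lost along with the extra blank line.
print(repr(squeeze_with_s("DANCE\n\n\n    indented line")))
```

For the sample file in the question this does not matter, since none of its lines are indented.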

For completeness, I want to reference the long post Remove / delete blank and empty lines in the UltraEdit user forums. After all the explanations for newbies, it contains at the bottom a solution for reducing two or more lines that contain nothing (empty lines) or only whitespace (blank lines) to one empty line, independent of the line terminator type.

And some words on what Alan Moore wrote in his answer:

UltraEdit's Perl regular expression support is not crippled by its line-based architecture. Perl regular expression engines have a flag that determines whether a dot matches every character except the newline characters carriage return (CR) and line feed (LF), or really every character including CR and LF. That flag decides whether a Perl regular expression find/replace treats a text file as one large byte stream or as a sequence of lines. By default, UltraEdit sets the flag so that a dot in the search string does not match \r (CR) or \n (LF). This behavior is easy to change in UltraEdit by starting the regular expression with (?s), which changes the value of the flag match_not_dot_newline, as posted in the UltraEdit user forums in the topic "." in Perl regular expressions doesn't include CRLFs?

A Perl regular expression replace that works for files with

carriage return + line feed (DOS/Windows), or
only line feed (Unix, Mac OS 10.0 and later versions), or
only carriage return (Mac OS 9 and previous versions)

as the line ending can be done with the search string \h*(\r?\n|\r)(?:\h*\1){2,} and the replace string \1\1. It allows for optional trailing spaces and tabs at the end of a paragraph (one or more lines) and matches when the paragraph is followed by two or more lines that are either empty or contain only whitespace (blank lines).

Explanation:

\h* matches zero or more horizontal whitespace characters as defined by Unicode. This first part of the search expression matches horizontal whitespace at the end of a line, such as horizontal tabs, normal spaces, no-break spaces and some other rarely used spaces.

Using \s here would not be good because that character class matches any whitespace character, including the vertical whitespace characters carriage return and line feed.

(\r?\n|\r) ... is an OR expression with two arguments in a marking group. The first argument matches a line feed, optionally preceded by a carriage return, while the second argument matches just a carriage return. So this expression matches all three common types of line termination correctly. It is important for the rest of the search and for the replace that the match is always either CR+LF (both together), just LF, or just CR.

(?:\h*\1) ... is a non-marking group which matches zero or more horizontal whitespace characters followed by the same newline as found before, back-referenced with \1, i.e. CR+LF or just LF or just CR. So this part of the expression finds an empty or blank line.

{2,} ... is a multiplier for the preceding non-marking group and means at least two times. So after the end of a paragraph there must be two or more empty or blank lines. A single empty or blank line below a paragraph is not enough for the search expression to match.

The replace string \1\1 references twice the first found line break.

The advantage of this regular expression over the others posted here is that the line ending type does not need to be known. The search expression finds that out, and the line ending it found is referenced in the replace string. Any trailing whitespace at the end of a paragraph and any whitespace on the following lines is also removed by this regular expression replace if there are two or more empty or blank lines below the paragraph.

{2,} can be replaced by + in the search string if trailing whitespace at the end of a paragraph and on the following empty or blank line should also be trimmed when running this Perl regular expression replace. But please note that in this case the replace also makes replacements that change nothing at all, namely where a paragraph has no trailing whitespace and the next line is an empty line.
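To try the line-ending-independent expression outside UltraEdit, here is a rough Python sketch. Python's `re` module does not support `\h`, so `[ \t]` is used as a narrower stand-in (it does not cover no-break spaces and the other Unicode spaces), and the helper name `squeeze_any_eol` is hypothetical.

```python
import re

# [ \t] approximates \h; group 1 remembers which line ending was found first.
ANY_EOL_BLANKS = re.compile(r"[ \t]*(\r?\n|\r)(?:[ \t]*\1){2,}")


def squeeze_any_eol(text: str) -> str:
    """Reduce two or more empty or blank lines to one, reusing the found line ending."""
    return ANY_EOL_BLANKS.sub(r"\1\1", text)


for eol in ("\r\n", "\n", "\r"):
    # One paragraph line with trailing spaces, followed by two blank lines.
    text = "DANCE  " + eol + " \t" + eol + eol + "Adventures" + eol
    print(repr(squeeze_any_eol(text)))
```

The loop runs the same pattern over a CR+LF, an LF and a CR sample to show that the back-reference `\1` picks up whichever terminator the file uses.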

In Vim, use

:%!cat -s

I find this is the easiest way to delete extra empty lines so far.

I'm not sure what UltraEdit lets you get away with in the "replace" area, but if you cannot use a newline (I've had this problem before) but can use capture references, this might work:

Find    : \s*(\r\n)\s*(\r\n)\s*\r\n
Replace : $1$2

Not tested extensively, but seems to work on the sample you provided.

See this thread for what's causing the problem. As I understand it, UltraEdit regexes are greedy at the character level (i.e., within a line), but non-greedy at the line level (roughly speaking). I don't have access to UE, but I would try writing the regex so it has to match something concrete after the last blank line. For example:

search:   (\r\n[ \t]*){2,}(\S)
replace:  $1$2

This matches and captures two or more instances of a line separator and any horizontal whitespace that follows it, but it only retains the last one. The \S should force it to keep matching until it finds a line with at least one non-whitespace character.
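A short Python sketch of how this pair behaves in a conventional regex engine (UltraEdit's own engine may differ, which is the point of the caveat above). With `$1$2` only the last line separator survives, so in Python's `re` the blank line itself disappears. The second pattern, `KEEP_ONE`, is an assumption added here for illustration and is not part of the answer: it requires three or more separators and puts one `\r\n` back in front. Both helper names are made up.

```python
import re

AS_POSTED = re.compile(r"(\r\n[ \t]*){2,}(\S)")
# Hypothetical variant: needs at least two blank lines and keeps one of them.
KEEP_ONE = re.compile(r"(\r\n[ \t]*){3,}(\S)")


def as_posted(text: str) -> str:
    """Apply the search/replace pair above; group 1 holds only the last repetition."""
    return AS_POSTED.sub(r"\1\2", text)


def keep_one_blank(text: str) -> str:
    """Variant that writes one extra CRLF so a single empty line remains."""
    return KEEP_ONE.sub("\r\n" + r"\1\2", text)


text = "DANCE\r\n \r\n\t\r\n  Adventures"
print(repr(as_posted(text)))
print(repr(keep_one_blank(text)))
```

Because the last repetition of group 1 includes the horizontal whitespace before the `\S`, both versions preserve the indentation of the next text line.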

I admit that I don't have a whole lot of confidence in this solution; UltraEdit's regex support is crippled by its line-based architecture. If you want an editor that does regexes right, and you don't want to learn a whole new regex syntax (like vim's), get EditPadPro.

Should also work with spaces on blank lines