"diff" tool's flavor of regex seems lacking?

Developer https://www.devze.com 2022-12-17 14:43 Source: Internet
I have two files I've been trying to compare with diff. The files are automatically generated and feature a number of lines that look like:

//!   Generated Date  : Mon, 14, Dec 2009

I'd like those differences to be ignored, and have set out to use the "-I REGEX" flag to make that happen.

However, the number of spaces that appear between "Date" and the colon varies and unfortunately, it seems the flavor of regular expressions employed by diff lacks a number of the basic regex utilities.

For instance, I cannot for the life of me get the "one or more" plus-sign to work. Same deal with the "\s" representation of whitespace.

diff -I '.*Generated Date\s+:.*' ....

and

diff -I '.*Generated Date +:.*' ....

both fail spectacularly.

Rather than continuing to blindly try things, can somebody out there point me to a good reference on the diff-specific subset of regular expressions?

Thanks!

===== EDIT =======

Thanks to FalseVinylShrub, I've established that I should be escaping my '+' and any similar characters. This fixes the problem somewhat. Diff successfully matches

.*Generated Date \+.*

and

.*Generated Date  *.*

(Note that there are two spaces between "Date" and "*".)

However, the second I try to add the ':' to that expression, like so:

.*Generated Date \+:.*

and

.*Generated Date \+\:.*

Both versions fail to match the string in question and cause diff to take a significantly greater amount of time to run. Any thoughts there?


Very interesting... I couldn't find a documentation reference, but a little experimentation found that:

  • ␠* and .* worked if zero-or-more is OK for you
  • As you said, ␠+ doesn't work. Neither did ␠{1,}... but ␠\{1,\} did work
  • UPDATE: ␠\+ also works!

(␠ represents a space character, which didn't show up otherwise.)

I'm using GNU diff from GNU diffutils 2.8.1.

man diff and info diff didn't explain the RE syntax.

Hope this helps.

UPDATE: I found a brief section in man grep:

Basic vs Extended Regular Expressions

In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).

So I guess it's using Basic regex syntax.


Ok, here's what the GNU diff source says.

re_set_syntax (RE_SYNTAX_GREP | RE_NO_POSIX_BACKTRACKING);

I think that means "same as GNU grep -G" (basic regular expressions). According to the GNU grep man page:

In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).

Forget about \s, \S, etc.


According to the POSIX specification, diff doesn't support regular expressions, nor does it have an -I option.

You appear to be using a non-standard diff with non-standard extensions. How those extensions work should be described in the documentation of whichever non-standard diff you are using.
