Do not merge the context of contiguous matches with grep

开发者 https://www.devze.com 2023-03-08 19:47 Source: Internet
If I run grep -C 1 match over the following file:

a
b
match1
c
d
e
match2
f
match3
g

I get the following output:

b
match1
c
--
e
match2
f
match3
g

As you can see, since the contexts around the contiguous matches "match2" and "match3" overlap, they are merged. However, I would prefer to get one context group for each match, even if that means duplicating input lines in the reported context. In this case, what I would like is:

b
match1
c
--
e
match2
f
--
f
match3
g

What would be the best way to achieve this? I would prefer solutions which are general enough to be trivially adaptable to other grep options (different values for -A, -B, -C, or entirely different flags). Ideally, I was hoping that there was a clever way to do that just with grep....


I don't think it is possible to do that using plain grep.

The sed construct below works to some extent; now I only need to figure out how to add the "--" separator:

$ sed -n -e '/match/{x;1!p;g;$!N;p;D;}' -e h log
b
match1
c
e
match2
f
f
match3
g


I don't think this is possible using plain grep.

Have you ever used Python? In my opinion it's a perfect language for such tasks (this code snippet will work for both Python 2.7 and 3.x):

with open("your_file_name") as f:
    lines = [line.rstrip("\n") for line in f]

first = True
for num, line in enumerate(lines):
    if "match" in line:
        # Separate groups with "--", but print nothing before the first one.
        if not first:
            print("--")
        first = False
        # Previous line (if any), the match itself, and the next line (if any).
        for context_line in lines[max(num - 1, 0):num + 2]:
            print(context_line)

This gives me:

b
match1
c
--
e
match2
f
--
f
match3
g
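
The snippet above hard-codes one line of context on each side. Since the question asks for something adaptable to other -A/-B/-C values, here is a minimal sketch of the same idea with the context sizes as parameters. The function name grep_context and its parameters are my own illustration, not part of grep or of the answer above, and it only does plain substring matching.

```python
def grep_context(lines, needle, before=1, after=1, separator="--"):
    """Return output lines with one context group per match, never merged."""
    out = []
    first = True
    for num, line in enumerate(lines):
        if needle not in line:
            continue
        # Put the separator between groups, not before the first one.
        if not first:
            out.append(separator)
        first = False
        # Clamp the start so a match near the top does not wrap around.
        start = max(num - before, 0)
        out.extend(lines[start:num + after + 1])
    return out


sample = ["a", "b", "match1", "c", "d", "e", "match2", "f", "match3", "g"]
result = grep_context(sample, "match", before=1, after=1)
print("\n".join(result))
```

Calling it with before=0, after=2 would correspond roughly to -A 2, again with overlapping lines repeated in each group.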


I'd suggest patching grep instead of working around it. In GNU grep 2.9, in src/main.c:

      /* We print the SEP_STR_GROUP separator only if our output is
         discontiguous from the last output in the file. */
      if ((out_before || out_after) && used && p != lastout && group_separator)
        {
          PR_SGR_START_IF(sep_color);
          fputs (group_separator, stdout);
          PR_SGR_END_IF(sep_color);
          fputc('\n', stdout);
        }

A simple additional flag would suffice here.

Edit: Well, d'oh, it is of course not THAT simple since grep would not reproduce the context, just add a few more separators. Due to the linearity of grep, the whole patch is probably not that easy. Nevertheless, if you have a good case for the patch, it could be worth it.
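
To illustrate why a single linear pass makes this harder than adding separators: to repeat shared context, a one-pass tool has to remember recent lines and keep several groups open at once. The following is only a sketch of that buffering idea in Python, not a description of how grep works internally; stream_groups and its parameters are hypothetical names.

```python
from collections import deque


def stream_groups(lines, needle, before=1, after=1):
    """Yield one list of lines per match, reading the input in a single pass."""
    history = deque(maxlen=before)  # the last `before` lines seen so far
    pending = deque()  # [group_lines, after_lines_still_needed], oldest first
    for line in lines:
        # Feed the new line to every group still waiting for trailing context.
        for group in pending:
            group[0].append(line)
            group[1] -= 1
        if needle in line:
            pending.append([list(history) + [line], after])
        # Groups fill up in the order they were opened, so check the front.
        while pending and pending[0][1] == 0:
            yield pending.popleft()[0]
        history.append(line)
    # End of input: flush groups whose trailing context was cut short.
    while pending:
        yield pending.popleft()[0]


sample = ["a", "b", "match1", "c", "d", "e", "match2", "f", "match3", "g"]
groups = list(stream_groups(sample, "match"))
print("\n--\n".join("\n".join(group) for group in groups))
```

The extra state (a history buffer plus a queue of open groups) is what a patch would have to add on top of the separator logic shown above.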


This does not appear possible with grep or GNU grep. However, it is possible to obtain the desired output with standard POSIX tools and a good shell such as bash.
Note: neither Python nor Perl should be necessary for the solution. Worst case, use awk or sed.

One solution I quickly prototyped looks like this. It re-reads the file once per match, so whether it is acceptable depends on whether that overhead is OK. The giveaway is the original question's use of a fixed context of one line (-C 1), which allows simple use of head and tail:

$ OIFS="$IFS"; lines=$(grep -n match greptext.txt | /bin/cut -f1 -d:);
for l in $lines;
do IFS=""; match=$(/bin/tail -n +$((l-1)) greptext.txt | /bin/head -n 3);
echo "$match"; echo "---";
done; IFS="$OIFS"

This might have some corner case associated with it, and this resets IFS when perhaps not necessary, though it is a hint for trying to use the power of POSIX shell & tools rather than a high level interpreter to get the desired output.
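
For comparison with the Python answer elsewhere in this thread, here is the same two-step idea (get the line numbers from grep -n, then cut a window around each one) as a small Python sketch. It assumes the single-file `N:text` output format of grep -n and, unlike the loop above, clamps the window at the start of the file, which covers the corner case of a match on line 1. The function name context_from_line_numbers is hypothetical, and the grep -n output is pasted in as a string rather than produced by running grep.

```python
def context_from_line_numbers(grep_n_output, file_lines, before=1, after=1):
    """Build one context group per line number reported by `grep -n`."""
    groups = []
    for entry in grep_n_output.splitlines():
        number = int(entry.split(":", 1)[0])  # 1-based line number
        start = max(number - before, 1)  # do not run past the top of the file
        end = min(number + after, len(file_lines))
        groups.append(file_lines[start - 1:end])
    return groups


file_lines = ["a", "b", "match1", "c", "d", "e", "match2", "f", "match3", "g"]
# What `grep -n match` reports for the file in the question.
grep_n_output = "3:match1\n7:match2\n9:match3\n"
groups = context_from_line_numbers(grep_n_output, file_lines)
print("\n--\n".join("\n".join(group) for group in groups))
```

Because grep still does the matching, other matching flags (for example -i or -E) carry over unchanged; only the context sizes move into the before and after parameters.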

Opinion: All good operating systems have: grep, awk, sed, tr, cut, head, tail, more, less, vi as built-ins. On the best operating systems, these are in /bin.
