I'm having a little problem with bash/sed. I need to use command substitution within a sed expression. I have two big text files:
The first is logfile.txt, which sometimes* shows error messages by ID (0xdeadbeef is a common example) in the format ERRORID:0xdeadbeef
The second, errors.txt, has error messages stored in pairs: LONG_ERROR_DESCRIPTION, 0xdeadbeef
I was trying to use sed with bash command substitution to do the task:
cat logfile.txt | sed "s/ERRORID:\(0x[0-9a-f]*\)/ERROR:$(cat errors.txt | grep \1 | grep -o '^[A-Z_]*' )/g"
If it worked, I would get a slightly nicer version of the logfile with better error info.
Lot's of meaningless stuff ERRORID:0xdeadbeef and something else =>
=> Lot's of meaningless stuff ERROR:LONG_ERROR_DESCRIPTION and something else
But it doesn't. The problem is that sed cannot "inject" the captured group (\1) into the command substitution. What are my other options? I know it's possible to build the sed expression first or do it some other way, but I would like to avoid parsing those files several times (they can be huge).
As always, big thanks for any help.
*There is no real formatting inside the logfile. No sections or columns; tab/comma separation is used inconsistently.
PS. Just to explain: the following expression works, but of course no argument is passed into it:
echo "my cute cat" | sed "s/cat/$(echo dog)/g"
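A minimal sketch of why the PS example works while the original attempt cannot: the shell expands $(...) once, while it is still building sed's argument, so sed only ever receives the finished string. The variable names script, out and seen are hypothetical and used only for illustration.

```shell
# The shell expands $(...) before sed is even started.
script="s/cat/$(echo dog)/g"
printf '%s\n' "$script"            # prints: s/cat/dog/g

# sed never sees the command substitution, only its result.
out=$(echo "my cute cat" | sed "$script")
printf '%s\n' "$out"               # prints: my cute dog

# Inside $(...), \1 is an ordinary shell word: the backslash is removed
# and the command receives a plain "1". No sed match exists at that point.
seen=$(printf '%s' \1)
printf '%s\n' "$seen"              # prints: 1
```

So in the original command, grep is run a single time with the pattern 1, long before sed has matched any error ID.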
You can create a sed script from the error message catalog, then apply that sed script to the log file.
Basically, something along these lines:
sed 's/\(.*\), 0x\([0-9a-fA-F]*\)$/s%ERRORID:0x\2%ERROR:\1%g/' errors.txt |
sed -f - logfile.txt
The output from the first sed script should be something like this:
s%ERRORID:0x00000001%ERROR:Out of memory%g
s%ERRORID:0x00000002%ERROR:Stack overflow%g
s%ERRORID:0x00000031%ERROR:values of beta may cause dom%g
That is, a new sed script which specifies a substitution for each error code in the catalog.
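As a rough end-to-end sketch of this approach, the snippet below builds two tiny sample files in a temporary directory, generates the sed script, and applies it. The sample catalog entries, the temporary file names, and the variables generated and result are assumptions made for illustration; the character class accepts lowercase hex digits because the ID in the question is lowercase. The generated script is written to a file rather than piped, which avoids relying on sed accepting standard input for -f.

```shell
tmp=$(mktemp -d)

# Hypothetical sample data in the formats described in the question.
printf '%s\n' 'LONG_ERROR_DESCRIPTION, 0xdeadbeef' \
              'OUT_OF_MEMORY, 0x00000001' > "$tmp/errors.txt"
printf '%s\n' "Lot's of meaningless stuff ERRORID:0xdeadbeef and something else" \
    > "$tmp/logfile.txt"

# Pass 1: turn every catalog line into one s%...%...%g command.
sed 's/\(.*\), 0x\([0-9a-fA-F]*\)$/s%ERRORID:0x\2%ERROR:\1%g/' \
    "$tmp/errors.txt" > "$tmp/errors.sed"
generated=$(cat "$tmp/errors.sed")
printf '%s\n' "$generated"

# Pass 2: apply the generated script to the log in a single read.
result=$(sed -f "$tmp/errors.sed" "$tmp/logfile.txt")
printf '%s\n' "$result"

rm -rf "$tmp"
```

Each file is read once: errors.txt to build the script, logfile.txt to apply it.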
There are different dialects of sed, so this may require minor tweaking. The sed on Linux, I believe, expects a backslash before grouping parentheses in regular expressions and happily accepts standard input as the argument to the -f option. This is not portable to other Unices, though (but you could substitute Perl for sed if you need portability).
*Edit: If the error messages are fairly static, and/or you want to read the log from standard input, save the generated script in a file:
# Do this once
sed 's/\(.*\), 0x\([0-9a-fA-F]*\)$/s%ERRORID:0x\2%ERROR:\1%g/' errors.txt >errors.sed
# Use it many times
sed -f errors.sed logfile.txt
You could also add #!/usr/bin/sed -f at the top of errors.sed and chmod +x it to make it into a self-contained command script.
I don't know if this would work, since I can't get an answer on whether or not capture groups persist, but there is a lot more to sed than just the s command. I was thinking you could use a capture group in a regex line selector, then use that for the command substitution. Something like this:
/ERRORID:\(0x[0-9a-f]*\)/ s/ERRORID:0x[0-9a-f]*/ERROR:$(grep \1 errors.txt | grep -o '^[A-Z_]*' )/
Anyway, if that doesn't work I would change gears and point out that this is really a good job for Perl. Here's how I would do it, which I think is much cleaner / easier to understand:
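#!/usr/bin/perl
use strict;
use warnings;

while (<>) {
    while (/ERRORID:(0x[0-9a-f]*)/) {
        my $id = $1;
        # Backticks capture grep's output; system() would only return the exit status.
        chomp(my $name = `grep $id errors.txt | grep -o '^[A-Z_]*'`);
        s/ERRORID:$id/ERROR:$name/g;
    }
    print;
}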
Then execute:
./thatScript.pl logfile.txt
For people looking for a solution with bare shell and sed, here is one. Not perfect, but working:
while IFS= read -r line ; do
  id=$(printf '%s\n' "$line" | grep -o "ERRORID:0x[0-9a-f]*" | grep -o "0x[0-9a-f]*")
  if [ -n "$id" ] ; then
    printf '%s\n' "$line" | sed "s/$id/$(grep "$id" errors.txt | grep -o '^[A-Z_]*')/g"
  else
    printf '%s\n' "$line"
  fi
done < logfile.txt
If you see ways to improve it, please share.
With GNU awk for gensub() and the 3rd arg to match():
$ awk '
NR==FNR {
    map[$NF] = gensub(/,[^,]+$/,"",1)
    next
}
match($0,/(.*ERRORID:)(0x[[:xdigit:]]+)(.*)/,a) {
    $0 = a[1] (a[2] in map ? map[a[2]] : a[2]) a[3]
}
1' errors.txt logfile.txt
Lot's of meaningless stuff ERRORID:LONG_ERROR_DESCRIPTION and something else =>
The above will run much faster than the sed scripts in the currently accepted answer and won't fail given various possible contents of LONG_ERROR_DESCRIPTION such as %, &, or \1, nor will it fail when a given ERRORID is a prefix of another. For example, if 0xdead and 0xdeadbeef are 2 separate error codes, then the sed scripts can fail depending on the order they appear in errors.txt: they could convert ERRORID:0xdeadbeef to ERROR:LONG_ERROR_DESCRIPTIONbeef by mapping 0xdead first.
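The two failure modes can be reproduced with a small sketch. The generated commands below are written by hand to stand in for what a catalog might produce; the descriptions SHORT_DESC and R&D, the ID 0x1, and the variable names are hypothetical.

```shell
tmp=$(mktemp -d)

# Hypothetical generated script in which 0xdead is listed before 0xdeadbeef.
printf '%s\n' 's%ERRORID:0xdead%ERROR:SHORT_DESC%g' \
              's%ERRORID:0xdeadbeef%ERROR:LONG_ERROR_DESCRIPTION%g' \
    > "$tmp/errors.sed"

# The shorter ID matches first and leaves the rest of the longer ID behind.
prefix_result=$(echo 'ERRORID:0xdeadbeef' | sed -f "$tmp/errors.sed")
printf '%s\n' "$prefix_result"     # prints: ERROR:SHORT_DESCbeef

# A description containing & : sed replaces & with the whole matched text.
amp_result=$(echo 'ERRORID:0x1' | sed 's%ERRORID:0x1%ERROR:R&D%g')
printf '%s\n' "$amp_result"        # prints: ERROR:RERRORID:0x1D

rm -rf "$tmp"
```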