I have a regex and replacement pattern that have both been tested in Notepad++ on my input data and work correctly. When I put them into a sed expression, however, nothing gets matched.
Here is the sed command:
# SEARCH = ([a-zA-Z0-9.]+) [0-9] (.*)
# REPLACE = \2 (\1)
sed -e 's/\([a-zA-Z0-9.]+\) [0-9] \(.*\)/\2 \(\1\)/g'
Here is a sampling of the data:
jdoe 1 Doe, John
jad 1 Doe, Jane
smith 2 Smith, Jon
and the desired output:
Doe, John (jdoe)
Doe, Jane (jad)
Smith, Jon (smith)
I have tried removing and adding escapes to different characters in the sed expression, but either get nothing matched or something along the lines of:
sed: -e expression #1, char 42: invalid reference \2 on `s' command's RHS
How can I get this escaped correctly?
I usually find it easier to use the -r switch as this means that escaping is similar to that of most other languages:
sed -r 's/([a-zA-Z0-9.]+) [0-9] (.*)/\2 (\1)/g' file1.txt
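A minimal sketch of the same substitution applied to the sample data from the question. It assumes a sed that accepts -E (current GNU sed and BSD/macOS sed both do; -r is the older GNU spelling). The function name to_display_name is hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: turn "user N Last, First" into "Last, First (user)".
# -E selects extended regular expressions, so ( ) and + need no backslashes.
to_display_name() {
    sed -E 's/([a-zA-Z0-9.]+) [0-9] (.*)/\2 (\1)/'
}

# Sample data from the question.
to_display_name <<'EOF'
jdoe 1 Doe, John
jad 1 Doe, Jane
smith 2 Smith, Jon
EOF
```

The trailing g flag is dropped here because (.*) already consumes the rest of the line, so there can be at most one match per line anyway.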
A few warnings and additions to what everyone else has already said:
- The -r option is a GNU extension that enables extended regular expressions. BSD-derived seds use -E instead.
- sed and grep use Basic Regular Expressions by default.
- awk uses Extended Regular Expressions.
- You should become comfortable with the POSIX specifications such as IEEE Std 1003.1 if you want to write portable scripts, makefiles, etc.
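To make the flavour differences in the list concrete, here is a sketch that checks one line for the same "identifier, space, digit, space" prefix using each tool's default syntax. The helper names are hypothetical and it assumes grep and awk are on PATH.

```shell
#!/bin/sh
# grep (like sed) defaults to BRE: "one or more" is spelled \{1,\}.
has_id_bre() { printf '%s\n' "$1" | grep -q '^[a-zA-Z0-9.]\{1,\} [0-9] '; }

# grep -E switches to ERE, where + is the "one or more" quantifier.
has_id_ere() { printf '%s\n' "$1" | grep -qE '^[a-zA-Z0-9.]+ [0-9] '; }

# awk always uses ERE; exit status 0 only if some line matched.
has_id_awk() {
    printf '%s\n' "$1" | awk '/^[a-zA-Z0-9.]+ [0-9] / { found = 1 } END { exit !found }'
}

# In a BRE an unescaped + is an ordinary character: this pattern
# requires a literal "+" right after the first identifier character.
plus_in_bre() { printf '%s\n' "$1" | grep -q '^[a-zA-Z0-9.]+ [0-9] '; }

for check in has_id_bre has_id_ere has_id_awk plus_in_bre; do
    if "$check" 'jdoe 1 Doe, John'; then
        echo "$check: match"
    else
        echo "$check: no match"
    fi
done
```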
I would recommend rewriting the expression as
's/\([a-zA-Z0-9.]\{1,\}\) [0-9] \(.*\)/\2 (\1)/g'
which should do exactly what you want in any POSIX-compliant sed. If you do indeed care about such things, consider defining the POSIXLY_CORRECT environment variable.
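A sketch of that portable BRE form run on the question's sample data. Setting POSIXLY_CORRECT only for the sed invocation is an assumption about how you might apply the advice: GNU sed reads the variable, and a sed that does not know it simply ignores it. The function name to_display_name_posix is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper using only POSIX BRE syntax:
#   \( \)    capture groups
#   \{1,\}   "one or more"
# In the replacement, plain ( and ) are literal characters.
to_display_name_posix() {
    POSIXLY_CORRECT=1 sed 's/\([a-zA-Z0-9.]\{1,\}\) [0-9] \(.*\)/\2 (\1)/'
}

to_display_name_posix <<'EOF'
jdoe 1 Doe, John
jad 1 Doe, Jane
smith 2 Smith, Jon
EOF
```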
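In GNU sed, the plus sign needs to be escaped (\+) when not using the -r switch. Even then, \+ in a basic regular expression is a GNU extension; portable scripts use \{1,\} instead.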
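A small sketch of the difference, assuming GNU sed (the \+ form is not portable to other seds). The helper names unescaped_plus and escaped_plus are hypothetical.

```shell
#!/bin/sh
# Bare + in a BRE is a literal plus sign, so this looks for one identifier
# character followed by "+" and never matches the sample line.
unescaped_plus() { sed 's/\([a-zA-Z0-9.]+\) [0-9] \(.*\)/\2 (\1)/'; }

# GNU sed accepts \+ in a BRE as "one or more" (a GNU extension, not POSIX).
escaped_plus() { sed 's/\([a-zA-Z0-9.]\+\) [0-9] \(.*\)/\2 (\1)/'; }

printf '%s\n' 'jdoe 1 Doe, John' | unescaped_plus   # line comes back unchanged
printf '%s\n' 'jdoe 1 Doe, John' | escaped_plus     # GNU sed: Doe, John (jdoe)
```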
Using awk is much simpler:
awk '{ print $3, $4, "(" $1 ")" }' test.txt
Output:
Doe, John (jdoe)
Doe, Jane (jad)
Smith, Jon (smith)
See man awk.
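The one-liner above relies on every name having exactly two words ($3 and $4). A sketch that keeps whatever follows the number instead, assuming the first field is the user name and the second is a single digit as in the sample. The function name awk_display_name and the "vdberg" record are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: save the user name, strip "user N " from the front
# of the record, then print the remainder followed by "(user)".
# Reads the files given as arguments, or stdin if there are none.
awk_display_name() {
    awk '{
        user = $1
        sub(/^[^ ]+ [0-9] /, "")   # awk regexes are EREs, so + needs no backslash
        print $0 " (" user ")"
    }' "$@"
}

awk_display_name <<'EOF'
jdoe 1 Doe, John
vdberg 3 van den Berg, Anna
EOF
```

Because sub() edits $0 directly, spacing inside the name is preserved exactly as it appears in the input.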
$ sed -e 's/\([a-zA-Z0-9.].*\) [0-9] \(.*\)/\2 (\1)/g' file
Doe, John (jdoe)
Doe, Jane (jad)
Smith, Jon (smith)
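Replacing + with .* works on the sample data. One behaviour worth knowing: because [a-zA-Z0-9.].* can run across spaces and .* is greedy, the first group extends to the last space-digit-space run on the line rather than the first. The sketch below compares it with the \{1,\} form on a hypothetical line whose name part happens to contain another such run; the helper names greedy_id and strict_id are invented for illustration.

```shell
#!/bin/sh
# Pattern with .*: group 1 may contain spaces and grows as far as it can.
greedy_id() { sed 's/\([a-zA-Z0-9.].*\) [0-9] \(.*\)/\2 (\1)/'; }

# Bracket-only pattern: group 1 cannot contain a space, so it is the first field.
strict_id() { sed 's/\([a-zA-Z0-9.]\{1,\}\) [0-9] \(.*\)/\2 (\1)/'; }

line='jdoe 1 Doe, John 3 Jr'   # hypothetical input
printf '%s\n' "$line" | greedy_id   # Jr (jdoe 1 Doe, John)
printf '%s\n' "$line" | strict_id   # Doe, John 3 Jr (jdoe)
```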