Use sed to replace first 8 and last 4 pipes on every line in a file

开发者 https://www.devze.com 2022-12-20 13:40 Source: the web
Here's the situation: I have a pipe-delimited text file, and one of the fields contains pipe characters. I already have a sed script that converts it to tab-delimited, but the problem is that it's terribly slow. It replaces the first occurrence of a pipe 8 times, then replaces the last occurrence of a pipe 4 times. I'm hoping there's a quicker way to do what I need.

Any thoughts would be appreciated. Here's my current sed script:

sed 's/|\(.*\)/\t/;s/|\(.*\)/\t/;s/|\(.*\)/\t/;s/|\(.*\)/\t/;s/|\(.*\)/\t/;s/|\(.*\)/\t/;s/|\(.*\)/\t/;s/|\(.*\)/\t/;s/|\(.*\)/\t/;s/\(.*\)|/\t/;s/\(.*\)|/\t/;s/\(.*\)|/\t/;s/\(.*\)|/\t/' $1 > $1.tab

Thanks,

-Dan


 sed 's/\([^|]\+\)|\([^|]\+\)|\([^|]\+\)|\([^|]\+\)|\([^|]\+\)|\([^|]\+\)|\([^|]\+\)|\([^|]\+\)|/\1\t\2\t\3\t\4\t\5\t\6\t\7\t\8\t/;s/|\([^|]\+\)|\([^|]\+\)|\([^|]\+\)|\([^|]\+\)$/\t\1\t\2\t\3\t\4/'

HTH


This is somewhat scalable, but it's still an eye-glazer. You can change the "8" and the "4" to select which ranges of pipes you want to replace or change the pipes or tabs to some other characters.

As a one-liner:

sed 's/|/\n/8; h; s/.*\n//; x; s/\n.*/\t/; s/|/\t/g; G; s/\n//; s/\(\(|[^|]*\)\{4\}\)$/\n\1/; h; s/.*\n//; s/|/\t/g; x; s/\n.*//; G; s/\n//'

Here it is broken out. I've over-commented it so it's easy to follow.

sed '
s/|/\n/8     # split
h            # dup
s/.*\n//
# this is now the field which will retain the pipes 
# plus the fields at the end of the record
x            # swap
s/\n.*/\t/   # replace
s/|/\t/g
# this is now all the tab-delimited fields at the beginning of the record
G            # append
s/\n//
# this is now the full record with the first part completed
# the rest of the steps are similar to the steps above
s/\(\(|[^|]*\)\{4\}\)$/\n\1/    # split
h            # dup
s/.*\n//
s/|/\t/g     # replace
# this is now the last four fields that have been tab delimited
x            # swap
s/\n.*//
# this is the first eight fields plus the field with the retained pipes
G            # append
s/\n//
# now print the full record with everything done
'
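
For what it's worth, here is a minimal sketch of how the "change the 8 and the 4" idea could be wrapped up so the counts become arguments. The function name pipes_to_tabs and the variables n_head and n_tail are made up for illustration, and it assumes GNU sed (for \n and \t in s/// and for the numeric flag). The sed commands are the same ones as in the script above, just with the two counts substituted in by the shell.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer):
#   pipes_to_tabs N_HEAD N_TAIL [FILE...]
# Converts the first N_HEAD and the last N_TAIL pipes on each line to tabs
# and leaves every pipe in between untouched. Assumes GNU sed.
pipes_to_tabs() {
    n_head=$1
    n_tail=$2
    shift 2
    # The script is double-quoted so the shell can substitute the counts.
    # Inside double quotes \n, \t, \( and \1 reach sed unchanged, and \$
    # becomes the end-of-line anchor $.
    sed "
        s/|/\n/${n_head}
        h
        s/.*\n//
        x
        s/\n.*/\t/
        s/|/\t/g
        G
        s/\n//
        s/\(\(|[^|]*\)\{${n_tail}\}\)\$/\n\1/
        h
        s/.*\n//
        s/|/\t/g
        x
        s/\n.*//
        G
        s/\n//
    " "$@"
}
```

Calling it as pipes_to_tabs 8 4 input.txt > input.txt.tab should then behave like the one-liner. This sketch assumes every line has at least N_HEAD + N_TAIL pipes; shorter lines are not handled specially.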


I worked with Dan when he needed this and realized (like ghostdog74) that AWK was a better tool. Here's my possibly inefficient answer.

awk -F"|" 'BEGIN{OFS="\t"}{for (i=10; i < NF-3; i++) $9=$9 "|" $i; print $1,$2,$3,$4,$5,$6,$7,$8,$9,$(NF-3),$(NF-2),$(NF-1),$(NF)}' $file > $file.tab

What do you folks think?
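
One possible variation on the awk approach, as a sketch only: instead of hard-coding $1 through $8 and the last four fields, rebuild the line field by field and choose the separator by position. The wrapper name pipes_to_tabs_awk and the awk variables head and tail are hypothetical; the field-counting logic is the same as in the command above.

```shell
#!/bin/sh
# Hypothetical helper (illustration only):
#   pipes_to_tabs_awk N_HEAD N_TAIL [FILE...]
# Splits on pipes, then re-joins the fields: a tab after each of the first
# N_HEAD fields and before each of the last N_TAIL fields, a pipe elsewhere.
pipes_to_tabs_awk() {
    n_head=$1
    n_tail=$2
    shift 2
    awk -F'|' -v head="$n_head" -v tail="$n_tail" '
    {
        line = $1
        mid_end = NF - tail              # index of the last "middle" field
        for (i = 2; i <= NF; i++) {
            # Fields head+2 .. mid_end are continuations of the field that
            # keeps its pipes, so they are re-joined with a pipe.
            sep = (i > head + 1 && i <= mid_end) ? "|" : "\t"
            line = line sep $i
        }
        print line
    }' "$@"
}
```

Usage would look like pipes_to_tabs_awk 8 4 "$file" > "$file.tab". I have not benchmarked it against the sed versions, so treat any speed difference as something to measure on your own data.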


Dennis is right: you should use the numeric flag on the s command to specify which occurrence of the pattern the substitution applies to.

Have a look at the link below, under "Basic substitutions"; it's more readable on the website than it is here: http://www.readylines.com/sed-one-liners-examples

Hope that helps.
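
A small sketch of what the number on the s command does, assuming GNU sed (for \t in the replacement and for combining a number with g). The sample line and variable names are made up for illustration. A bare number N touches only the Nth match; N followed by g touches the Nth match and every later one.

```shell
#!/bin/sh
# Sample input, invented for this illustration.
line='a|b|c|d'

# Numeric flag alone: only the 2nd pipe becomes a tab.
second_only=$(printf '%s\n' "$line" | sed 's/|/\t/2')

# Number plus g (a GNU extension): the 2nd pipe and all later ones.
from_second=$(printf '%s\n' "$line" | sed 's/|/\t/2g')

printf '%s\n' "$second_only" "$from_second"
```

This is why the script in the earlier answer uses s/|/\n/8 to mark the 8th pipe as a split point rather than running eight separate substitutions.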
