sed parsing values that don't exist seems to behave inconsistently

Developer https://www.devze.com 2023-01-22 20:49 Source: Internet
I have a file with the following lines in it:

bash$ cat blah.txt
<smsDeliveryStatus value="Provider Malfunction"/>
<smsDeliveryStatus value="Provider Malfunction" id="23434"/>
<smsDeliveryStatus value="Delivery Failure"/>
<smsDeliveryStatus value="Delivery Successful" id="2"/>
bash$

I want to extract value and id from each line of the file, and where either value or id does not exist I want to print Unknown. I wrote the following code, which sets id to Unknown some of the time and fails to do so at other times:

bash$ cat blah.txt | sed -nr "/smsDeliveryStatus /{h; /value/ {s/.*value=\"([^\"]*)?\".*/value: \1/}; /value/! {s/.*/value: Unknown/}; p; x; /id/ {s/.*id=\"([^\"]+)\".*/id: \1/g}; /id/! {s/.*/id: Unknown/g}; p}"

This yields the following result from the above file:

value: Provider Malfunction
<smsDeliveryStatus value="Provider Malfunction"/>
value: Provider Malfunction
id: 23434
value: Delivery Failure
id: Unknown
value: Delivery Successful
id: 2

Bizarrely, the first line with id missing is printed out in full, while the second line with id missing has id set to Unknown as expected. Can anyone shed any light on why this is happening? What is the difference between the first time /id/! is evaluated and the second time?

A


I added multiple lines to the file like so:

bash$ cat blah.txt
<smsDeliveryStatus value="Provider Malfunction"/>
<smsDeliveryStatus value="Provider Malfunction" id="23434"/>
<smsDeliveryStatus value="Delivery Failure"/>
<smsDeliveryStatus value="Delivery Successful" id="2"/>
<smsDeliveryStatus value="Provider Malfunction"/>
<smsDeliveryStatus value="Delivery Failure"/>
<smsDeliveryStatus value="Delivery Successful" id="2"/>
<smsDeliveryStatus value="Provider Malfunction" id="23434"/>
<smsDeliveryStatus value="Delivery Failure"/>
<smsDeliveryStatus value="Provider Malfunction"/>
bash$

When I ran the code again I got the following:

bash$ cat blah.txt |  sed -nr "/smsDeliveryStatus /{h; /value/ {s/.*value=\"([^\"]*)?\".*/value: \1/}; /value/! {s/.*/value: Unknown/}; p; x; /id/ {s/.*id=\"([^\"]*)\".*/id: \1/g}; /id/! {s/.*/id: Unknown/g}; p}"
value: Provider Malfunction
<smsDeliveryStatus value="Provider Malfunction"/>
value: Provider Malfunction
id: 23434
value: Delivery Failure
id: Unknown
value: Delivery Successful
id: 2
value: Provider Malfunction
<smsDeliveryStatus value="Provider Malfunction"/>
value: Delivery Failure
id: Unknown
value: Delivery Successful
id: 2
value: Provider Malfunction
id: 23434
value: Delivery Failure
id: Unknown
value: Provider Malfunction
<smsDeliveryStatus value="Provider Malfunction"/>
bash$ 

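This led me to see that every line that failed had the letters id in it (inside the word Provider), so /id/ matched even though there was no id attribute. The substitution then found nothing to replace, and the /id/! branch never ran. I solved it by putting \b word boundaries around id, like so: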

bash$ cat blah.txt |  sed -nr "/smsDeliveryStatus /{h; /value/ {s/.*value=\"([^\"]*)?\".*/value: \1/}; /value/! {s/.*/value: Unknown/}; p; x; /\bid\b/ {s/.*id=\"([^\"]*)\".*/id: \1/g}; /\bid\b/! {s/.*/id: Unknown/g}; p}"
value: Provider Malfunction
id: Unknown
value: Provider Malfunction
id: 23434
value: Delivery Failure
id: Unknown
value: Delivery Successful
id: 2
value: Provider Malfunction
id: Unknown
value: Delivery Failure
id: Unknown
value: Delivery Successful
id: 2
value: Provider Malfunction
id: 23434
value: Delivery Failure
id: Unknown
value: Provider Malfunction
id: Unknown
bash$
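
For anyone who wants to check the cause in isolation, here is a minimal sketch. It assumes GNU sed, since \b is a GNU extension and may not work in other sed implementations. The variable names line, plain and bounded are just for illustration:

```shell
# One of the lines that was printed in full instead of "id: Unknown".
line='<smsDeliveryStatus value="Provider Malfunction"/>'

# Plain /id/ matches the "id" inside "Provider", so the line is selected
# and the /id/! branch of the original script is skipped.
plain=$(printf '%s\n' "$line" | sed -n '/id/p')

# With \b on both sides, "id" must stand alone as a word, so nothing matches.
bounded=$(printf '%s\n' "$line" | sed -n '/\bid\b/p')

printf 'plain:   [%s]\n' "$plain"
printf 'bounded: [%s]\n' "$bounded"
```

The first result should contain the whole line and the second should be empty, which matches the difference between the two runs above.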

So in the end I solved it myself. I hope this helps someone else though.
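
One caveat: \bid\b would still match a line whose value text contains id as a separate word. A possible alternative, sketched here under the assumption that attributes are always written as name="..." with a space before the name, is to test for the attribute itself rather than the bare letters. The function name extract_status is hypothetical, and -E is assumed to be available (it is in GNU and BSD sed):

```shell
# Sketch: print "value: ..." and "id: ..." for each smsDeliveryStatus line,
# using "Unknown" when the attribute is absent.
extract_status() {
  sed -nE '/<smsDeliveryStatus /{
    h
    # No value attribute: replace the whole line with the fallback text.
    / value="[^"]*"/!s/.*/value: Unknown/
    # Otherwise keep only the attribute contents.
    s/.* value="([^"]*)".*/value: \1/
    p
    # Restore the original line from the hold space and repeat for id.
    g
    / id="[^"]*"/!s/.*/id: Unknown/
    s/.* id="([^"]*)".*/id: \1/
    p
  }' "$@"
}

# Example run on two lines from the question.
printf '%s\n' \
  '<smsDeliveryStatus value="Provider Malfunction"/>' \
  '<smsDeliveryStatus value="Delivery Successful" id="2"/>' |
  extract_status
```

Calling extract_status blah.txt on the file from the question should give the same output as the \b version, without relying on word boundaries.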
