How to print a section of a file between two regular expressions only if a line within the section contains a certain string

开发者 https://www.devze.com 2023-01-13 03:25 Source: Internet
I have a file of events containing many multi-line events, each enclosed between <event> and </event> tags. I want to print an entire event, from <event> to </event>, only if a line within that event contains either the string uniqueId="1279939300.862594_PFM_1_1912320699" or uniqueId="1281686522.353435_PFM_1_988171542". The file has 100000 events in it, and each event has between 20 and 35 lines (the number of attributes within an event varies its length). I started off using sed but need a little help beyond:

cat xmlEventLog_2010-03-23T* | sed -nr "/<event eventTimestamp/,/<\/event>/"

What do I need to do to finish this? Also, is sed the best way of doing this given the size of the files?

Thanks in advance

A

Update: for certain reasons I want to do this with sed. I tried Denis's solution, but it does not seem to work:

bash$ grep 1279939300.862594_PFM_1_1912320699 xmlEventLog*
xmlEventLog_2010-03-23T02:41:15_PFM_1_1.xml:    <event eventTimestamp="2010-03-23T02:41:40.861" originalReceivedMessageSize="0" uniqueId="1279939300.862594_PFM_1_1912320699">
bash$ grep 1281686522.353435_PFM_1_988171542 xmlEventLog*
xmlEventLog_2010-03-23T07:47:38_PFM_1_1.xml:    <event eventTimestamp="2010-03-23T08:02:02.299" originalReceivedMessageSize="685" uniqueId="1281686522.353435_PFM_1_988171542">
bash$ time sed -n ':a; /<event>/,/<\/event>/ N; /<event>/,/<\/event>/!b; /<\/event>/ {/uniqueId="1279939300.862594_PFM_1_1912320699"\|uniqueId="1281686522.353435_PFM_1_988171542"/p;d}; ba' xmlEventLog*

real    1m13.134s
user    1m12.463s
sys     0m0.659s
bash$

Which obviously returned nothing. So is it possible to do this with sed?

A


awk -v RS='</event>' '/<event[ >]/ && /1279939300\.862594_PFM_1_1912320699|1281686522\.353435_PFM_1_988171542/ {print $0 RS}' file
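
A multi-character RS is an extension (gawk and mawk accept it, but POSIX awk does not guarantee it). Where that matters, a line-buffering variant may be an option. The following is only a sketch: the function name print_matching_events_awk is made up for illustration, and it assumes the opening tag looks like "<event ...>" or "<event>" and that "</event>" appears on a later line.

```shell
# Sketch: buffer each event line by line, so only the default
# newline record separator is needed.
# Assumption: events open with "<event ...>" or "<event>" and close with
# "</event>" on a later line. The function name is hypothetical.
print_matching_events_awk() {
  awk '
    /<event[ >]/ { buf = ""; inside = 1 }   # opening tag: start a fresh buffer
    inside       { buf = buf $0 "\n" }      # collect every line of the event
    /<\/event>/ && inside {                 # closing tag: decide whether to print
      if (buf ~ /uniqueId="(1279939300\.862594_PFM_1_1912320699|1281686522\.353435_PFM_1_988171542)"/)
        printf "%s", buf
      inside = 0
    }
  ' "$@"
}

# Usage: print_matching_events_awk xmlEventLog_2010-03-23T*
```

Because only one event is held in memory at a time, the size of the log files should not matter much with this approach.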


Give this a try:

sed -n ':a; /<event>/,/<\/event>/ N; /<event>/,/<\/event>/!b; /<\/event>/ {/uniqueId="1279939300.862594_PFM_1_1912320699"\|uniqueId="1281686522.353435_PFM_1_988171542"/p;d}; ba'
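
One likely reason the command above prints nothing on the asker's files: /<event>/ only matches a tag with no attributes, while the grep output in the question shows opening tags of the form <event eventTimestamp=...>, so the range never starts. A possible alternative, sketched here under the assumption that every event opens on a line containing "<event " or "<event>" and closes on a later line containing "</event>", is to collect each event in the hold space and test it once the closing tag is reached. The function name print_matching_events is made up for illustration; -E is accepted by both GNU and BSD sed.

```shell
# Sketch: gather each event in the hold space, then print it only if it
# contains one of the two wanted uniqueId values.
# Assumption: the opening and closing tags are on different lines.
# The function name is hypothetical.
print_matching_events() {
  sed -n -E '
    /<event[ >]/,/<\/event>/ {
      # Opening tag: overwrite the hold space with this line, start next cycle.
      /<event[ >]/ { h; d; }
      # Any other line of the event: append it to the hold space.
      H
      # Closing tag: fetch the whole event and print it if an ID matches.
      /<\/event>/ {
        g
        /uniqueId="(1279939300\.862594_PFM_1_1912320699|1281686522\.353435_PFM_1_988171542)"/p
      }
    }
  ' "$@"
}

# Usage: print_matching_events xmlEventLog_2010-03-23T*
```

This makes a single pass and never backtracks over earlier input, so it should scale roughly linearly with the size of the logs.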


You should be able to embed the unique IDs directly into the regular expression, using the | character to match either uniqueId. I did a quick test, and the following regular expression seems to find the correct entries:

 <event.*?uniqueId=("1279939300\.862594_PFM_1_1912320699"|"1281686522\.353435_PFM_1_988171542").*?</event>