Help with sed regex: extract text from specific tag

Developer https://www.devze.com 2022-12-18 02:35 Source: web

First time sed'er, so be gentle.

I have the following text file, 'test_file':

 <Tag1>not </Tag1><Tag2>working</Tag2>

I want to extract the text between the <Tag2> tags using a sed regex. There may be other occurrences of <Tag2>, and I would like to extract those as well.

So far I have this sed-based pipeline:

cat test_file | grep -i "Tag2"| sed 's/<[^>]*[>]//g'

which gives the output:

 not working

Does anyone have any idea how to get this working?

As another poster said, sed may not be the best tool for this job. You may want to use something built for XML parsing, or even a simple scripting language such as Perl.

The problem with your attempt is that you aren't analyzing the string properly.

cat test_file is good - it prints out the contents of the file to stdout.

grep -i "Tag2" is ok - it prints out only lines with "Tag2" in them. This may not be exactly what you want. Bear in mind that it will print the whole line, not just the <Tag2> part, so you will still have to search out that part later.

sed 's/<[^>]*[>]//g' isn't what you want - it simply removes the tags, including <Tag1> and <Tag2>.

You can try something like:

cat test_file | grep -i tag2 | sed 's/.*<Tag2>\(.*\)<\/Tag2>.*/\1/'

This will produce

working

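but it will only work for one tag pair per line: because .* is greedy, a line with several <Tag2> pairs yields only the text of the last one.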
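
If a line can hold several <Tag2> pairs, one possible workaround is to let grep -o print each pair on its own line first and strip the tags afterwards. This is only a sketch under some assumptions: extract_tag2 is a hypothetical helper name, the sample line is made up, -o is a GNU/BSD grep extension rather than POSIX, and the pattern assumes the tag content contains no '<' and does not span lines.

```shell
# Hypothetical helper: print the text of every <Tag2>...</Tag2> pair, one per line.
# Reads the files given as arguments, or stdin when there are none.
# Assumes grep supports -o (GNU and BSD grep do; POSIX does not require it).
extract_tag2() {
    # Step 1: -o prints each match on its own line instead of the whole line.
    # Step 2: the question's own sed expression then strips the tags.
    grep -o '<Tag2>[^<]*</Tag2>' "$@" | sed 's/<[^>]*>//g'
}

# Made-up sample with two pairs on one line.
printf '%s\n' '<Tag1>not </Tag1><Tag2>working</Tag2><Tag2>twice</Tag2>' | extract_tag2
# prints:
# working
# twice
```

With the question's file this would be called as extract_tag2 test_file.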

For your nice, friendly example, you could use

sed -e 's/^.*<Tag2>//' -e 's!</Tag2>.*!!' test_file

but the XML out there is cruel and uncaring. You're asking for serious trouble using regular expressions to scrape XML.
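
As a small illustration of how easily this breaks, the sketch below runs the same two sed expressions on the friendly line and on a variant where <Tag2> carries an attribute. The function name strip_to_tag2 and the id="1" attribute are made up for the example.

```shell
# Hypothetical wrapper around the two-expression sed command above.
# Reads the files given as arguments, or stdin when there are none.
strip_to_tag2() {
    sed -e 's/^.*<Tag2>//' -e 's!</Tag2>.*!!' "$@"
}

# Friendly input: both substitutions match.
printf '%s\n' '<Tag1>not </Tag1><Tag2>working</Tag2>' | strip_to_tag2
# prints: working

# Same data with an attribute on the tag: the literal "<Tag2>" no longer
# appears, so the first substitution does nothing and only the tail is cut.
printf '%s\n' '<Tag1>not </Tag1><Tag2 id="1">working</Tag2>' | strip_to_tag2
# prints: <Tag1>not </Tag1><Tag2 id="1">working
```

Attributes are only one such case; content that spans lines or a different tag spelling would need yet another pattern.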

You can use gawk, e.g.:

$ cat file
 <Tag1>not </Tag1><Tag2>working here</Tag2>
 <Tag1>not </Tag1><Tag2>
working

</Tag2>

$ awk -vRS="</Tag2>" '/<Tag2>/{gsub(/.*<Tag2>/,"");print}' file
working here

working

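Another option is to split each line on the tag name with awk and strip the leftover non-letter characters with sed:

awk -F"Tag2" '{print $2}' test_file | sed 's/[^a-zA-Z]//g'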
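
To see what each stage of the awk and sed pipeline above contributes, the sketch below runs the two stages separately on a made-up line. The function names second_field and letters_only are hypothetical and only wrap the original commands. Note that the sed stage deletes every non-letter, so spaces and digits inside the tag are lost too.

```shell
# Stage 1: split each line on the literal string "Tag2", keep the second field.
second_field() {
    awk -F"Tag2" '{print $2}' "$@"
}

# Stage 2: delete every character that is not an ASCII letter.
letters_only() {
    sed 's/[^a-zA-Z]//g' "$@"
}

sample=' <Tag1>not </Tag1><Tag2>working here</Tag2>'   # made-up input

printf '%s\n' "$sample" | second_field
# prints: >working here</

printf '%s\n' "$sample" | second_field | letters_only
# prints: workinghere
```

This suits the single-word example in the question, but not content where spaces or punctuation matter.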
