
Regex matching an opening and closing tag with a certain text pattern inside that tag [duplicate]

Developer https://www.devze.com 2023-03-13 14:02 Source: web
This question already has answers here: "What to do Regular expression pattern doesn't match anywhere in string?" (8 answers). Closed 7 years ago.

Here is a sample entry I have from a sitemap.xml:

<url>
  <loc>http://sitename.com/programming/php/?C=D;O=A</loc>
  <changefreq>weekly</changefreq>
  <priority>0.64</priority>
</url>

There are many entries like this, and the loc tag ends with C=D;O=A. I want to remove every entry that starts with <url>, ends with </url>, and contains C=D;O=A or a similar pattern.

The following expression matches the whole entry shown above:

<url>(.|\r\n)*?<\/url>

but I only want to match the entries described above, i.e. those that contain the pattern.

How do I form a regex that matches such a condition?


Try this:

/<url>(?:(?!<\/url>).)*C=D;O=A.*?<\/url>/m

The negative lookahead guarantees that a single match never spans multiple <url> nodes.

See it here: Rubular
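A minimal Python sketch of the same pattern, assuming the whole sitemap is loaded as one string. Python's re.DOTALL flag stands in for Ruby's /m (both let "." match newlines). A trailing \s* is an addition of this sketch, not part of the answer, so that the newline after a removed entry goes with it. The sample data and the function name remove_matching_urls are hypothetical.

```python
import re

# Answer's pattern in Python syntax. re.DOTALL makes "." match newlines,
# like Ruby's /m. The tempered dot (?:(?!</url>).)* can never run past a
# closing </url>, so one match cannot swallow two <url> entries.
# The trailing \s* (an assumption of this sketch) eats the leftover newline.
PATTERN = re.compile(r"<url>(?:(?!</url>).)*C=D;O=A.*?</url>\s*", re.DOTALL)


def remove_matching_urls(sitemap: str) -> str:
    """Delete every <url>...</url> entry whose body contains C=D;O=A."""
    return PATTERN.sub("", sitemap)


sample = """<urlset>
<url>
  <loc>http://sitename.com/programming/php/?C=D;O=A</loc>
  <changefreq>weekly</changefreq>
</url>
<url>
  <loc>http://sitename.com/programming/php/</loc>
  <changefreq>weekly</changefreq>
</url>
</urlset>"""

print(remove_matching_urls(sample))
```

To catch "similar patterns", the literal C=D;O=A could be swapped for a character class such as C=[A-Z];O=[A-Z]; whether that is broad enough depends on what the real URLs look like.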


It is not a good idea to use regex for XML. Depending on the language, you should use an XML reader to extract the <url> node and then use a regex to match the content of that node. One useful language for querying XML data, supported by many XML libraries, is XPath.
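A rough sketch of that approach using Python's standard xml.etree.ElementTree, which supports a limited subset of XPath in findall/find. It assumes the real file uses the standard sitemap namespace on <urlset> (the question only shows a single <url> entry, so this is a guess), and drop_entries and BAD_LOC are hypothetical names. The regex is applied only to the text of each <loc>, as the answer suggests.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the sitemap declares the standard sitemaps.org namespace.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NS = {"sm": SITEMAP_NS}

# Serialize the sitemap namespace as the default one (no "ns0:" prefixes).
ET.register_namespace("", SITEMAP_NS)

# Hypothetical filter: the exact string from the question.
BAD_LOC = re.compile(r"C=D;O=A")


def drop_entries(xml_text: str) -> str:
    """Remove <url> children of the root whose <loc> text matches BAD_LOC."""
    root = ET.fromstring(xml_text)
    # findall returns a list, so removing from root while looping is safe.
    for url in root.findall("sm:url", NS):
        loc = url.find("sm:loc", NS)
        if loc is not None and loc.text and BAD_LOC.search(loc.text):
            root.remove(url)
    return ET.tostring(root, encoding="unicode")


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://sitename.com/programming/php/?C=D;O=A</loc></url>
  <url><loc>http://sitename.com/programming/php/</loc></url>
</urlset>"""

print(drop_entries(sample))
```

The parser handles nesting, attributes and line breaks, so the regex only has to deal with a plain URL string.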


If you absolutely have to use regex, this one:

<([a-z][a-z0-9]*)\b[^>]*>(.*?)(C=D;O=A){1}(.*?)</\1>

will get you the line:

http://sitename.com/programming/php/?C=D;O=A

I would then traverse up to the parent tag and do whatever I wanted with it.
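The pattern above can be tried in Python roughly as follows. This sketch drops the redundant {1} (a group with {1} matches exactly once either way) and runs without re.DOTALL, so "." stops at line breaks. That restriction is why the match lands on the single-line <loc> element instead of the multi-line <url> entry. The variable names are illustrative only.

```python
import re

# Answer's pattern with the redundant {1} removed. \1 requires the closing
# tag to have the same name as the opening tag captured in group 1.
# No re.DOTALL: "." does not cross newlines, so <url>...</url> spanning
# several lines cannot match, and the one-line <loc> element does.
TAG_WITH_PATTERN = re.compile(r"<([a-z][a-z0-9]*)\b[^>]*>(.*?)(C=D;O=A)(.*?)</\1>")

entry = """<url>
  <loc>http://sitename.com/programming/php/?C=D;O=A</loc>
  <changefreq>weekly</changefreq>
  <priority>0.64</priority>
</url>"""

m = TAG_WITH_PATTERN.search(entry)
if m:
    tag = m.group(1)                             # element that matched
    text = m.group(2) + m.group(3) + m.group(4)  # its full text content
    print(tag, text)  # prints: loc http://sitename.com/programming/php/?C=D;O=A
```

Going from the matched <loc> up to its parent <url> is not something a regex does well, which is where the XML-parser approach from the previous answer is the safer tool.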

