Remove Namespace references from XML with Regex

Developer https://www.devze.com 2023-02-13 09:45 Source: Internet
I have a regex that removes xmlns references from XML. It works fine when there are matching opening and closing tags, but if the xmlns reference is in a single (self-closing) tag, it removes the "/" as well.

Here is the regex:

"<(.*?) xmlns[:=].*?>", "<$1>"

When I use the regex on this line of xml:

<ns22:someTagName xmlns:ns22="http://exampledatatypes.com"></ns22:someTagName>

I get what I want:

<ns22:someTagName></ns22:someTagName>

When I use the regex on this line of xml:

<ns22:someTagName xmlns:ns22="http://exampledatatypes.com"/>

I get this invalid XML:

<ns22:someTagName>

It removes the reference fine, but it takes the closing "/" with it.

Thanks for the help, Scott


Rather than trying to preserve what you need from the XML it would be better to target what you want to remove.

This expression targets just the namespace itself:

\sxmlns[^"]+"[^"]+"

Unfortunately, I don't know LotusScript, so I can't give you a code sample of how to use this, but what you need to do is something like this pseudocode:

result = regex.replace(yourString, '\sxmlns[^"]+"[^"]+"', '')

What you do here is replace every match with an empty string (effectively removing it). This works for both a closed and a self-closed XML tag, and it also works if the tag doesn't have a namespace at all, because then nothing matches and the string is returned unchanged.

Edit: Here is a fully-functional Python example:

>>> from re import sub
>>> pattern = r'\sxmlns[^"]+"[^"]+"'
>>> closed = r'<ns22:someTagName xmlns:ns22="http://exampledatatypes.com"></ns22:someTagName>'
>>> sub(pattern, '', closed)
'<ns22:someTagName></ns22:someTagName>'
>>> selfclosed = r'<ns22:someTagName xmlns:ns22="http://exampledatatypes.com"/>'
>>> sub(pattern, '', selfclosed)
'<ns22:someTagName/>'
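
For comparison, here is a small sketch of why the original pattern from the question loses the slash. It assumes Python's re module, as in the example above, so the $1 in the replacement is written \1. The trailing .*?> consumes everything up to and including the first ">", and the replacement only writes back "<", group 1, and ">".

```python
import re

# The asker's original pattern; Python writes the $1 replacement as \1.
original = re.compile(r'<(.*?) xmlns[:=].*?>')

closed = '<ns22:someTagName xmlns:ns22="http://exampledatatypes.com"></ns22:someTagName>'
selfclosed = '<ns22:someTagName xmlns:ns22="http://exampledatatypes.com"/>'

# The trailing .*?> swallows the "/" of a self-closing tag along with the
# namespace declaration, and "<\1>" never puts it back.
broken_closed = original.sub(r'<\1>', closed)
broken_selfclosed = original.sub(r'<\1>', selfclosed)

print(broken_closed)      # <ns22:someTagName></ns22:someTagName>
print(broken_selfclosed)  # <ns22:someTagName>
```

Targeting only the declaration avoids the problem because the rest of the tag, including the "/", is never part of the match.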


Don't use regex on XML if you have access to an XML parser! That being said, I don't know anything about LotusScript's XML parsing capabilities (if it even has them), so if you must use regex, this will get you closer:

<([^>]*?)\bxmlns\b[^"']+('|").*?\2(.*?/?>)

to be replaced with:

<$1$3

The most important change here from your original regex is the /? toward the end, which lets the final group capture the slash of a self-closing tag. BTW, I haven't escaped the quotes or backslashes since I don't know the LotusScript syntax for that, and I assume you do.
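
As a rough check of this pattern, here is a sketch in Python rather than LotusScript. It assumes Python's syntax: the backreference to the quote character is written \2 inside the pattern, and the replacement <$1$3 becomes <\1\3.

```python
import re

# Group 1: tag content before xmlns; group 2: the quote character;
# group 3: the rest of the tag, including an optional "/" before ">".
pattern = re.compile(r'''<([^>]*?)\bxmlns\b[^"']+('|").*?\2(.*?/?>)''')

closed = '<ns22:someTagName xmlns:ns22="http://exampledatatypes.com"></ns22:someTagName>'
selfclosed = '<ns22:someTagName xmlns:ns22="http://exampledatatypes.com"/>'

fixed_closed = pattern.sub(r'<\1\3', closed)
fixed_selfclosed = pattern.sub(r'<\1\3', selfclosed)

print(fixed_closed)      # <ns22:someTagName ></ns22:someTagName>
print(fixed_selfclosed)  # <ns22:someTagName />
```

The space that preceded xmlns stays in group 1, so a single space is left inside the tag. That is still well-formed XML.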

There will always be XML-valid input that cannot be properly understood by this, due to the limitations of regex. However, it should work for most cases. You could double-check manually by searching for the string "xmlns" afterward.
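
Where a parser is available, the advice at the top of this answer could look roughly like the following. This is a minimal sketch using Python's standard xml.etree.ElementTree, not LotusScript, and strip_namespaces is a hypothetical helper name. Note that it behaves differently from the regex: it drops the prefix (ns22:) from element and attribute names as well as the xmlns declaration.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(xml_text):
    """Return xml_text with namespace URIs removed from all element and attribute names."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # ElementTree stores qualified names as "{uri}local"; keep only "local".
        if isinstance(elem.tag, str) and elem.tag.startswith('{'):
            elem.tag = elem.tag.split('}', 1)[1]
        for name in list(elem.attrib):
            if name.startswith('{'):
                elem.attrib[name.split('}', 1)[1]] = elem.attrib.pop(name)
    # With no qualified names left, no xmlns declarations are serialized.
    return ET.tostring(root, encoding='unicode')


selfclosed = '<ns22:someTagName xmlns:ns22="http://exampledatatypes.com"/>'
cleaned = strip_namespaces(selfclosed)
print(cleaned)
print('xmlns' in cleaned)  # False
```

The final check mirrors the suggestion above to search the result for the string "xmlns".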


The regex \s*xmlns(:\w+)?="[^"]*" matches both default (xmlns="...") and prefixed (xmlns:prefix="...") namespace declarations.

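In Java, xmlString.replaceFirst("\\s*xmlns(:\\w+)?=\"[^\"]*\"", "") removes the first declaration; call replaceAll with the same arguments to remove every declaration.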
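
The same pattern can be tried in Python, the language of the example earlier in the thread. This is a sketch only: the sample document and its example.com URIs are made up for illustration. Passing count=1 to sub mirrors Java's replaceFirst, and omitting it removes every match.

```python
import re

# Matches a default (xmlns="...") or prefixed (xmlns:p="...") declaration,
# together with any whitespace in front of it.
NS_DECL = re.compile(r'\s*xmlns(:\w+)?="[^"]*"')

# Hypothetical sample with one default and one prefixed declaration.
sample = '<root xmlns="http://example.com/default" xmlns:a="http://example.com/a"><a:item/></root>'

first_only = NS_DECL.sub('', sample, count=1)  # like replaceFirst
all_removed = NS_DECL.sub('', sample)          # like replaceAll

print(first_only)   # <root xmlns:a="http://example.com/a"><a:item/></root>
print(all_removed)  # <root><a:item/></root>
```

The a: prefix on the child element is left in place, as in the question's desired output, so a namespace-aware parser would reject the result as using an undeclared prefix.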

https://regexr.com/ is a great tool to use for writing/testing these.
