Sed/awk script to fix illegal characters in XML (ampersands)

Developer https://www.devze.com 2023-01-14 18:35 Source: web
I need to parse an invalid XML file that contains unencoded, illegal characters (ampersands, in my case):

<url>http://example.com?param1=bad&param2=ampersand</url>

as well as correctly encoded ones:

<description> The good, the bad &amp; the ugly </description>

Could someone post an example sed/awk script that encodes the illegal characters while leaving the already encoded ones intact?


Let HTML Tidy fix it in XML mode (note that -m rewrites the input file in place):

tidy -m -xml <your-xml-file>


Completely untested, but you could cheat: first convert all the already-encoded ampersands back to their raw form, then encode every ampersand again.

For example, if &amp; is the only entity in the file, you could do something like:

# \& is a literal ampersand in the replacement; a bare & would insert the whole match
sed 's/&amp;/\&/g' | sed 's/&/\&amp;/g'
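One sed detail matters for this trick: in the replacement part of s///, a bare & stands for the whole matched text, so a literal ampersand has to be written as \&. A quick way to check the two steps is to wrap them in a shell function (fix_amps is a hypothetical name) and run it on the sample lines from the question, plus a made-up line containing &lt;. This is a sketch assuming a POSIX sed:

```shell
#!/bin/sh
# fix_amps: hypothetical helper wrapping the decode/re-encode trick.
# Step 1 turns every existing &amp; back into a bare "&";
# step 2 encodes every "&" as &amp;.
# In a sed replacement, \& is a literal ampersand (a bare & means "the whole match").
fix_amps() {
  sed -e 's/&amp;/\&/g' -e 's/&/\&amp;/g'
}

printf '%s\n' \
  '<url>http://example.com?param1=bad&param2=ampersand</url>' \
  '<description> The good, the bad &amp; the ugly </description>' \
  '<note>1 &lt; 2</note>' | fix_amps
# Expected output:
# <url>http://example.com?param1=bad&amp;param2=ampersand</url>
# <description> The good, the bad &amp; the ugly </description>
# <note>1 &amp;lt; 2</note>
```

The last line shows the limitation: any entity other than &amp; (here &lt;) ends up double-encoded, so the trick only holds when &amp; is the only entity in the file.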

Of course, this could be done much more cleanly and there are surely better solutions, but bed is calling me and I'm sure you can work it out from here.
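If the file may also contain other entities (&lt;, &#38;, and so on), decoding and re-encoding every ampersand double-encodes them. A sketch of an alternative in awk, which the question also allows: scan each line for "&" and escape it only when it does not begin something that looks like an entity or character reference. It relies only on index, substr, and match with RLENGTH, which POSIX awk provides, so it should run under gawk or mawk, but treat it as untested on real data. escape_amps is a hypothetical name. Caveats: it works line by line, it accepts any syntactically valid &name; even if that entity is undeclared, and it would also escape ampersands inside CDATA sections, where they are legal.

```shell
#!/bin/sh
# escape_amps: hypothetical helper. Reads XML on stdin, writes it to stdout,
# replacing every "&" that does NOT start a reference such as &amp;, &lt;,
# &#38; or &#x26; with &amp;. It only checks the syntax of a reference,
# not whether a named entity is actually declared.
escape_amps() {
  awk '{
    out = ""
    rest = $0
    while ((i = index(rest, "&")) > 0) {
      out = out substr(rest, 1, i - 1)      # copy the text before the "&"
      tail = substr(rest, i)
      if (match(tail, /^&([A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);/)) {
        out = out substr(tail, 1, RLENGTH)  # valid reference: keep it as-is
        rest = substr(tail, RLENGTH + 1)
      } else {
        out = out "&amp;"                   # bare ampersand: escape it
        rest = substr(tail, 2)
      }
    }
    print out rest
  }'
}

printf '%s\n' \
  '<url>http://example.com?param1=bad&param2=ampersand</url>' \
  '<description> The good, the bad &amp; the ugly </description>' \
  '<note>1 &lt; 2 &#38; 3</note>' | escape_amps
# Expected output:
# <url>http://example.com?param1=bad&amp;param2=ampersand</url>
# <description> The good, the bad &amp; the ugly </description>
# <note>1 &lt; 2 &#38; 3</note>
```

Because awk regular expressions have no lookahead, the script emulates "& not followed by name;" by anchoring match() at each ampersand and deciding per occurrence.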
