cross-encoding XSL transformations

Developer (https://www.devze.com), 2023-03-15 13:38. Source: the web
I have some operations to do on an XML file (nothing important), and XSL works very well for this. However, my input file is encoded in UTF-8, and the file produced by the transformation MUST be encoded in ISO-8859-1. (I do not control the encoding of the input file either.)

Everything goes well, except that some characters that exist in UTF-8 but not in ISO-8859-1 are escaped in the output file.

For instance, <text>some text with a € character</text> is transformed into <text>some text with a &#8364; character</text>

The "&#8364;" in the output file is an issue for me.

Since something has to be done with the characters that are not in ISO-8859-1, I first thought of replacing them manually with the replace function: replace(., '€', 'euros'). But there are so many characters in UTF-8 that are not in ISO-8859-1 that this quickly becomes tedious... and slow!
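The per-character replace calls can be collapsed into a single regular expression that matches everything outside the ISO-8859-1 range. This is only a minimal sketch, assuming an XSLT 2.0 processor (needed for replace) and assuming that silently dropping such characters is acceptable. It touches text nodes only and leaves attribute values alone.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" encoding="iso-8859-1"/>

  <!-- Identity transform: copy every node and attribute unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Text nodes: remove every character that is neither whitespace
       nor in the range U+0020..U+00FF (the Latin-1 repertoire).
       The explicit priority avoids a conflict with the identity template. -->
  <xsl:template match="text()" priority="1">
    <xsl:value-of select="replace(., '[^\s&#x20;-&#xFF;]', '')"/>
  </xsl:template>

</xsl:stylesheet>
```

With this in place, the € in the sample element would be removed rather than escaped. Replacing the empty string with something like '?' would keep a visible marker instead.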

Do you have a better solution? (Assume we can simply remove those characters or transform them into any viable ISO-8859-1 character.)

Thanks in advance


Do you have

<xsl:output encoding="iso-8859-1" />

in place?

Because that should be all you need, really. If your XSL processor does not correctly translate characters to the target encoding on its own, it is broken and you need to use a different one.
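When the target encoding cannot represent a character, the serializer has to fall back to a numeric character reference, which is where &#8364; comes from. One way to control that fallback is a character map, which substitutes chosen strings at serialization time. The following is a minimal sketch, assuming an XSLT 2.0 processor (the question already uses the XPath 2.0 replace function). The map name latin1-fallback and the replacement strings are placeholders, not part of the original stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Serialize as ISO-8859-1 and apply the character map below -->
  <xsl:output method="xml" encoding="iso-8859-1"
              use-character-maps="latin1-fallback"/>

  <!-- Hypothetical map: list the characters you expect in the input
       and the Latin-1 text that should be written in their place -->
  <xsl:character-map name="latin1-fallback">
    <xsl:output-character character="&#8364;" string="EUR"/>
    <xsl:output-character character="&#8230;" string="..."/>
    <xsl:output-character character="&#8217;" string="'"/>
  </xsl:character-map>

  <!-- Identity transform: copy everything else unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Characters that are not listed in the map and not in ISO-8859-1 would still be written as character references. The map also only takes effect when the processor serializes the result itself, not when the result tree is handed to other code.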

Hints

  • Often Windows-1252 is what people really mean when they say ISO-8859-1. Check closely if that applies to you as well. There are subtle differences between the two (especially with regard to the Euro sign, which does not exist in ISO-8859-1, but does exist in Windows-1252 and ISO-8859-15).
  • Whenever an XML declaration <?xml version="1.0" encoding="iso-8859-1"?> is missing in an XML file, UTF-8 encoding is assumed. Be sure to put a declaration at the top of your file whenever it is not UTF-8 encoded.
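If the consumer of the output file actually accepts ISO-8859-15 or Windows-1252, as the first hint suggests may be the case, the Euro sign can be written directly instead of being escaped. Here is a sketch of that variant. It assumes the processor supports the chosen encoding name, which is worth checking, since processors are only obliged to support UTF-8 and UTF-16.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- ISO-8859-15 contains the Euro sign; "windows-1252" is the
       other candidate if that is what the consumer really expects -->
  <xsl:output method="xml" encoding="iso-8859-15"/>

  <!-- Identity transform: copy the input unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The XML declaration written to the output will then name the chosen encoding, so the consumer has to honor that declaration for the bytes to be read correctly.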