I'm using IXMLDOMDocument::transformNode from MSXML 3.0 to apply XSLT transforms. Each of the transforms has an xsl:output directive that specifies UTF-8 as the encoding. For example,
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
...
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:str="http://exslt.org/strings"
xmlns:math="http://exslt.org/math"
extension-element-prefixes="str math">
<xsl:output encoding="UTF-8" indent="yes" method="xml" />
...
</xsl:stylesheet>
Yet the transformed result is always UTF-16 (and the encoding attribute says UTF-16):
<?xml version="1.0" encoding="UTF-16"?>
Is this a bug in MSXML?
For various reasons, I'd really like to have UTF-8. Is there a workaround? Or do I have to convert the transformed result to UTF-8 myself and patch up the encoding attribute?
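If you do end up converting by hand, the conversion itself is mechanical. The following is a minimal C++ sketch, not MSXML code: the helper names utf16ToUtf8 and patchEncodingDecl are hypothetical, it assumes you have already copied the BSTR returned by transformNode into a std::u16string, and it only rewrites the exact double-quoted encoding="UTF-16" spelling shown above inside the XML declaration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Convert UTF-16 code units (e.g. copied from a BSTR) to UTF-8,
// combining surrogate pairs into a single code point first.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {  // high surrogate
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::runtime_error("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[++i] - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }
        if (cp < 0x80) {                      // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {              // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {            // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                              // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// If the text starts with an XML declaration containing encoding="UTF-16",
// rewrite that attribute to encoding="UTF-8"; otherwise return it unchanged.
std::string patchEncodingDecl(std::string xml) {
    if (xml.compare(0, 5, "<?xml") != 0) return xml;  // no declaration
    const std::size_t end = xml.find("?>");
    const std::string from = "encoding=\"UTF-16\"";
    const std::size_t pos = xml.find(from);
    if (end != std::string::npos && pos != std::string::npos && pos < end)
        xml.replace(pos, from.size(), "encoding=\"UTF-8\"");
    return xml;
}
```

Writing the resulting std::string to a file opened in binary mode then gives a document whose bytes agree with its declaration.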
Update: I've worked around the problem by accepting the UTF-16 encoding and prepending a byte-order mark, which satisfies the downstream users of the transformed result, but I'm still interested in how to get UTF-8 output.
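For reference, the byte-order-mark workaround amounts to something like the C++ sketch below. It assumes the result is available as a std::u16string and that the consumers expect little-endian UTF-16 (the in-memory byte order of a BSTR on x86 Windows); toUtf16LeWithBom is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Serialize UTF-16 code units as little-endian bytes, preceded by the
// byte-order mark U+FEFF (bytes FF FE), so byte-oriented readers can
// tell that the file is UTF-16LE.
std::vector<unsigned char> toUtf16LeWithBom(const std::u16string& text) {
    std::vector<unsigned char> bytes;
    bytes.reserve(2 + 2 * text.size());
    bytes.push_back(0xFF);  // BOM, low byte first
    bytes.push_back(0xFE);
    for (char16_t unit : text) {
        bytes.push_back(static_cast<unsigned char>(unit & 0xFF));  // low byte
        bytes.push_back(static_cast<unsigned char>(unit >> 8));    // high byte
    }
    return bytes;
}
```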
You're probably sending the output to a DOM tree or to a character stream, not to a byte stream (transformNode, for instance, returns a BSTR, which is a string of UTF-16 characters). If that's the case, then it's not MSXML that's doing the encoding, and whatever does do the final encoding has no knowledge of the xsl:output directive (or indeed, of XSLT).
Supplementing what Michael Kay said (which is spot on, of course), here's a JScript example of how to transform to a byte stream, so that MSXML's XSLT serialization (which honors xsl:output) does the encoding:
// command line args
var args = WScript.Arguments;
if (args.length != 3) {
    WScript.Echo("usage: cscript msxsl.js in.xml ss.xsl out.xml");
    WScript.Quit(1);
}
var xmlFile = args(0);
var xslFile = args(1);
var resFile = args(2);
// DOM objects
var xsl = new ActiveXObject("MSXML2.DOMDOCUMENT.6.0");
var xml = xsl.cloneNode(false);
// source document
xml.validateOnParse = false;
xml.async = false;
xml.load(xmlFile);
if (xml.parseError.errorCode != 0) {
    WScript.Echo("XML Parse Error : " + xml.parseError.reason);
    WScript.Quit(1);
}
// stylesheet document
xsl.validateOnParse = false;
xsl.async = false;
xsl.resolveExternals = true;
//xsl.setProperty("AllowDocumentFunction", true);
//xsl.setProperty("ProhibitDTD", false);
//xsl.setProperty("AllowXsltScript", true);
xsl.load(xslFile);
if (xsl.parseError.errorCode != 0) {
    WScript.Echo("XSL Parse Error : " + xsl.parseError.reason);
    WScript.Quit(1);
}
// output object: a binary stream, so the XSLT serializer writes
// bytes in the encoding named by xsl:output
var stream = WScript.CreateObject("ADODB.Stream");
stream.open();
stream.type = 1; // adTypeBinary
xml.transformNodeToObject(xsl, stream);
stream.saveToFile(resFile, 2); // 2 = adSaveCreateOverWrite
stream.close();
You may test using this input:
<Urmel>
<eins>Käse</eins>
<deux>café</deux>
<tre>supplì</tre>
</Urmel>
And this stylesheet:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output encoding="UTF-8"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
I think it'll be easy for you to adapt the JScript example to C++.
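Whichever route you take, it helps to check what actually landed in the output file. The C++ sketch below (sniffXmlEncoding is a hypothetical helper) applies the first-bytes heuristics from Appendix F of the XML 1.0 specification; it ignores UTF-32 and EBCDIC, and for ASCII-compatible output you still have to read the encoding declaration. With the test input above, UTF-8 output encodes the ä in "Käse" as the bytes C3 A4, whereas UTF-16LE output would contain E4 00.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Guess the encoding family of a serialized XML document from its first
// bytes (heuristics after XML 1.0, Appendix F; UTF-32/EBCDIC not handled).
std::string sniffXmlEncoding(const std::string& b) {
    auto starts = [&](const char* p, std::size_t n) {
        return b.size() >= n && b.compare(0, n, p, n) == 0;
    };
    if (starts("\xEF\xBB\xBF", 3)) return "UTF-8 BOM";
    if (starts("\xFF\xFE", 2)) return "UTF-16LE BOM";
    if (starts("\xFE\xFF", 2)) return "UTF-16BE BOM";
    if (starts("<\0?\0", 4)) return "UTF-16LE";   // '<' '?' without BOM
    if (starts("\0<\0?", 4)) return "UTF-16BE";
    if (starts("<?xm", 4)) return "ASCII-compatible";  // e.g. UTF-8; read the declaration
    return "unknown";
}
```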
As you noted, BSTRs are all UTF-16. However, I think Michael Ludwig might be on to something here. Have you tried using this method?
HRESULT IXMLDOMDocument::transformNodeToObject(
IXMLDOMNode *stylesheet,
VARIANT outputObject);
You should be able to just use CreateStreamOnHGlobal, stash the resultant IStream ptr into a VARIANT, and pass that in as the outputObject parameter. Theoretically. I haven't actually tried this, though :)