How do I set the encoding statement in the XML declaration when performing an XSL transformation using a COM Msxml2.XSLTemplate?

Developer https://www.devze.com 2022-12-10 23:11 Source: Internet

I wrote a simple package installer in WinBatch that needs to update an XML file with information about the package contents. My first stab at it involved loading the file with Msxml2.DOMDocument, adding nodes and data as required, then saving the data back to disk. This worked well enough, except that it would not create tab and CR/LF whitespace in the new data. The solution I came up with was writing an XSL stylesheet that would recreate the XML file with whitespace added back in. I'm doing this by:

  1. loading the XSL file into an Msxml2.FreeThreadedDOMDocument object
  2. setting that object as the stylesheet property of an Msxml2.XSLTemplate object
  3. creating an XSL processor via Msxml2.XSLTemplate.createProcessor()
  4. setting my original Msxml2.DOMDocument as the input property of the XSL processor
  5. calling the transform() method of the XSL processor and saving the output to a file.

This works as far as reformatting the XML file with tabs and carriage returns goes, but my XML declaration comes out either as <?xml version="1.0"?> or as <?xml version="1.0" encoding="UTF-16"?>, depending on whether I used the Msxml2.*.6.0 objects or the version-independent Msxml2.* objects (the fallback if the system doesn't have 6.0).

If the encoding is set to UTF-16, Msxml2.DOMDocument complains about trying to convert UTF-16 to a 1-byte encoding the next time I run my package installer. I've tried using createProcessingInstruction() to create an XML declaration and add it to both the XML and the XSL DOM objects, but neither seems to affect the output of the XSLTemplate processor. I've also set encoding to UTF-8 in the <xsl:output/> element of my XSL file.

Here is the relevant code in my WinBatch script:

    xmlDoc = ObjectCreate("Msxml2.DOMDocument.6.0")
    if !xmlDoc then xmlDoc = ObjectCreate("Msxml2.DOMDocument")

    xmlDoc.async = @FALSE
    xmlDoc.validateOnParse = @TRUE
    xmlDoc.resolveExternals = @TRUE
    xmlDoc.preserveWhiteSpace = @TRUE
    xmlDoc.setProperty("SelectionLanguage", "XPath")
    xmlDoc.setProperty("SelectionNamespaces", "xmlns:fns='http://www.abc.com/f_namespace'")
    xmlDoc.load(xml_file_path)

    xslStyleSheet = ObjectCreate("Msxml2.FreeThreadedDOMDocument.6.0")
    if !xslStyleSheet then xslStyleSheet = ObjectCreate("Msxml2.FreeThreadedDOMDocument")

    xslStyleSheet.async = @FALSE
    xslStyleSheet.validateOnParse = @TRUE
    xslStyleSheet.load(xsl_style_sheet_path)

    xslTemplate = ObjectCreate("Msxml2.XSLTemplate.6.0")
    if !xslTemplate then xslTemplate = ObjectCreate("Msxml2.XSLTemplate")

    xslTemplate.stylesheet = xslStyleSheet

    processor = xslTemplate.createProcessor()
    processor.input = xmlDoc
    processor.transform()

    ; create a new file and write the XML processor output to it
    fh = FileOpen(output_file_path, "WRITE" , @FALSE)
    FileWrite(fh, processor.output)
    FileClose(fh)

The style sheet, with some slight changes to protect the innocent:

    <?xml version="1.0" encoding="UTF-8"?>

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.1">
        <xsl:output method="xml" indent="yes" encoding="UTF-8"/>
        <xsl:template match="/">
            <fns:test_station xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:fns="http://www.abc.com/f_namespace">
                <xsl:for-each select="/fns:test_station/identification">
                    <xsl:text>&#x0A;    </xsl:text>
                    <identification>
                        <xsl:for-each select="./*">
                            <xsl:text>&#x0A;        </xsl:text>
                            <xsl:copy-of select="."/>
                        </xsl:for-each>
                        <xsl:text>&#x0A;    </xsl:text>
                    </identification>
                </xsl:for-each>
                <xsl:for-each select="/fns:test_station/software">
                    <xsl:text>&#x0A;    </xsl:text>
                    <software>
                        <xsl:for-each select="./package">
                            <xsl:text>&#x0A;        </xsl:text>
                            <package>
                                <xsl:for-each select="./*">
                                    <xsl:text>&#x0A;            </xsl:text>
                                    <xsl:copy-of select="."/>
                                </xsl:for-each>
                                <xsl:text>&#x0A;        </xsl:text>
                            </package>
                        </xsl:for-each>
                        <xsl:text>&#x0A;    </xsl:text>
                    </software>
                </xsl:for-each>
                <xsl:for-each select="/fns:test_station/calibration">
                    <xsl:text>&#x0A;    </xsl:text>
                    <calibration>
                        <xsl:for-each select="./item">
                            <xsl:text>&#x0A;        </xsl:text>
                            <item>
                                <xsl:for-each select="./*">
                                    <xsl:text>&#x0A;            </xsl:text>
                                    <xsl:copy-of select="."/>
                                </xsl:for-each>
                                <xsl:text>&#x0A;        </xsl:text>
                            </item>
                        </xsl:for-each>
                        <xsl:text>&#x0A;    </xsl:text>
                    </calibration>
                </xsl:for-each>
            </fns:test_station>
        </xsl:template>
    </xsl:stylesheet>

And this is a sample output file:

    <?xml version="1.0" encoding="UTF-16"?>
    <fns:test_station xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:fns="http://www.abc.com/f_namespace">
        <software>
            <package>
                <part_number>123456789</part_number>
                <version>00</version>
                <test_category>1</test_category>
                <description>name of software package</description>
                <execution_path>c:\program files\test\test.exe</execution_path>
                <execution_arguments>arguments</execution_arguments>
                <crc_path>c:\ste_config\crc\123456789.lst</crc_path>
                <uninstall_path>c:\ste_config\uninstall\uninst_123456789.bat</uninstall_path>
                <install_timestamp>2009-11-09T14:00:44</install_timestamp>
            </package>
        </software>
    </fns:test_station>


The problem is that the output of the transform() method of the XSLT processor is being serialised as a string when you access the output property (either directly or indirectly), and Windows uses UTF-16 encoding for strings. The MSDN documentation of the output property mentions this almost in passing at the foot of the page:

In this case, the output is always generated in the Unicode encoding, and the encoding attribute on the <xsl:output> element is ignored.

(where they mean UTF-16 when they say "the Unicode encoding".)
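To make the string-versus-bytes point concrete outside COM, here is a small Node.js sketch (not WinBatch, and not part of MSXML). It takes a string like the one the output property returns, makes the declaration's encoding pseudo-attribute say UTF-8, and returns bytes that really are UTF-8. The helper name `toUtf8WithDeclaration` is hypothetical, and this is only a string-level workaround: it is correct solely because the bytes it returns match the declaration it writes.

```javascript
// Node.js sketch (assumption: run under Node.js, not Windows Script Host).
// "toUtf8WithDeclaration" is a hypothetical helper, not part of MSXML.
function toUtf8WithDeclaration(xmlText) {
  // Drop a leading byte-order-mark character, if the string carries one.
  var text = xmlText.replace(/^\uFEFF/, "");
  // Rewrite only the encoding pseudo-attribute of a leading <?xml ...?> declaration.
  // A declaration without one, such as <?xml version="1.0"?>, already means UTF-8.
  text = text.replace(/^(<\?xml\s[^?]*?encoding\s*=\s*)(["'])[^"']*\2/, "$1$2UTF-8$2");
  // Encode the JavaScript (UTF-16) string as UTF-8 bytes, so bytes and declaration agree.
  return Buffer.from(text, "utf8");
}
```

For example, `require("fs").writeFileSync(outPath, toUtf8WithDeclaration(outputString))`, where `outPath` and `outputString` are placeholders. Writing the same text with a 1-byte or UTF-16 writer would just recreate a mismatch in another form.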

If you use transformNodeToObject instead, specifying a new DOMDocument object as the output, you can then save that DOMDocument, and the serialisation it writes will be UTF-8 encoded.
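A minimal JScript sketch of that route, for Windows Script Host. Note that transformNodeToObject is a method of the source XML node (the question's xmlDoc), not of the XSLTemplate processor. The function name `transformToDomAndSave` and the `createDom` factory parameter are hypothetical; the factory only exists so the function can be exercised without COM.

```javascript
// JScript sketch for Windows Script Host. "transformToDomAndSave" and the
// "createDom" factory are hypothetical; in WSH pass e.g.
//   function () { return new ActiveXObject("Msxml2.DOMDocument.6.0"); }
function transformToDomAndSave(xmlDoc, xslDoc, outputPath, createDom) {
  var outDoc = createDom();
  // transformNodeToObject is a method of the source node; the result is built
  // directly into outDoc, so it never passes through a UTF-16 string.
  xmlDoc.transformNodeToObject(xslDoc, outDoc);
  // DOMDocument.save serialises the document and writes it to disk.
  outDoc.save(outputPath);
  return outDoc;
}
```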

Better still for your case, if you have an object implementing the IStream interface such as the stream associated with the file you're trying to save, you can pass that to transformNodeToObject to send the UTF-8 output directly to disk. (I can't remember if you have to open and close the file manually in this case, so you'll have to experiment with that.)


You could try using ADODB.Stream to save it in the UTF-8 encoding.

I don't have WinBatch, but extrapolating from VBScript, something like the following should work:

    oStream = ObjectCreate("ADODB.Stream")
    oStream.Open()
    oStream.Charset = "UTF-8"

    processor.output = oStream
    processor.transform()

    ; 2 = adSaveCreateOverWrite, needed if the output file already exists
    oStream.SaveToFile(output_file_path, 2)
    oStream.Close()
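The same stream idea expressed in JScript for Windows Script Host, keeping the question's Msxml2.XSLTemplate pipeline rather than switching to transformNodeToObject. This is a sketch: `transformWithTemplateToFile` and the `create` factory are hypothetical names, it hard-codes the 6.0 ProgID, and it assumes (as the answers here do) that MSXML writes bytes in the <xsl:output> encoding when the output is an IStream such as ADODB.Stream.

```javascript
// JScript sketch for Windows Script Host. "transformWithTemplateToFile" and the
// "create" factory are hypothetical; in WSH pass
//   function (progId) { return new ActiveXObject(progId); }
function transformWithTemplateToFile(xmlDoc, xslFreeThreadedDoc, outputPath, create) {
  // Compile the stylesheet; XSLTemplate needs a FreeThreadedDOMDocument.
  var template = create("Msxml2.XSLTemplate.6.0");
  template.stylesheet = xslFreeThreadedDoc;

  var processor = template.createProcessor();
  processor.input = xmlDoc;

  // Binary stream: MSXML writes already-encoded bytes into it.
  var stream = create("ADODB.Stream");
  stream.open();
  stream.type = 1; // adTypeBinary

  // With an IStream as output, the <xsl:output> encoding is used instead of UTF-16.
  processor.output = stream;
  processor.transform();

  stream.saveToFile(outputPath, 2); // 2 = adSaveCreateOverWrite
  stream.close();
}
```

Going through the template has the side benefit that the processor can be reused for several input documents without recompiling the stylesheet.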


You can do this with JavaScript (Windows Script Host will run it):

    function xmlTransformAndSave(xml, xsl, saveXmlPath, saveEnableOverwrite) {
        // Transforms input XML and saves output to file, preserving the encoding
        // specified by the xsl:output encoding attribute. The method used avoids the
        // issue of XSL XML output being forced to UTF-16 the moment it becomes a
        // string in JavaScript (JavaScript strings are UTF-16). saveEnableOverwrite
        // is an optional parameter, enabled by default.

        // Optional input parameter default value
        if (typeof saveEnableOverwrite == 'undefined') saveEnableOverwrite = true;
        // Convert to the stream.saveToFile option (1 = create; 2 = overwrite)
        var saveMode = saveEnableOverwrite ? 2 : 1;

        // Output object, a binary stream (to preserve the output encoding set in the XSL)
        var stream = WScript.CreateObject("ADODB.Stream");
        stream.open();
        stream.type = 1; // adTypeBinary

        // Transform and save to file
        xml.transformNodeToObject(xsl, stream);
        stream.saveToFile(saveXmlPath, saveMode);
        stream.close();
    }

The xml and xsl parameters are DOMDocument objects with xml and xsl already loaded. For example, xsl can come from this function:

    function getXsl(xslPath) {
        // Returns XSL loaded from xslPath supplied
        // Create DOM "xsl" for XSL, set DOM options, and load XSL file
        var xsl = new ActiveXObject("Msxml2.FreeThreadedDOMDocument.3.0");
        xsl.async = false;
        xsl.resolveExternals = false;
        xsl.validateOnParse = false;
        xsl.load(xslPath);
        // Return xsl
        return xsl;
    }
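getXsl hard-codes the 3.0 ProgID. If you want the same fallback as the question's WinBatch code (try the 6.0 object first, then an older one), a small helper can do it. `createFirstAvailable` and its `create` parameter are hypothetical names; in WSH the factory would simply call `new ActiveXObject`.

```javascript
// Sketch: "createFirstAvailable" is a hypothetical helper. It tries each ProgID
// in order and returns the first object that can be created, the JScript
// counterpart of the question's "if !xmlDoc then ObjectCreate(...)" fallback.
// In WSH, pass: function (progId) { return new ActiveXObject(progId); }
function createFirstAvailable(progIds, create) {
  var lastError = null;
  for (var i = 0; i < progIds.length; i++) {
    try {
      return create(progIds[i]);
    } catch (e) {
      lastError = e; // not registered on this machine; try the next one
    }
  }
  throw lastError || new Error("no ProgIDs given");
}
```

Usage inside getXsl might look like `var xsl = createFirstAvailable(["Msxml2.FreeThreadedDOMDocument.6.0", "Msxml2.FreeThreadedDOMDocument.3.0"], function (id) { return new ActiveXObject(id); });`. Keep in mind that different MSXML versions can behave differently, as the last answer here shows.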

Using this transform method, you can set XSL input parameters with code like this:

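    function xslSetParam(xsl, paramName, paramValue) {
        // Sets parameter value in xsl (call before transform)
        // Requires XSL structure "xsl:stylesheet" (NOT "xsl:transform", and NOT "xslt:")
        // With an MSXML 6.0 document, declare the prefix for XPath first:
        //   xsl.setProperty("SelectionNamespaces", "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'");
        // paramValue is written into a select attribute, so it is an XPath expression:
        // pass strings as quoted literals, e.g. "'some text'".
        var xslParam = xsl.selectSingleNode("/xsl:stylesheet/xsl:param[@name='" + paramName + "']");
        // Set parameter value
        xslParam.setAttribute("select", paramValue);
    }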
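Because xslSetParam writes paramValue into a select attribute, the value is evaluated as an XPath expression, so a plain string has to be passed as a quoted XPath literal (a bare word would be read as a path). A minimal quoting sketch follows; `xpathStringLiteral` is a hypothetical name. XPath 1.0 string literals have no escape character, so a value containing both quote kinds falls back to concat().

```javascript
// Sketch: "xpathStringLiteral" is a hypothetical helper, written in plain
// ES3-style JavaScript so it also runs under Windows Script Host.
function xpathStringLiteral(value) {
  var s = String(value);
  if (s.indexOf("'") === -1) return "'" + s + "'";
  if (s.indexOf('"') === -1) return '"' + s + '"';
  // Both quote kinds present: split on apostrophes and rebuild with concat().
  var parts = s.split("'");
  var pieces = [];
  for (var i = 0; i < parts.length; i++) {
    if (i > 0) pieces.push("\"'\""); // a lone apostrophe, double-quoted
    if (parts[i] !== "") pieces.push("'" + parts[i] + "'");
  }
  return "concat(" + pieces.join(", ") + ")";
}
```

For example, `xslSetParam(xsl, "station_name", xpathStringLiteral("Bay 3"))`, where station_name is a made-up parameter name. A numeric parameter can be passed unquoted instead, since a quoted value arrives in the stylesheet as a string.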

The output encoding you specify in your XSL will now be both the actual encoding of the file and the encoding named in its XML declaration, as expected. So an output element like this in your XSL:

    <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

will give you the desired declaration:

    <?xml version="1.0" encoding="UTF-8"?>
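To confirm what actually ended up on disk, a small Node.js check can look at the first bytes of the output file. `declaredEncoding` is a hypothetical helper; it assumes UTF-16 files start with a byte-order mark (which the XML 1.0 specification requires), and it only reads the BOM and the declaration, it does not validate the rest of the file.

```javascript
// Node.js sketch; "declaredEncoding" is a hypothetical helper.
// Assumes UTF-16 files begin with a byte-order mark, as XML 1.0 requires.
function declaredEncoding(bytes) {
  if (bytes.length >= 2 && bytes[0] === 0xFF && bytes[1] === 0xFE) return "UTF-16LE (BOM)";
  if (bytes.length >= 2 && bytes[0] === 0xFE && bytes[1] === 0xFF) return "UTF-16BE (BOM)";
  // Skip a UTF-8 byte-order mark if present.
  var start = bytes.length >= 3 && bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF ? 3 : 0;
  // The declaration is ASCII, so a latin1 decode of the first bytes is enough to read it.
  var head = bytes.toString("latin1", start, Math.min(bytes.length, start + 200));
  var match = /^<\?xml\s[^?]*?encoding\s*=\s*["']([^"']+)["']/.exec(head);
  // No encoding pseudo-attribute and no UTF-16 BOM: XML's default is UTF-8.
  return match ? match[1] : "UTF-8 (default)";
}
```

For example, `declaredEncoding(require("fs").readFileSync("out.xml"))`. A result of "UTF-16" from a file with no BOM means the declaration and the bytes disagree, which is likely the situation that made the next load fail in the question.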


Grrrrr. Worked on this all day. If you create the object with no version number:

    Server.CreateObject("MSXML2.FreeThreadedDOMDocument")

It will stick a <META http-equiv="Content-Type" content="text/html; charset=UTF-16"> in the header.

But if you specify a version number like:

    Server.CreateObject("MSXML2.FreeThreadedDOMDocument.5.0")

or .4.0 or .6.0 (whatever's installed), it puts this in the header:

    <META http-equiv="Content-Type" content="text/html">