I am searching for an XSLT or command-line tool (or C# code that can be made into a command-line tool, etc) for Windows that will do XML pretty-printing. Specifically, I want one that has the ability to put attributes one-to-a-line, something like:
<Node>
  <ChildNode
    value1='5'
    value2='6'
    value3='happy' />
</Node>
It doesn't have to be EXACTLY like that, but I want to use it for an XML file that has nodes with dozens of attributes, and spreading them across multiple lines makes them easier to read, edit, and text-diff.
NOTE: I think my preferred solution is an XSLT sheet I can pass through a C# method, though a Windows command-line tool is good too.
Here's a PowerShell script to do it. It takes the following input:
<?xml version="1.0" encoding="utf-8"?>
<Node>
  <ChildNode value1="5" value2="6" value3="happy" />
</Node>
...and produces this as output:
<?xml version="1.0" encoding="utf-8"?>
<Node>
 <ChildNode
  value1="5"
  value2="6"
  value3="happy" />
</Node>
Here you go:
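param(
    [string] $inputFile = $(throw "Please enter an input file name"),
    [string] $outputFile = $(throw "Please supply an output file name")
)
# Load the whole document into an XmlDocument
$data = [xml](Get-Content $inputFile)
# Indent elements and put every attribute on its own line
$xws = New-Object System.Xml.XmlWriterSettings
$xws.Indent = $true
$xws.IndentChars = " "
$xws.NewLineOnAttributes = $true
# Close the writer explicitly so the output is flushed and the file is released
$writer = [Xml.XmlWriter]::Create($outputFile, $xws)
try { $data.Save($writer) } finally { $writer.Close() }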
Take that script, save it as C:\formatxml.ps1. Then, from a PowerShell prompt type the following:
C:\formatxml.ps1 C:\Path\To\UglyFile.xml C:\Path\To\NeatAndTidyFile.xml
This script is basically just using the .NET framework so you could very easily migrate this into a C# application.
NOTE: If you have not run scripts from PowerShell before, you will have to execute the following command at an elevated PowerShell prompt before you will be able to execute the script:
Set-ExecutionPolicy RemoteSigned
You only have to do this one time though.
I hope that's useful to you.
Here's a small C# sample, which can be used directly by your code, or built into an exe and called at the command line as "myexe from.xml to.xml":
using System;
using System.Xml;

class Program
{
    static void Main(string[] args)
    {
        XmlWriterSettings settings = new XmlWriterSettings {
            NewLineHandling = NewLineHandling.Entitize,
            NewLineOnAttributes = true, Indent = true, IndentChars = " ",
            NewLineChars = Environment.NewLine
        };
        // args[0] is the input file, args[1] is the output file
        using (XmlReader reader = XmlReader.Create(args[0]))
        using (XmlWriter writer = XmlWriter.Create(args[1], settings)) {
            // Copy the whole document; leaving the using blocks flushes and closes both ends
            writer.WriteNode(reader, false);
        }
    }
}
Sample input:
<Node><ChildNode value1='5' value2='6' value3='happy' /></Node>
Sample output (note you can remove the <?xml ... declaration with settings.OmitXmlDeclaration):
<?xml version="1.0" encoding="utf-8"?>
<Node>
 <ChildNode
  value1="5"
  value2="6"
  value3="happy" />
</Node>
Note that if you want a string rather than writing to a file, just swap in a StringBuilder:
// Needs: using System.IO; using System.Text;
StringBuilder sb = new StringBuilder();
using (XmlReader reader = XmlReader.Create(new StringReader(oldXml)))
using (XmlWriter writer = XmlWriter.Create(sb, settings)) {
    writer.WriteNode(reader, false);
}
// Read the result only after the writer has been disposed (and therefore flushed)
string newXml = sb.ToString();
Try Tidy over on SourceForge. Although it's often used on [X]HTML, I've used it successfully on XML before - just make sure you use the -xml option.
http://tidy.sourceforge.net/#docs
Tidy reads HTML, XHTML and XML files and writes cleaned up markup. ... For generic XML files, Tidy is limited to correcting basic well-formedness errors and pretty printing.
It has been ported to several platforms and is available as an executable and as a callable library.
Tidy has a heap of options including:
http://api.html-tidy.org/tidy/quickref_5.0.0.html#indent
indent-attributes
Type: Boolean
Default: no
Example: y/n, yes/no, t/f, true/false, 1/0
This option specifies if Tidy should begin each attribute on a new line.
One caveat:
Limited support for XML
XML processors compliant with W3C's XML 1.0 recommendation are very picky about which files they will accept. Tidy can help you to fix errors that cause your XML files to be rejected. Tidy doesn't yet recognize all XML features though, e.g. it doesn't understand CDATA sections or DTD subsets.
But I suspect unless your XML is really advanced, the tool should work fine.
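If you want to drive Tidy from a script, the options mentioned above can be combined on one command line. Here is a minimal Python sketch, assuming a tidy executable is on your PATH. The helper names build_tidy_command and run_tidy are made up for illustration; -i, -q and -o are Tidy's indent, quiet and output-file switches, and --indent-attributes is the option quoted above.

```python
import subprocess


def build_tidy_command(input_path, output_path, tidy_exe="tidy"):
    """Build the argument list for an XML pretty-printing run of Tidy."""
    return [
        tidy_exe,
        "-xml",                        # treat the input as well-formed XML
        "-i",                          # indent element content
        "--indent-attributes", "yes",  # begin each attribute on a new line
        "-q",                          # suppress nonessential output
        "-o", output_path,             # write the result to this file
        input_path,
    ]


def run_tidy(input_path, output_path, tidy_exe="tidy"):
    """Run Tidy and return its exit code (0 = ok, 1 = warnings, 2 = errors)."""
    result = subprocess.run(build_tidy_command(input_path, output_path, tidy_exe))
    return result.returncode
```

Calling run_tidy("UglyFile.xml", "NeatAndTidyFile.xml") would then write the formatted copy next to the original; check the exit code before trusting the output.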
There is a tool that can split attributes to one per line: xmlpp. It's a Perl script, so you'll have to install Perl. Usage:
perl xmlpp.pl -t input.xml
You can also determine the ordering of attributes by creating a file called attributeOrdering.txt, and calling perl xmlpp.pl -s -t input.xml. For more options, use perl xmlpp.pl -h
I hope it doesn't have too many bugs, but it has worked for me so far.
XML Notepad 2007 can do so manually ... let me see if it can be scripted.
Nope ... you can only launch it like so:
XmlNotepad.exe a.xml
The rest is just clicking the save button. PowerShell or other tools can automate that.
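Just use this XSLT (it re-indents the elements; attributes are copied unchanged, so how they are laid out is still up to the serializer):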
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="ISO-8859-1"/>
  <xsl:param name="indent-increment" select="' '"/>

  <xsl:template name="newline">
    <xsl:text disable-output-escaping="yes">
</xsl:text>
  </xsl:template>

  <xsl:template match="comment() | processing-instruction()">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>
    <xsl:value-of select="$indent"/>
    <xsl:copy />
  </xsl:template>

  <xsl:template match="text()">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>
    <xsl:value-of select="$indent"/>
    <xsl:value-of select="normalize-space(.)"/>
  </xsl:template>

  <xsl:template match="text()[normalize-space(.)='']"/>

  <xsl:template match="*">
    <xsl:param name="indent" select="''"/>
    <xsl:call-template name="newline"/>
    <xsl:value-of select="$indent"/>
    <xsl:choose>
      <xsl:when test="count(child::*) > 0">
        <xsl:copy>
          <xsl:copy-of select="@*"/>
          <xsl:apply-templates select="*|text()">
            <xsl:with-param name="indent" select="concat ($indent, $indent-increment)"/>
          </xsl:apply-templates>
          <xsl:call-template name="newline"/>
          <xsl:value-of select="$indent"/>
        </xsl:copy>
      </xsl:when>
      <xsl:otherwise>
        <xsl:copy-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
Or, as another option, here is a perl script: http://software.decisionsoft.com/index.html
You can implement a simple SAX application that copies everything as is and indents attributes however you like.
UPD:
SAX stands for Simple API for XML. It is a push model of XML parsing: the parser walks the document and calls your handler for each start tag, end tag and run of text (a classic example of the Builder design pattern). The API is available on most current development platforms (though the native .NET class library lacks one, having XmlReader instead).
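To make the push model concrete, here is a small sketch that only records the callbacks the parser makes, without writing any XML. The class name EventLogger is made up for illustration; ContentHandler and parseString come from Python's standard xml.sax package.

```python
from xml.sax import parseString
from xml.sax.handler import ContentHandler


class EventLogger(ContentHandler):
    """Record each SAX callback so the order of events can be inspected."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # attrs behaves like a read-only mapping of attribute names to values
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))


logger = EventLogger()
parseString(b"<Node><ChildNode value1='5' value2='6'/></Node>", logger)
for event in logger.events:
    print(event)
```

The parser "pushes" one start event and one end event per element, in document order; a pretty-printer only has to decide what to write in response to each of them.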
Here is a raw implementation in Python; it is rather cryptic, but you can get the main idea from it.
# Written for Python 3 (an assumption; the original looks like Python 2 code).
from sys import stdout
from xml.sax import parse
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import escape


class MyHandler(ContentHandler):

    def __init__(self, file_, encoding):
        ContentHandler.__init__(self)
        self.level = 0
        self.elem_indent = ' '
        # should the next block make a line break
        self._allow_N = False
        # whether the start tag is still unclosed (no '>' yet), to allow ' />'
        self._tag_open = False
        # file_ must be a binary stream, because _write() emits encoded bytes
        self._file = file_
        self._encoding = encoding

    def _write(self, string_):
        self._file.write(string_.encode(self._encoding))

    def startElement(self, name, attrs):
        # close the parent's start tag if it is still open
        if self._tag_open:
            self._write('>')
            self._tag_open = False
        if self._allow_N:
            self._write('\n')
            indent = self.elem_indent * self.level
        else:
            indent = ''
        self._write('%s<%s' % (indent, name))
        # attr indent equals the element indent plus ' '
        attr_indent = self.elem_indent * self.level + ' '
        for attr_name in attrs.getNames():
            # write indented attributes one per line; also escape double
            # quotes, since the value is wrapped in double quotes
            value = escape(attrs.getValue(attr_name), {'"': '&quot;'})
            self._write('\n%s%s="%s"' % (attr_indent, attr_name, value))
        self._tag_open = True
        self.level += 1
        self._allow_N = True

    def endElement(self, name):
        self.level -= 1
        # nothing was written since the start tag: emit an empty-element tag
        if self._tag_open:
            self._write(' />')
            self._tag_open = False
            return
        if self._allow_N:
            self._write('\n')
            indent = self.elem_indent * self.level
        else:
            indent = ''
        self._write('%s</%s>' % (indent, name))
        self._allow_N = True

    def characters(self, content):
        if self._tag_open:
            self._write('>')
            self._tag_open = False
        if content.strip():
            # real text: keep it inline, with no line break before the next tag
            self._allow_N = False
            self._write(escape(content))
        else:
            # whitespace-only text is dropped and replaced by our own indentation
            self._allow_N = True


if __name__ == '__main__':
    # stdout.buffer is the binary stream behind stdout
    parse('test.xsl', MyHandler(stdout.buffer, stdout.encoding))
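If the file is small enough to load into memory, the same layout can also be sketched without SAX. The following is not the handler above but a simplified tree-based variant using xml.etree.ElementTree; the function names format_element and pretty_print are made up for illustration, and the sketch assumes two-space indentation and ignores comments, processing instructions and namespace prefixes.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def format_element(elem, level=0, indent="  "):
    """Return the lines for elem, with one attribute per line."""
    pad = indent * level
    lines = [pad + "<" + elem.tag]
    for name, value in elem.attrib.items():
        # quoteattr() escapes the value and adds the surrounding quotes
        lines.append(pad + indent + name + "=" + quoteattr(value))
    text = (elem.text or "").strip()
    children = list(elem)
    if not children and not text:
        lines[-1] += " />"          # empty element: close it on the last line
        return lines
    lines[-1] += ">"
    if text:
        lines.append(pad + indent + escape(text))
    for child in children:
        lines.extend(format_element(child, level + 1, indent))
        tail = (child.tail or "").strip()
        if tail:                    # text that follows the child element
            lines.append(pad + indent + escape(tail))
    lines.append(pad + "</" + elem.tag + ">")
    return lines


def pretty_print(xml_text):
    """Parse xml_text and return it re-indented as a single string."""
    return "\n".join(format_element(ET.fromstring(xml_text)))


print(pretty_print("<Node><ChildNode value1='5' value2='6' value3='happy' /></Node>"))
```

Unlike the SAX handler, this moves non-whitespace text onto its own line, so it is only suitable for data-style XML where whitespace around text is not significant.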