Does anybody know of a tool that will compare two XML documents? Belay that mocking… there’s more. I need something that will make sure each node in file 1 is also in file 2, regardless of order. I thought XML Spy would do it with the Ignore Order of Child Nodes option, but it didn’t. The following would be considered the same:
<Node>
<Child name="Alpha"/>
<Child name="Beta"/>
<Child name="Charlie"/>
</Node>
<Node>
<Child name="Beta"/>
<Child name="Charlie"/>
<Child name="Alpha"/>
</Node>
I wrote a simple Python tool for this called xmldiffs:
Compare two XML files, ignoring element and attribute order.
Usage:
xmldiffs [OPTION] FILE1 FILE2
Any extra options are passed to the diff command.
Get it at https://github.com/joh/xmldiffs
With Beyond Compare, you can use the XML Sort conversion in the File Formats settings. With this option, the XML children are sorted before the diff.
A trial / portable version of Beyond Compare is available.
You might want to google for "XML diff tool", which will give you more than adequate results. One of them is OxygenXml, a tool I frequently use. You can also try Microsoft's XML Diff and Patch Tool.
Good Luck.
I'd use XMLUnit for this as it can cater for elements being in a different order.
I had a similar need this evening, and couldn't find something that fit my requirements.
My workaround was to sort the two XML files I wanted to diff, sorting alphabetically by the element name. Once they were both in a consistent order, I could diff the two sorted files using a regular visual diff tool.
If this approach sounds useful to anyone else, I've shared the Python script I wrote to do the sorting at http://dalelane.co.uk/blog/?p=3225
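For readers who want to try the sort-then-diff idea without leaving the standard library, here is a minimal sketch. It is not the linked script: the function names `canonical_lines` and `xml_diff` are made up for illustration, and it sorts siblings by their whole rendered subtree rather than by element name alone, so that identically named siblings also end up in a stable order. It assumes text between sibling elements (tail text) and comments can be ignored.

```python
import difflib
import xml.etree.ElementTree as ET


def canonical_lines(elem, depth=0):
    """Render elem as indented lines, with children in an input-independent order."""
    # Attributes are sorted so that attribute order never shows up as a diff.
    attrs = "".join(f' {k}="{v}"' for k, v in sorted(elem.attrib.items()))
    text = (elem.text or "").strip()
    lines = [f"{'  ' * depth}<{elem.tag}{attrs}>{text}"]
    # Sort the rendered child subtrees (lists of strings compare lexicographically).
    for block in sorted(canonical_lines(child, depth + 1) for child in elem):
        lines.extend(block)
    return lines


def xml_diff(xml_a, xml_b):
    """Return unified-diff lines between the canonical forms of two XML strings."""
    a = canonical_lines(ET.fromstring(xml_a))
    b = canonical_lines(ET.fromstring(xml_b))
    return list(difflib.unified_diff(a, b, "file1", "file2", lineterm=""))
```

An empty list from `xml_diff` means the two documents render identically once order is normalized. The rendering is a simplified one-line-per-element form, not valid XML, so it is only meant for reading diffs.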
i recently gave a similar answer here (Open source command line tool for Linux to diff XML files ignoring element order), but i'll provide more detail...
if you write a program to walk the two trees together, you can customize the logic for identifying "matches" between the trees, and also for handling nodes that don't match. here is an example in xslt 2.0 (sorry it's so long):
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:set="http://exslt.org/sets"
xmlns:primary="primary"
xmlns:control="control"
xmlns:util="util"
exclude-result-prefixes="xsl xs set primary control">
<!-- xml diff tool
import this stylesheet from another and call the "compare" template with two args:
primary: the root of the primary tree to submit to comparison
control: the root of the control tree to compare against
the two trees will be walked together. the primary tree will be walked in document order, matching elements
and attributes from the control tree along the way, building a tree of common content, with appendages
containing primary and control only content. that tree will then be used to generate the diff.
the process of matching involves finding, for an element or attribute in the primary tree, the
equivalent element or attribute in the control tree, *at the same level*, and *regardless of ordering*.
matching logic is encoded as templates with mode="find-match", providing a hook to wire in specific
matching logic for particular elements or attributes. for example, an element may "match" based on an
@id attribute value, irrespective of element ordering; encode this in a mode="find-match" template.
the treatment of diffs is encoded as templates with mode="primary-only" and "control-only", providing
hooks for alternate behavior upon encountering differences.
-->
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:param name="full" select="false()"/><!-- whether to render the full doc, as opposed to just diffs -->
<xsl:template match="/">
<xsl:call-template name="compare">
<xsl:with-param name="primary" select="*/*[1]"/><!-- first child of root element, for example -->
<xsl:with-param name="control" select="*/*[2]"/><!-- second child of root element, for example -->
</xsl:call-template>
</xsl:template>
<!-- OVERRIDES: templates that can be overridden to provide targeted matching logic and diff treatment -->
<!-- default find-match template for elements
(by default, for "complex" elements, name has to match, for "simple" elements, name and value do)
for context node (from "primary"), choose from among $candidates (from "control") which one matches
(override with more specific match patterns to effect alternate behavior for targeted elements)
-->
<xsl:template match="*" mode="find-match" as="element()?">
<xsl:param name="candidates" as="element()*"/>
<xsl:choose>
<xsl:when test="text() and count(node()) = 1"><!-- simple content -->
<xsl:sequence select="$candidates[node-name(.) = node-name(current())][text() and count(node()) = 1][. = current()][1]"/>
</xsl:when>
<xsl:when test="not(node())"><!-- empty -->
<xsl:sequence select="$candidates[node-name(.) = node-name(current())][not(node())][1]"/>
</xsl:when>
<xsl:otherwise><!-- presumably complex content -->
<xsl:sequence select="$candidates[node-name(.) = node-name(current())][1]"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<!-- default find-match template for attributes
(by default, name and value have to match)
for context attr (from "primary"), choose from among $candidates (from "control") which one matches
(override with more specific match patterns to effect alternate behavior for targeted attributes)
-->
<xsl:template match="@*" mode="find-match" as="attribute()?">
<xsl:param name="candidates" as="attribute()*"/>
<xsl:sequence select="$candidates[. = current()][node-name(.) = node-name(current())][1]"/>
</xsl:template>
<!-- default primary-only template (override with more specific match patterns to effect alternate behavior) -->
<xsl:template match="@* | *" mode="primary-only">
<xsl:apply-templates select="." mode="illegal-primary-only"/>
</xsl:template>
<!-- write out a primary-only diff -->
<xsl:template match="@* | *" mode="illegal-primary-only">
<primary:only>
<xsl:copy-of select="."/>
</primary:only>
</xsl:template>
<!-- default control-only template (override with more specific match patterns to effect alternate behavior) -->
<xsl:template match="@* | *" mode="control-only">
<xsl:apply-templates select="." mode="illegal-control-only"/>
</xsl:template>
<!-- write out a control-only diff -->
<xsl:template match="@* | *" mode="illegal-control-only">
<control:only>
<xsl:copy-of select="."/>
</control:only>
</xsl:template>
<!-- end OVERRIDES -->
<!-- MACHINERY: for walking the primary and control trees together, finding matches and recursing -->
<!-- compare "primary" and "control" trees (this is the root of comparison, so CALL THIS ONE !) -->
<xsl:template name="compare">
<xsl:param name="primary"/>
<xsl:param name="control"/>
<!-- write the xml diff into a variable -->
<xsl:variable name="diff">
<xsl:call-template name="match-children">
<xsl:with-param name="primary" select="$primary"/>
<xsl:with-param name="control" select="$control"/>
</xsl:call-template>
</xsl:variable>
<!-- "print" the xml diff as textual output -->
<xsl:apply-templates select="$diff" mode="print">
<xsl:with-param name="render" select="$full"/>
</xsl:apply-templates>
</xsl:template>
<!-- assume primary (context) element and control element match, so render the "common" element and recurse -->
<xsl:template match="*" mode="common">
<xsl:param name="control"/>
<xsl:copy>
<xsl:call-template name="match-attributes">
<xsl:with-param name="primary" select="@*"/>
<xsl:with-param name="control" select="$control/@*"/>
</xsl:call-template>
<xsl:choose>
<xsl:when test="text() and count(node()) = 1">
<xsl:value-of select="."/>
</xsl:when>
<xsl:otherwise>
<xsl:call-template name="match-children">
<xsl:with-param name="primary" select="*"/>
<xsl:with-param name="control" select="$control/*"/>
</xsl:call-template>
</xsl:otherwise>
</xsl:choose>
</xsl:copy>
</xsl:template>
<!-- find matches between collections of attributes in primary vs control -->
<xsl:template name="match-attributes">
<xsl:param name="primary" as="attribute()*"/>
<xsl:param name="control" as="attribute()*"/>
<xsl:param name="primaryCollecting" as="attribute()*"/>
<xsl:choose>
<xsl:when test="$primary and $control">
<xsl:variable name="this" select="$primary[1]"/>
<xsl:variable name="match" as="attribute()?">
<xsl:apply-templates select="$this" mode="find-match">
<xsl:with-param name="candidates" select="$control"/>
</xsl:apply-templates>
</xsl:variable>
<xsl:choose>
<xsl:when test="$match">
<xsl:copy-of select="$this"/>
<xsl:call-template name="match-attributes">
<xsl:with-param name="primary" select="subsequence($primary, 2)"/>
<xsl:with-param name="control" select="remove($control, 1 + count(set:leading($control, $match)))"/>
<xsl:with-param name="primaryCollecting" select="$primaryCollecting"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:call-template name="match-attributes">
<xsl:with-param name="primary" select="subsequence($primary, 2)"/>
<xsl:with-param name="control" select="$control"/>
<xsl:with-param name="primaryCollecting" select="$primaryCollecting | $this"/>
</xsl:call-template>
</xsl:otherwise>
</xsl:choose>
</xsl:when>
<xsl:otherwise>
<xsl:if test="$primaryCollecting | $primary">
<xsl:apply-templates select="$primaryCollecting | $primary" mode="primary-only"/>
</xsl:if>
<xsl:if test="$control">
<xsl:apply-templates select="$control" mode="control-only"/>
</xsl:if>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<!-- find matches between collections of elements in primary vs control -->
<xsl:template name="match-children">
<xsl:param name="primary" as="node()*"/>
<xsl:param name="control" as="element()*"/>
<xsl:variable name="this" select="$primary[1]" as="node()?"/>
<xsl:choose>
<xsl:when test="$primary and $control">
<xsl:variable name="match" as="element()?">
<xsl:apply-templates select="$this" mode="find-match">
<xsl:with-param name="candidates" select="$control"/>
</xsl:apply-templates>
</xsl:variable>
<xsl:choose>
<xsl:when test="$match">
<xsl:apply-templates select="$this" mode="common">
<xsl:with-param name="control" select="$match"/>
</xsl:apply-templates>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="$this" mode="primary-only"/>
</xsl:otherwise>
</xsl:choose>
<xsl:call-template name="match-children">
<xsl:with-param name="primary" select="subsequence($primary, 2)"/>
<xsl:with-param name="control" select="if (not($match)) then $control else remove($control, 1 + count(set:leading($control, $match)))"/>
</xsl:call-template>
</xsl:when>
<xsl:when test="$primary">
<xsl:apply-templates select="$primary" mode="primary-only"/>
</xsl:when>
<xsl:when test="$control">
<xsl:apply-templates select="$control" mode="control-only"/>
</xsl:when>
</xsl:choose>
</xsl:template>
<!-- end MACHINERY -->
<!-- PRINTERS: print templates for writing out the diff -->
<xsl:template match="*" mode="print">
<xsl:param name="depth" select="-1"/>
<xsl:param name="render" select="false()"/>
<xsl:param name="lineLeader" select="' '"/>
<xsl:param name="rest" as="element()*"/>
<xsl:if test="$render or descendant::primary:* or descendant::control:*">
<xsl:call-template name="whitespace">
<xsl:with-param name="indent" select="$depth"/>
<xsl:with-param name="leadChar" select="$lineLeader"/>
</xsl:call-template>
<xsl:text>&lt;</xsl:text>
<xsl:value-of select="name(.)"/>
<xsl:apply-templates select="@* | primary:*[@*] | control:*[@*]" mode="print">
<xsl:with-param name="depth" select="$depth"/>
<xsl:with-param name="render" select="$render"/>
<xsl:with-param name="lineLeader" select="$lineLeader"/>
</xsl:apply-templates>
<xsl:choose>
<xsl:when test="text() and count(node()) = 1"><!-- field element (just textual content) -->
<xsl:text>></xsl:text>
<xsl:value-of select="."/>
<xsl:text>&lt;/</xsl:text>
<xsl:value-of select="name(.)"/>
<xsl:text>></xsl:text>
</xsl:when>
<xsl:when test="count(node()) = 0"><!-- empty (self-closing) element -->
<xsl:text>/></xsl:text>
</xsl:when>
<xsl:otherwise><!-- complex content -->
<xsl:text>> </xsl:text>
<xsl:apply-templates select="*[not(self::primary:* and @*) and not(self::control:* and @*)]" mode="print">
<xsl:with-param name="depth" select="$depth + 1"/>
<xsl:with-param name="render" select="$render"/>
<xsl:with-param name="lineLeader" select="$lineLeader"/>
</xsl:apply-templates>
<xsl:call-template name="whitespace">
<xsl:with-param name="indent" select="$depth"/>
<xsl:with-param name="leadChar" select="$lineLeader"/>
</xsl:call-template>
<xsl:text>&lt;/</xsl:text>
<xsl:value-of select="name(.)"/>
<xsl:text>></xsl:text>
</xsl:otherwise>
</xsl:choose>
<xsl:text> </xsl:text>
</xsl:if>
<!-- write the rest of the elements, if any -->
<xsl:apply-templates select="$rest" mode="print">
<xsl:with-param name="depth" select="$depth"/>
<xsl:with-param name="render" select="$render"/>
<xsl:with-param name="lineLeader" select="$lineLeader"/>
<xsl:with-param name="rest" select="()"/><!-- avoid implicit param pass to recursive call! -->
</xsl:apply-templates>
</xsl:template>
<xsl:template match="@*" mode="print">
<xsl:param name="depth" select="0"/>
<xsl:param name="render" select="false()"/>
<xsl:param name="lineLeader" select="' '"/>
<xsl:param name="rest" as="attribute()*"/>
<xsl:if test="$render">
<xsl:text> </xsl:text>
<xsl:call-template name="whitespace">
<xsl:with-param name="indent" select="$depth + 3"/>
<xsl:with-param name="leadChar" select="$lineLeader"/>
</xsl:call-template>
<xsl:value-of select="name(.)"/>
<xsl:text>="</xsl:text>
<xsl:value-of select="."/>
<xsl:text>"</xsl:text>
</xsl:if>
<xsl:apply-templates select="$rest" mode="print">
<xsl:with-param name="depth" select="$depth"/>
<xsl:with-param name="render" select="$render"/>
<xsl:with-param name="lineLeader" select="$lineLeader"/>
<xsl:with-param name="rest" select="()"/><!-- avoid implicit param pass to recursive call! -->
</xsl:apply-templates>
</xsl:template>
<xsl:template match="primary:* | control:*" mode="print">
<xsl:param name="depth"/>
<xsl:variable name="diffType" select="util:diff-type(.)"/>
<xsl:variable name="primary" select="self::primary:*"/>
<xsl:variable name="lineLeader" select="if ($primary) then '+' else '-'"/>
<!-- only if this is the first in a sequence of control::* elements, since the rest are handled along with the first... -->
<xsl:if test="util:diff-type(preceding-sibling::*[1]) != $diffType">
<xsl:if test="@*">
<xsl:text> </xsl:text>
</xsl:if>
<xsl:call-template name="diffspace">
<xsl:with-param name="indent" select="if (@*) then $depth + 3 else $depth"/>
<xsl:with-param name="primary" select="$primary"/>
</xsl:call-template>
<b><i>&lt;!-- ... --&gt;</i></b><!-- something to identify diff sections in output -->
<xsl:if test="node()">
<xsl:text> </xsl:text>
</xsl:if>
<xsl:variable name="rest" select="set:leading(following-sibling::*, following-sibling::*[util:diff-type(.) != $diffType])"/>
<xsl:apply-templates select="@* | node()" mode="print">
<xsl:with-param name="depth" select="$depth"/>
<xsl:with-param name="render" select="true()"/>
<xsl:with-param name="lineLeader" select="$lineLeader"/>
<xsl:with-param name="rest" select="$rest/@* | $rest/*"/>
</xsl:apply-templates>
</xsl:if>
</xsl:template>
<xsl:template name="whitespace">
<xsl:param name="indent" select="0" as="xs:integer"/>
<xsl:param name="leadChar" select="' '"/>
<xsl:choose>
<xsl:when test="$indent > 0">
<xsl:value-of select="$leadChar"/>
<xsl:text> </xsl:text>
<xsl:for-each select="0 to $indent - 1">
<xsl:text> </xsl:text>
</xsl:for-each>
</xsl:when>
<xsl:otherwise>
<xsl:for-each select="0 to $indent">
<xsl:text> </xsl:text>
</xsl:for-each>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template name="diffspace">
<xsl:param name="indent" select="0" as="xs:integer"/>
<xsl:param name="primary" select="false()"/>
<xsl:for-each select="0 to $indent">
<xsl:choose>
<xsl:when test="$primary">
<xsl:text>++</xsl:text>
</xsl:when>
<xsl:otherwise>
<xsl:text>--</xsl:text>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</xsl:template>
<!-- just an "enum" for deciding whether to group adjacent diffs -->
<xsl:function name="util:diff-type" as="xs:integer">
<xsl:param name="construct"/>
<xsl:sequence select="if ($construct/self::primary:*[@*]) then 1 else
if ($construct/self::control:*[@*]) then 2 else
if ($construct/self::primary:*) then 3 else
if ($construct/self::control:*) then 4 else
if ($construct) then 5 else 0"/>
</xsl:function>
<!-- end PRINTERS -->
</xsl:stylesheet>
consider this example input, based on yours:
<test>
<Node>
<Child name="Alpha"/>
<Child name="Beta"/>
<Child name="Charlie"/>
</Node>
<Node>
<Child name="Beta"/>
<Child name="Charlie"/>
<Child name="Alpha"/>
</Node>
</test>
with the stylesheet as is, the following is the output when applied to the example:
<Node>
<Child
++++++++<!-- ... -->
+ name="Alpha"
--------<!-- ... -->
- name="Beta">
</Child>
<Child
++++++++<!-- ... -->
+ name="Beta"
--------<!-- ... -->
- name="Charlie">
</Child>
<Child
++++++++<!-- ... -->
+ name="Charlie"
--------<!-- ... -->
- name="Alpha">
</Child>
</Node>
but, if you add this custom template:
<xsl:template match="Child" mode="find-match" as="element()?">
<xsl:param name="candidates" as="element()*"/>
<xsl:sequence select="$candidates[@name = current()/@name][1]"/>
</xsl:template>
which says to match a Child element based on its @name attribute, then you get no output (meaning there is no diff).
Here is a diff solution using SWI-Prolog
:- use_module(library(xpath)).
load_trees(XmlRoot1, XmlRoot2) :-
load_xml('./xml_source_1.xml', XmlRoot1, _),
load_xml('./xml_source_2.xml', XmlRoot2, _).
find_differences(Reference, Root1, Root2) :-
xpath(Root1, //'Child'(@name=Name), Node),
not(xpath(Root2, //'Child'(@name=Name), Node)),
writeln([Reference, Name, Node]).
diff :-
load_trees(Root1, Root2),
(find_differences('1', Root1, Root2) ; find_differences('2', Root2, Root1)).
Prolog will unify the Name variable to match nodes from file 1 and file 2. The unification on the Node variable does the "diff" detection.
Here's some sample output:
% file 1 and file 2 have no differences
?- diff.
false.
% "Alpha" was updated in file 2
?- diff.
[1,Alpha,element(Child,[name=Alpha],[])]
[2,Alpha,element(Child,[name=Alpha,age=7],[])]
false.
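For comparison, here is a rough Python sketch of the same check. It is an illustration under assumptions rather than a translation of the Prolog: `node_repr` and `find_differences` are hypothetical helpers, and a node is compared by tag, attributes and stripped text only, ignoring nested children.

```python
import xml.etree.ElementTree as ET


def node_repr(elem):
    """Comparable form of an element: tag, sorted attributes, stripped text."""
    return (elem.tag, tuple(sorted(elem.attrib.items())), (elem.text or "").strip())


def find_differences(reference, root1, root2):
    """List Child nodes of root1 that have no identical Child in root2."""
    others = {node_repr(child) for child in root2.iter("Child")}
    return [
        (reference, child.get("name"), node_repr(child))
        for child in root1.iter("Child")
        if node_repr(child) not in others
    ]
```

Calling it once in each direction, as the `diff` predicate does, reports nodes that were changed, added or removed on either side.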
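With C# you could do this, and afterwards compare the sorted files with any diff tool.
// Requires: using System; using System.IO; using System.Linq; using System.Xml.Linq;
public void Run()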
{
LoadSortAndSave(@".. first file ..");
LoadSortAndSave(@".. second file ..");
}
public void LoadSortAndSave(String path)
{
var xdoc = XDocument.Load(path);
SortXml(xdoc.Root);
File.WriteAllText(path + ".sorted", xdoc.ToString());
}
private void SortXml(XContainer parent)
{
var elements = parent.Elements()
.OrderBy(e => e.Name.LocalName)
.ToArray();
Array.ForEach(elements, e => e.Remove());
foreach (var element in elements)
{
parent.Add(element);
SortXml(element);
}
}
I wrote a simple Java program to do so. Each of the two XML documents being compared is stored in a HashMap, with the key being the XPath of an element (including the element's text value) and the value being the number of occurrences of that element. The two HashMaps are then compared on both their key sets and their values.
/**
* creates a map of elements with text values and no nested nodes.
* The key of the map is the XPath of the element concatenated with its text value; the map value is the number of occurrences of that element.
*
* @param xmlContent
* @return
* @throws ParserConfigurationException
* @throws SAXException
* @throws IOException
*/
private static Map<String, Long> getMapOfElementsOfXML(String xmlContent)
throws ParserConfigurationException, SAXException, IOException {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(false);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc1 = db.parse(new ByteArrayInputStream(xmlContent.getBytes()));
NodeList entries = doc1.getElementsByTagName("*");
Map<String, Long> mapElements = new HashMap<>();
for (int i = 0; i < entries.getLength(); i++) {
Element element = (Element) entries.item(i);
if (element.getChildNodes().getLength() == 1 && element.getTextContent() != null) {
final String elementWithXPathAndValue = getXPath(element.getParentNode())
+ "/"
+ element.getParentNode().getNodeName()
+ "/"
+ element.getTagName()
+ "/"
+ element.getTextContent();
// Count occurrences, starting at 1 for the first one seen
// (the original started at 0, so the stored value was one less than the real count).
mapElements.merge(elementWithXPathAndValue, 1L, Long::sum);
}
}
return mapElements;
}
static String getXPath(Node node) {
Node parent = node.getParentNode();
if (parent == null) {
return "";
}
return getXPath(parent) + "/" + parent.getNodeName();
}
The complete program is here: https://comparetwoxmlsignoringstanzaordering.blogspot.com/2018/12/java-program-to-compare-two-xmls.html
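The counting idea is compact enough to sketch in a few lines of Python as well. This is an illustration, not the linked program: `leaf_counts` and `same_leaves` are hypothetical names, and, like the Java above, it only looks at elements that hold text and no child elements, so attributes are ignored.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def leaf_counts(xml_text):
    """Count (path, text) pairs for elements that have text and no children."""
    counts = Counter()

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        text = (elem.text or "").strip()
        if len(elem) == 0 and text:
            counts[(here, text)] += 1
        for child in elem:
            walk(child, here)

    walk(ET.fromstring(xml_text), "")
    return counts


def same_leaves(xml_a, xml_b):
    """True when both documents contain the same leaves the same number of times."""
    return leaf_counts(xml_a) == leaf_counts(xml_b)
```

Because the comparison is on counts rather than positions, reordered siblings compare equal while a duplicated or missing leaf does not.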
You can use the 'pom sorter' plugin in IntelliJ IDEA and then use IntelliJ's own 'Compare Files' tool.
Marketplace link for the pom sorter plugin: https://plugins.jetbrains.com/plugin/7084-pom-sorter
/**
 * Compare the content of two XML files.
 * <ul>
 * <li>Ignore white space</li>
 * <li>Ignore attribute order</li>
 * <li>Ignore comments</li>
 * <li>Ignore differences in the sequence of child nodes</li>
 * </ul>
 *
 * @author sdiallo
 * @since 2017-01-16
 * @param xmlExpected
 *            first content to be compared
 * @param xmlGenerated
 *            second content to be compared
 * @return the list of differences computed between the two files
 * <ul>
 * <li>null means the files are equal (or could not be parsed)</li>
 * <li>otherwise the files are different</li>
 * </ul>
 */
public static List buildDiffXMLs(String xmlExpected, String xmlGenerated) {
List<?> differencesList = null;
XMLUnit.setIgnoreAttributeOrder(true);
XMLUnit.setIgnoreComments(true);
XMLUnit.setIgnoreWhitespace(true);
try {
DetailedDiff diff = new DetailedDiff(XMLUnit.compareXML(
xmlExpected, xmlGenerated));
// Two documents are considered to be "similar" if they contain the
// same elements and attributes regardless of order.
if ( !diff.identical() && !diff.similar()) {
differencesList = diff.getAllDifferences();
}// end if
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
return differencesList;
}// buildDiffXMLs
As a (very) quick and dirty approach, I've done this in a pinch:
- Open Excel
- Paste file 1 into column A, one line per row. Name the range "FILE1"
- Paste file 2 into column B, one line per row. Name the range "FILE2"
In C1, enter the formula:
=IF(ISERROR(VLOOKUP(B1,FILE1,1,FALSE)),"DIFF","")
In D1, enter the formula:
=IF(ISERROR(VLOOKUP(A1,FILE2,1,FALSE)),"DIFF","")
- Fill down columns C and D to the bottom of the files.
That will highlight any rows which appear in one file but not the other file. It's not tidy by any stretch, but sometimes you just have to work with what you've got.
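If Excel isn't to hand, the same line-by-line lookup can be sketched in Python. This is only an approximation of the spreadsheet: `lines_missing` is a made-up helper name, and stripping surrounding whitespace and skipping blank lines are my own assumptions about what you'd want, not something the VLOOKUP formulas do.

```python
def lines_missing(from_lines, in_lines):
    """Return the stripped lines of from_lines that never appear in in_lines."""
    present = {line.strip() for line in in_lines}
    return [
        line.strip()
        for line in from_lines
        if line.strip() and line.strip() not in present
    ]
```

Run it once in each direction, the way columns C and D do. Like the spreadsheet, it only works when each element sits on its own line and is written identically in both files.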
The simple way to do so is to use a versioning tool such as TortoiseGit.
- Create a GitHub account
- Create a Git repository in your account
- Check out that repository
- Add one side of the files to be compared
- Push the content to the server
- Replace the file's content with the other side
- Compare the change as you would for any source file