Merge XML files with same structure and different data

Source: https://www.devze.com, 2022-12-14 14:59 (reposted from the web)
I'm trying to merge two files that have the same structure and some data in common. If a node has the same name in both files, a new node should be created that contains the children of both original nodes. The original files are the following:

file1.xml
<?xml version='1.0' encoding='UTF-8'?>
<BROADRIDGE>
    <SECURITY CUSIP='CUSIP1' DESCRIPT='CUSIP1'>
        <CUSTOMER ID='M1'/>
        <CUSTOMER ID='M2'/>
        <CUSTOMER ID='M3'/>
    </SECURITY>
    <SECURITY CUSIP='CUSIP3' DESCRIPT='CUSIP3'>
        <CUSTOMER ID='M4'/>
        <CUSTOMER ID='M5'/>
        <CUSTOMER ID='M6'/>
    </SECURITY>
</BROADRIDGE>

file2.xml
<?xml version='1.0' encoding='UTF-8'?>
<BROADRIDGE>
    <SECURITY CUSIP='CUSIP1' DESCRIPT='CUSIP1'>
        <CUSTOMER ID='B1'/>
        <CUSTOMER ID='B2'/>
        <CUSTOMER ID='B3'/>
    </SECURITY>
    <SECURITY CUSIP='CUSIP2' DESCRIPT='CUSIP2'>
        <CUSTOMER ID='B4'/>
        <CUSTOMER ID='B5'/>
        <CUSTOMER ID='B6'/>
    </SECURITY>
</BROADRIDGE>

The idea is to create a new XML file with the same structure that contains the information from both files, merging those SECURITY nodes that have the same CUSIP attribute. In this case the result should be the following:

<?xml version="1.0" encoding="UTF-8"?>
<BROADRIDGE>
    <SECURITY CUSIP="CUSIP1">
        <CUSTOMER ID="M1"/>
        <CUSTOMER ID="M2"/>
        <CUSTOMER ID="M3"/>
        <CUSTOMER ID='B1'/>
        <CUSTOMER ID='B2'/>
        <CUSTOMER ID='B3'/>
    </SECURITY>
    <SECURITY CUSIP="CUSIP3">
        <CUSTOMER ID="M4"/>
        <CUSTOMER ID="M5"/>
        <CUSTOMER ID="M6"/>
    </SECURITY>
    <SECURITY CUSIP="CUSIP2">
        <CUSTOMER ID="B4"/>
        <CUSTOMER ID="B5"/>
        <CUSTOMER ID="B6"/>
    </SECURITY>
</BROADRIDGE>

I've defined the following XML to join them:

<?xml version="1.0"?>                                  
<MASTERFILE>
   <FILE>\file1.xml</FILE>
   <FILE>\file2.xml</FILE>
</MASTERFILE>

And the following XSL to do the merge:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/MASTERFILE">
        <BROADRIDGE>
            <xsl:variable name="securities" select="document(FILE)/BROADRIDGE/SECURITY"/>
            <xsl:for-each select="$securities">
                <xsl:if test="generate-id(.) = generate-id($securities[@CUSIP=current()/@CUSIP])">
                    <SECURITY>
                        <xsl:attribute name="CUSIP" ><xsl:value-of select="@CUSIP"/></xsl:attribute>
                        <xsl:for-each select="CUSTOMER">
                            <CUSTOMER>
                                <xsl:attribute name="ID" ><xsl:value-of select="@ID"/></xsl:attribute>
                            </CUSTOMER>
                        </xsl:for-each>
                    </SECURITY>
                </xsl:if>
            </xsl:for-each>
        </BROADRIDGE>
    </xsl:template>
</xsl:stylesheet>

But I'm getting the following:

<?xml version="1.0" encoding="UTF-8"?>
<BROADRIDGE>
    <SECURITY CUSIP="CUSIP1">
        <CUSTOMER ID="M1"/>
        <CUSTOMER ID="M2"/>
        <CUSTOMER ID="M3"/>
    </SECURITY>
    <SECURITY CUSIP="CUSIP3">
        <CUSTOMER ID="M4"/>
        <CUSTOMER ID="M5"/>
        <CUSTOMER ID="M6"/>
    </SECURITY>
    <SECURITY CUSIP="CUSIP2">
        <CUSTOMER ID="B4"/>
        <CUSTOMER ID="B5"/>
        <CUSTOMER ID="B6"/>
    </SECURITY>
</BROADRIDGE>

Any idea why it's not merging the CUSTOMER nodes from both files for the SECURITY with CUSIP = CUSIP1?


(See my comment on the "one-way-merge" on the OP.) Here's my (very inefficient) solution to the merge problem:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:variable name="set1" select="document('file1.xml')/BROADRIDGE/SECURITY"/>
    <xsl:variable name="set2" select="document('file2.xml')/BROADRIDGE/SECURITY"/>

    <xsl:template match="/">
        <BROADRIDGE>
            <!-- walk over all relevant nodes -->
            <xsl:for-each select="$set1 | $set2">
                <xsl:variable name="position" select="position()"/>
                <xsl:variable name="cusip" select="@CUSIP"/>
                <!-- if we see this CUSIP for the first time, --> 
                <xsl:if test="count(($set1 | $set2)[position() &lt; $position][@CUSIP = $cusip])=0">
                    <SECURITY>                            
                        <xsl:attribute name="CUSIP"><xsl:value-of select="$cusip"/></xsl:attribute>
                        <!-- copy nodes from both sets with matching attribute -->
                        <xsl:copy-of select="$set1[@CUSIP = $cusip]/*"/>
                        <xsl:copy-of select="$set2[@CUSIP = $cusip]/*"/>
                    </SECURITY>
                </xsl:if>
            </xsl:for-each>
        </BROADRIDGE>
    </xsl:template>
</xsl:stylesheet>

Note that the stylesheet does not assume any particular input document; it simply loads the two files into variables. The XSLT design can be improved by parameterizing the URLs of the XML documents to be loaded.
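As a rough sketch of that parameterization, the two URLs could be lifted into top-level xsl:param elements. The parameter names url1 and url2 are hypothetical and not part of the original answer; their defaults are the two sample files, so the stylesheet should behave like the one above when no parameters are supplied.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- url1 and url2 are hypothetical parameter names; override them from the XSLT processor -->
    <xsl:param name="url1" select="'file1.xml'"/>
    <xsl:param name="url2" select="'file2.xml'"/>

    <xsl:variable name="set1" select="document($url1)/BROADRIDGE/SECURITY"/>
    <xsl:variable name="set2" select="document($url2)/BROADRIDGE/SECURITY"/>
    <xsl:variable name="nodes" select="$set1 | $set2"/>

    <xsl:template match="/">
        <BROADRIDGE>
            <xsl:for-each select="$nodes">
                <xsl:variable name="position" select="position()"/>
                <xsl:variable name="cusip" select="@CUSIP"/>
                <!-- emit a SECURITY only the first time this CUSIP is seen -->
                <xsl:if test="count($nodes[position() &lt; $position][@CUSIP = $cusip]) = 0">
                    <SECURITY CUSIP="{$cusip}">
                        <!-- children from the first file, then from the second -->
                        <xsl:copy-of select="$set1[@CUSIP = $cusip]/*"/>
                        <xsl:copy-of select="$set2[@CUSIP = $cusip]/*"/>
                    </SECURITY>
                </xsl:if>
            </xsl:for-each>
        </BROADRIDGE>
    </xsl:template>
</xsl:stylesheet>
```

How the parameters are passed depends on the processor; with xsltproc, for instance, the --stringparam option can set them. Because the values are strings, relative URLs are resolved against the stylesheet's location, as with the literal file names in the original.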

To apply the merge to multiple documents, you can create a file, say master.xml that lists all the files to process like this:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="merge.xslt"?>
<files>
  <file>file1.xml</file>
  <file>file2.xml</file>
  ...
  <file>fileN.xml</file>    
</files>

In file1.xml, I have this:

<?xml version='1.0' encoding='UTF-8'?>
<BROADRIDGE>
  <SECURITY CUSIP='CUSIP1' DESCRIPT='CUSIP1'>
    <CUSTOMER ID='M1'/>
    <CUSTOMER ID='M2'/>
    <CUSTOMER ID='M3'/>
  </SECURITY>
  <SECURITY CUSIP='CUSIP3' DESCRIPT='CUSIP3'>
    <CUSTOMER ID='M4'/>
    <CUSTOMER ID='M5'/>
    <CUSTOMER ID='M6'/>
  </SECURITY>
</BROADRIDGE>

In file2.xml, I have this:

<?xml version='1.0' encoding='UTF-8'?>
<BROADRIDGE>
  <SECURITY CUSIP='CUSIP1' DESCRIPT='CUSIP1'>
    <CUSTOMER ID='B1'/>
    <CUSTOMER ID='B2'/>
    <CUSTOMER ID='B3'/>
  </SECURITY>
  <SECURITY CUSIP='CUSIP2' DESCRIPT='CUSIP2'>
    <CUSTOMER ID='B4'/>
    <CUSTOMER ID='B5'/>
    <CUSTOMER ID='B6'/>
  </SECURITY>
</BROADRIDGE>

The merge.xslt stylesheet is a modified version of the earlier one, and it can now process a variable number of files (those listed in master.xml):

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
  <xsl:call-template name="merge-files"/>
</xsl:template>

<!-- loop through file names, load documents -->
<xsl:template name="merge-files">
  <xsl:param name="files" select="/files/file/text()"/>
  <xsl:param name="num-files" select="count($files)"/>
  <xsl:param name="curr-file" select="0"/>
  <xsl:param name="set" select="/*[0]"/>
  <xsl:choose> <!-- if we still have files, concat them to $set -->
    <xsl:when test="$curr-file &lt; $num-files">
      <xsl:call-template name="merge-files">
        <xsl:with-param name="files" select="$files"/>
        <xsl:with-param name="num-files" select="$num-files"/>
        <xsl:with-param name="curr-file" select="$curr-file + 1"/>
        <xsl:with-param name="set" select="$set | document($files[$curr-file+1])/BROADRIDGE/SECURITY"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise> <!-- no more files, start merging. -->
      <xsl:call-template name="merge">
        <xsl:with-param name="nodes" select="$set"/>
      </xsl:call-template>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

<!-- perform the actual merge -->
<xsl:template name="merge">
  <xsl:param name="nodes"/>
  <BROADRIDGE>
    <xsl:for-each select="$nodes"> <!-- look at all possible nodes to merge -->
      <xsl:variable name="position" select="position()"/>
      <xsl:variable name="cusip" select="@CUSIP"/>

      <!-- when we encounter this id for the 1st time -->
      <xsl:if test="count($nodes[position() &lt; $position][@CUSIP = $cusip])=0"> 
        <SECURITY>
          <xsl:attribute name="CUSIP"><xsl:value-of select="$cusip"/></xsl:attribute>
          <!-- copy all node data related to this cusip here -->
          <xsl:for-each select="$nodes[@CUSIP = $cusip]">
            <xsl:copy-of select="*"/>
          </xsl:for-each>
        </SECURITY>
      </xsl:if>
    </xsl:for-each>
  </BROADRIDGE>
</xsl:template>

</xsl:stylesheet>

Running this gives me this output:

<BROADRIDGE>
  <SECURITY CUSIP="CUSIP1">
    <CUSTOMER ID="M1"/>
    <CUSTOMER ID="M2"/>
    <CUSTOMER ID="M3"/>
    <CUSTOMER ID="B1"/>
    <CUSTOMER ID="B2"/>
    <CUSTOMER ID="B3"/>
  </SECURITY>
  <SECURITY CUSIP="CUSIP3">
    <CUSTOMER ID="M4"/>
    <CUSTOMER ID="M5"/>
    <CUSTOMER ID="M6"/>
  </SECURITY>
  <SECURITY CUSIP="CUSIP2">
    <CUSTOMER ID="B4"/>
    <CUSTOMER ID="B5"/>
    <CUSTOMER ID="B6"/>
  </SECURITY>
</BROADRIDGE>


The generate-id() function is guaranteed to return a different value for every node that participates in a given transformation. As you're calling it on nodes from different documents, the values will never be the same.

You should compare the string values of the CUSIP attributes in the documents rather than their generated IDs.
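A minimal XSLT 1.0 sketch of comparing CUSIP string values, offered as one possible reading of the problem rather than a definitive fix: in the asker's output each CUSIP already appears only once, so the generate-id() test does seem to pick out the first SECURITY per CUSIP, while the inner xsl:for-each select="CUSTOMER" only visits the children of that one SECURITY. Selecting the CUSTOMER children of every SECURITY with the same CUSIP value should pull in the entries from the other file as well. The cusip variable name is introduced here for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/MASTERFILE">
        <xsl:variable name="securities" select="document(FILE)/BROADRIDGE/SECURITY"/>
        <BROADRIDGE>
            <xsl:for-each select="$securities">
                <xsl:variable name="cusip" select="@CUSIP"/>
                <!-- keep only the first SECURITY found for each CUSIP value -->
                <xsl:if test="generate-id(.) = generate-id($securities[@CUSIP = $cusip])">
                    <SECURITY CUSIP="{$cusip}">
                        <!-- gather CUSTOMER children from every SECURITY with this CUSIP -->
                        <xsl:copy-of select="$securities[@CUSIP = $cusip]/CUSTOMER"/>
                    </SECURITY>
                </xsl:if>
            </xsl:for-each>
        </BROADRIDGE>
    </xsl:template>
</xsl:stylesheet>
```

The relative order of nodes that come from different documents is implementation-dependent in XSLT 1.0, so the customers from file1.xml are not guaranteed to appear before those from file2.xml.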

If you can use XSLT 2.0 (which is a lot better than 1.0), this will work:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output indent="yes"/>
        <xsl:template match="/MASTERFILE">
                <BROADRIDGE>
                        <xsl:variable name="securities" select="document(FILE)/BROADRIDGE/SECURITY"/>
                        <xsl:for-each select="distinct-values($securities/@CUSIP)">
                                <SECURITY>
                                        <xsl:attribute name="CUSIP">
                                                <xsl:value-of select="."/>
                                        </xsl:attribute>

                                        <xsl:for-each select="distinct-values($securities[@CUSIP = current()]/CUSTOMER/@ID)">
                                                <CUSTOMER>
                                                  <xsl:attribute name="ID">
                                                  <xsl:value-of select="."/>
                                                  </xsl:attribute>
                                                </CUSTOMER>
                                        </xsl:for-each>
                                </SECURITY>
                        </xsl:for-each>
                </BROADRIDGE>
        </xsl:template>
</xsl:stylesheet>
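XSLT 2.0 also provides xsl:for-each-group, which can express the same grouping more directly. The following is a sketch under the same assumptions as the stylesheet above (a MASTERFILE input that lists the files in FILE elements); it is an alternative technique, not the answerer's code.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes"/>
    <xsl:template match="/MASTERFILE">
        <BROADRIDGE>
            <!-- one group per distinct CUSIP value across all listed files -->
            <xsl:for-each-group select="document(FILE)/BROADRIDGE/SECURITY" group-by="@CUSIP">
                <SECURITY CUSIP="{current-grouping-key()}">
                    <!-- current-group() holds every SECURITY that shares this CUSIP -->
                    <xsl:copy-of select="current-group()/CUSTOMER"/>
                </SECURITY>
            </xsl:for-each-group>
        </BROADRIDGE>
    </xsl:template>
</xsl:stylesheet>
```

Unlike the distinct-values() version, this copies every CUSTOMER element as-is, so a customer ID that occurs in both files would appear twice in the merged SECURITY.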


Either you're making this much too complicated, or there are other aspects of this problem that you haven't mentioned:

<xsl:variable name="file1" select="document(/MASTERFILE/FILE[1])"/>
<xsl:variable name="file2" select="document(/MASTERFILE/FILE[2])"/>

<xsl:template match="/">
   <BROADRIDGE>
      <xsl:apply-templates select="$file1/BROADRIDGE/SECURITY"/>
      <xsl:copy-of select="$file2/BROADRIDGE/SECURITY[not(@CUSIP=$file1/BROADRIDGE/SECURITY/@CUSIP)]"/>
   </BROADRIDGE>
</xsl:template>

<xsl:template match="SECURITY">
   <SECURITY CUSIP="{@CUSIP}">
      <xsl:copy-of select="*"/>
      <xsl:copy-of select="$file2/BROADRIDGE/SECURITY[@CUSIP=current()/@CUSIP]/*"/>
   </SECURITY>
</xsl:template>


Roland, thanks for your examples. Based on the first code you sent, I developed the following template:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:variable name="nodes" select="document(/MASTERFILE/FILE)/BROADRIDGE/SECURITY"/>
    <xsl:template match="/">
        <BROADRIDGE>
            <!-- walk over all relevant nodes -->
            <xsl:for-each select="$nodes">
                <xsl:variable name="position" select="position()"/>
                <xsl:variable name="cusip" select="@CUSIP"/>
                <!-- if we see this CUSIP for the first time, --> 
                <xsl:if test="count($nodes[position() &lt; $position][@CUSIP = $cusip])=0">
                    <SECURITY>                            
                        <xsl:attribute name="CUSIP"><xsl:value-of select="$cusip"/></xsl:attribute>
                        <xsl:attribute name="DESCRIPT"><xsl:value-of select="@DESCRIPT"/></xsl:attribute>
                        <!-- copy nodes from both sets with matching attribute -->
                        <xsl:copy-of select="$nodes[@CUSIP = $cusip]/*"/>
                    </SECURITY>
                </xsl:if>
            </xsl:for-each>
        </BROADRIDGE>
    </xsl:template>
</xsl:stylesheet>

I just pass the list of files to the document() function, so it creates a node-set with all the SECURITY nodes from all the files. I applied it to the following XML:

<?xml version="1.0"?>
<MASTERFILE>
   <FILE>\file1.xml</FILE>
   <FILE>\file2.xml</FILE>
   <FILE>\file3.xml</FILE>
</MASTERFILE>

It works perfectly. Thank you for your samples.
