XSLT 1.0 - Merge sibling nodes with child nodes into new composite nodes

Developer (https://www.devze.com), 2023-03-05 13:24, source: Internet
I had a tough time formulating the question title. Maybe the example will make more sense.

Suppose I have an XML document that looks like this from system A:

<root>
    <phone_numbers>
        <phone_number type="work">123-WORK</phone_number>
        <phone_number type="home">456-HOME</phone_number>
        <phone_number type="work">789-WORK</phone_number>
        <phone_number type="other">012-OTHER</phone_number>
    </phone_numbers>
    <email_addresses>
        <email_address type="home">a@home</email_address>
        <email_address type="other">b@other</email_address>
        <email_address type="home">c@home</email_address>
        <email_address type="work">d@work</email_address>
        <email_address type="other">e@other</email_address>
        <email_address type="other">f@other</email_address>
    </email_addresses>
</root>

And I have to fit these into a structure like this so they can be used in system B:

<root>
    <addresses>
        <address name="work1">
            <phone_number>123-WORK</phone_number>
            <email_address>d@work</email_address>
        </address>
        <address name="work2">
            <phone_number>789-WORK</phone_number>
        </address>
        <address name="other1">
            <phone_number>012-OTHER</phone_number>
            <email_address>b@other</email_address>
        </address>
        <address name="other2">
            <email_address>e@other</email_address>
        </address>
        <address name="other3">
            <email_address>f@other</email_address>
        </address>
        <address name="home1">
            <phone_number>456-HOME</phone_number>
            <email_address>a@home</email_address>
        </address>
        <address name="home2">
            <email_address>c@home</email_address>
        </address>
    </addresses>
</root>

There can be any number (from 0 to infinity, as far as I know) of email addresses of each type. There can also be any number of phone numbers of each type, and the number of phone numbers of one type does not have to match the number of email addresses of the same type.

The email addresses and phone numbers in the first document aren't really related to each other, except that they are entered in the order they were added to system A.

I have to pair the emails and phone numbers up by type to fit into system B, and I would like to pair them so that the first phone number of type X is paired with the first email address of type X and so that no phone number of type X is paired with an email of a type other than X.

Since I have to pair them up, and since the order they were entered into the system is the closest I'll get to finding a relationship between the pairs, I would like to order them this way. I'll have to tell the users to go over the results, to make sure they make sense, but I have to pair them - no choice.

To complicate matters, my actual XML document has more nodes that I'll need to merge with phone_numbers and email_addresses, and I have more than two @types.

One other note: I'm already calculating the maximum number of nodes with any given @type, so with my example docs, I know that the maximum number of <address> nodes of a single @type is three (three <email_address> nodes with @type=other = three <address> nodes with @name=otherX).


This transformation is considerably simpler (only three templates and no modes):

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:key name="kTypeByVal" match="@type" use="."/>

 <xsl:key name="kPhNumByType" match="phone_number"
  use="@type"/>

 <xsl:key name="kAddrByType" match="email_address"
  use="@type"/>

 <xsl:variable name="vallTypes" select=
 "/*/*/*/@type
          [generate-id()
          =
           generate-id(key('kTypeByVal',.)[1])
          ]"/>

 <xsl:template match="/">
  <root>
   <addresses>
    <xsl:apply-templates select="$vallTypes"/>
   </addresses>
  </root>
 </xsl:template>

 <xsl:template match="@type">
  <xsl:variable name="vcurType" select="."/>
  <xsl:variable name="vPhoneNums" select="key('kPhNumByType',.)"/>
  <xsl:variable name="vAddresses" select="key('kAddrByType',.)"/>

  <xsl:variable name="vLonger" select=
  "$vPhoneNums[count($vPhoneNums) > count($vAddresses)]
  |
   $vAddresses[not(count($vPhoneNums) > count($vAddresses))]
  "/>

  <xsl:for-each select="$vLonger">
   <xsl:variable name="vPos" select="position()"/>
   <address name="{$vcurType}{$vPos}">
    <xsl:apply-templates select="$vPhoneNums[position()=$vPos]"/>
    <xsl:apply-templates select="$vAddresses[position()=$vPos]"/>
   </address>
  </xsl:for-each>
 </xsl:template>

 <xsl:template match="phone_number|email_address">
  <xsl:copy>
   <xsl:copy-of select="node()"/>
  </xsl:copy>
 </xsl:template>
</xsl:stylesheet>

when applied to the provided XML document (and to any document with the described properties):

<root>
    <phone_numbers>
        <phone_number type="work">123-WORK</phone_number>
        <phone_number type="home">456-HOME</phone_number>
        <phone_number type="work">789-WORK</phone_number>
        <phone_number type="other">012-OTHER</phone_number>
    </phone_numbers>
    <email_addresses>
        <email_address type="home">a@home</email_address>
        <email_address type="other">b@other</email_address>
        <email_address type="home">c@home</email_address>
        <email_address type="work">d@work</email_address>
        <email_address type="other">e@other</email_address>
        <email_address type="other">f@other</email_address>
    </email_addresses>
</root>

the wanted, correct result is produced:

<root>
   <addresses>
      <address name="work1">
         <phone_number>123-WORK</phone_number>
         <email_address>d@work</email_address>
      </address>
      <address name="work2">
         <phone_number>789-WORK</phone_number>
      </address>
      <address name="home1">
         <phone_number>456-HOME</phone_number>
         <email_address>a@home</email_address>
      </address>
      <address name="home2">
         <email_address>c@home</email_address>
      </address>
      <address name="other1">
         <phone_number>012-OTHER</phone_number>
         <email_address>b@other</email_address>
      </address>
      <address name="other2">
         <email_address>e@other</email_address>
      </address>
      <address name="other3">
         <email_address>f@other</email_address>
      </address>
   </addresses>
</root>

Explanation:

  1. All distinct values of the type attribute are collected in the $vallTypes variable, using the Muenchian method for grouping.

  2. For every distinct value found in step 1, the template matching @type outputs one or more <address> elements as follows.

  3. Two node-sets are captured in variables: $vPhoneNums holds all phone_number elements that have this value of their type attribute, and $vAddresses holds all email_address elements that have this value of their type attribute.

  4. The longer of these two node-sets is selected into $vLonger (if both have the same size, the email addresses are used). One <address> element is generated for every node in $vLonger, and its name attribute is the concatenation of the current type and the current position().

  5. Inside each <address>, the phone number and the email address at that position are copied to the output without their type attribute. If the shorter node-set has no node at that position, only the single available element is output.
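
To see the grouping and the "longer node-set" idea in isolation, here is a minimal sketch that reuses the three keys from the stylesheet above but only reports, for each distinct type, how many phone numbers and email addresses exist and how many <address> elements would result. The text output format and the variable names $vP and $vE are assumptions made for illustration; they are not part of the original answer.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>

 <!-- Same keys as in the answer above -->
 <xsl:key name="kTypeByVal" match="@type" use="."/>
 <xsl:key name="kPhNumByType" match="phone_number" use="@type"/>
 <xsl:key name="kAddrByType" match="email_address" use="@type"/>

 <xsl:template match="/">
  <!-- Muenchian grouping: keep only the first @type node for each distinct value -->
  <xsl:for-each select=
   "/*/*/*/@type[generate-id() = generate-id(key('kTypeByVal', .)[1])]">
   <!-- vP and vE are hypothetical names: phone and email counts for this type -->
   <xsl:variable name="vP" select="count(key('kPhNumByType', .))"/>
   <xsl:variable name="vE" select="count(key('kAddrByType', .))"/>
   <xsl:value-of select=
    "concat(., ': ', $vP, ' phone(s), ', $vE, ' email(s), ')"/>
   <!-- Maximum of the two counts: booleans convert to 0 or 1 in arithmetic -->
   <xsl:value-of select="$vP * ($vP >= $vE) + $vE * ($vE > $vP)"/>
   <xsl:text> address(es)&#10;</xsl:text>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>
```

Applied to the sample document from the question, this should print one line per type in order of first appearance: work with 2 phones, 1 email and 2 addresses; home with 1, 2 and 2; other with 1, 3 and 3. These are the same <address> counts that appear in the result above.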


This stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:key name="byType" match="/root/*/*" use="@type" />
    <xsl:key name="phoneByType" match="phone_numbers/phone_number"
        use="@type" />
    <xsl:key name="emailByType" match="email_addresses/email_address"
        use="@type" />
    <xsl:template match="/">
        <root>
            <addresses>
                <xsl:apply-templates />
            </addresses>
        </root>
    </xsl:template>
    <xsl:template match="/root/*/*" />
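    <!-- Explicit priority: this pattern and the empty "/root/*/*" template
         above both have default priority 0.5, so without it the processor
         must recover from a conflict (or may report an error). -->
    <xsl:template priority="1"
        match="/root/*/*[generate-id()=generate-id(key('byType', @type)[1])]">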
        <xsl:apply-templates select="key('phoneByType', @type)"
            mode="wrap" />
        <xsl:apply-templates
            select="key('emailByType', @type)
                [position() > count(key('phoneByType', @type))]"
            mode="wrap" />
    </xsl:template>
    <xsl:template match="phone_numbers/phone_number" mode="wrap">
        <xsl:variable name="pos" select="position()" />
        <address name="{concat(@type, $pos)}">
            <xsl:apply-templates select="." mode="out" />
            <xsl:apply-templates select="key('emailByType', @type)[$pos]"
                mode="out" />
        </address>
    </xsl:template>
    <xsl:template match="email_addresses/email_address" mode="wrap">
        <address
            name="{concat(@type, 
                          position() + count(key('phoneByType', @type)))}">
            <xsl:apply-templates select="." mode="out" />
        </address>
    </xsl:template>
    <xsl:template match="/root/*/*" mode="out">
        <xsl:copy>
            <xsl:apply-templates />
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

On this input:

<root>
    <phone_numbers>
        <phone_number type="work">123-WORK</phone_number>
        <phone_number type="home">456-HOME</phone_number>
        <phone_number type="work">789-WORK</phone_number>
        <phone_number type="other">012-OTHER</phone_number>
    </phone_numbers>
    <email_addresses>
        <email_address type="home">a@home</email_address>
        <email_address type="other">b@other</email_address>
        <email_address type="home">c@home</email_address>
        <email_address type="work">d@work</email_address>
        <email_address type="other">e@other</email_address>
        <email_address type="other">f@other</email_address>
        <email_address type="test">g@other</email_address>
    </email_addresses>
</root>

Produces:

<root>
    <addresses>
        <address name="work1">
            <phone_number>123-WORK</phone_number>
            <email_address>d@work</email_address>
        </address>
        <address name="work2">
            <phone_number>789-WORK</phone_number>
        </address>
        <address name="home1">
            <phone_number>456-HOME</phone_number>
            <email_address>a@home</email_address>
        </address>
        <address name="home2">
            <email_address>c@home</email_address>
        </address>
        <address name="other1">
            <phone_number>012-OTHER</phone_number>
            <email_address>b@other</email_address>
        </address>
        <address name="other2">
            <email_address>e@other</email_address>
        </address>
        <address name="other3">
            <email_address>f@other</email_address>
        </address>
        <address name="test1">
            <email_address>g@other</email_address>
        </address>
    </addresses>
</root>

Explanation:

  • Three keys are defined: 1) all contact info by type; 2) all phone numbers by type; 3) all email addresses by type.
  • The first key is used to find the first occurrence of each type.
  • For each type, we then go through its phone numbers, pairing each one with the email address in the same position, if there is one.
  • Finally, we output the remaining email addresses of that type, those that have no corresponding phone number, and continue the numbering after the last phone number.
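
The question mentions that the real document has more kinds of nodes to merge than phone numbers and email addresses. Neither answer covers that case directly, so the following is only a sketch of one possible generalization, not something taken from the answers. It assumes that all items of one kind are siblings inside a single container element under the root, as in the sample input. The key name kItemByType and the variables $vType, $vItems and $vIdx are hypothetical.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Every item of every kind, indexed by its type -->
 <xsl:key name="kItemByType" match="/*/*/*" use="@type"/>

 <xsl:template match="/">
  <root>
   <addresses>
    <!-- Muenchian grouping: visit the first item of each distinct type -->
    <xsl:for-each select=
     "/*/*/*[generate-id() = generate-id(key('kItemByType', @type)[1])]">
     <xsl:variable name="vType" select="@type"/>
     <xsl:variable name="vItems" select="key('kItemByType', $vType)"/>
     <xsl:for-each select="$vItems">
      <!-- Position of this item among same-type items in its own container -->
      <xsl:variable name="vIdx" select=
       "count(preceding-sibling::*[@type = $vType]) + 1"/>
      <!-- Emit an address only if no earlier container already has
           an item of this type at the same position -->
      <xsl:if test=
       "not(../preceding-sibling::*/*[@type = $vType]
             [count(preceding-sibling::*[@type = $vType]) + 1 = $vIdx])">
       <address name="{$vType}{$vIdx}">
        <!-- Copy the item at this position from every container, without @type -->
        <xsl:for-each select=
         "$vItems[count(preceding-sibling::*[@type = $vType]) + 1 = $vIdx]">
         <xsl:copy>
          <xsl:copy-of select="node()"/>
         </xsl:copy>
        </xsl:for-each>
       </address>
      </xsl:if>
     </xsl:for-each>
    </xsl:for-each>
   </addresses>
  </root>
 </xsl:template>
</xsl:stylesheet>
```

For the two-kind sample input this should produce the same groups as the stylesheets above (work1, work2, home1, home2, other1, other2, other3). The repeated preceding-sibling counts make it quadratic in the number of items per type, which is probably acceptable for contact data but worth measuring on large documents.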