Sorting XML element order using XSLT, based on an order specified in an external doc

Source: https://www.devze.com, 2023-01-30 17:05 (reposted from the web)
Platform: Saxon 9 - XSLT 2.0

I have 3000 xml docs that need to be regularly edited, updated and saved.

Part of the process involves checking-out a document from a repository before editing, and publishing it at regular intervals when editing is complete.

Each document contains a series of individually named sections, e.g.

   <part>
        <meta>
            <place_id>12345</place_id>
            <place_name>London</place_name>
            <country_id>GB</country_id>
            <country_name>United Kingdom</country_name>
        </meta>
        <text>
            <docs>some blurb</docs>
            <airport>some blurb LGW LHR</airport>
            <trains>some blurb</trains>
            <hotels>some blurb</hotels>
            <health>some blurb</health>
            <attractions>some blurb</attractions>
        </text>
   </part>

Within the text element there are nearly 100 sections, and as with all editorial teams, they change their mind on the preferred order on an occasional, but regular, basis. Maybe twice per year.

At the moment, we present the XML doc sections to the editors IN THE CURRENT PREFERRED ORDER for editing and for publishing. This order is specified in a dynamically generated external document called 'stdhdg.xml', and appears something like this:

<hdgs>
    <hdg name="docs" newsort="10"/>
    <hdg name="airport" newsort="30"/>
    <hdg name="trains" newsort="20"/>
    <hdg name="hotels" newsort="40"/>
    <hdg name="health" newsort="60"/>
    <hdg name="attractions" newsort="50"/>
</hdgs>

where the preferred sort-order is specified by hdg/@newsort.

So I use a template like this to process in the correct order

<xsl:template match="text">
    <xsl:variable name="thetext" select="."/>
    <xsl:variable name="stdhead" select="document('stdhdg.xml')"/>
    <text>
        <xsl:for-each select="$stdhead//hdg">
            <xsl:sort data-type="number" order="ascending" select="@newsort"/>
            <xsl:variable name="tagname" select="@name"/>
            <xsl:variable name="thisnode" select="$thetext/*[local-name() = $tagname]"/>
            <xsl:apply-templates select="$thisnode"/>
        </xsl:for-each>
    </text>
</xsl:template>

But it seems very slow and cumbersome and I feel that I should be using keys to speed it up.

Is there a simpler/neater way of doing this sorting operation?

(Please don't ask me to change the way the editors edit. That is more than my life's worth)

TIA

Feargal


Yes, keys should speed up such a lookup. Here is an outline:

<xsl:stylesheet ...>

  <xsl:key name="k1" match="text/*" use="local-name()"/>

  <xsl:variable name="stdhead" select="document('stdhdg.xml')"/>

  ...

<xsl:template match="text">
    <xsl:variable name="thetext" select="."/>
    <text>
        <xsl:for-each select="$stdhead//hdg">
            <xsl:sort data-type="number" order="ascending" select="@newsort"/>
            <xsl:apply-templates select="key('k1', @name, $thetext)"/>
        </xsl:for-each>
    </text>
</xsl:template>

</xsl:stylesheet>

All typed directly in the browser, so take that as an outline of how to approach it, not as tested code.

[edit] On second thought, sorting the headings every time a text element is processed is wasteful, so you could sort them once in a global variable and change to

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <xsl:key name="k1" match="text/*" use="local-name()"/>

  <xsl:variable name="stdhead" select="document('stdhdg.xml')"/>

  <xsl:variable name="sorted-headers" as="element(hdg)*">
    <xsl:perform-sort select="$stdhead//hdg">
      <xsl:sort select="@newsort" data-type="number"/>
    </xsl:perform-sort>
  </xsl:variable>

<xsl:template match="text">
    <xsl:variable name="thetext" select="."/>
    <text>
        <xsl:for-each select="$sorted-headers">
            <xsl:apply-templates select="key('k1', @name, $thetext)"/>
        </xsl:for-each>
    </text>
</xsl:template>

</xsl:stylesheet>
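
One thing to watch with a header-driven loop like this: a child of text whose name has no matching hdg/@name in stdhdg.xml is never selected, so it would drop out of the output. A minimal sketch of a variant that appends such sections after the sorted ones, assuming the same k1 key and $sorted-headers variable as above (whether unlisted sections should be kept at all is my assumption; the question does not say):

```xml
<xsl:template match="text">
    <xsl:variable name="thetext" select="."/>
    <text>
        <!-- Listed sections first, in the order given by stdhdg.xml -->
        <xsl:for-each select="$sorted-headers">
            <xsl:apply-templates select="key('k1', @name, $thetext)"/>
        </xsl:for-each>
        <!-- Assumption: sections with no matching hdg/@name are kept,
             appended in their original document order -->
        <xsl:apply-templates select="*[not(local-name() = $sorted-headers/@name)]"/>
    </text>
</xsl:template>
```

Like the outline above, this is untested and only meant to show the shape of the approach.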


Within the text element there are nearly 100 sections, and as with all editorial teams, they change their mind on the preferred order on an occasional, but regular, basis. Maybe twice per year.

. . . . . .

But it seems very slow and cumbersome and I feel that I should be using keys to speed it up

Sorting the document each time when it is presented for editing is the wrong approach.

The best solution is to sort it and save it sorted only when the 'stdhdg.xml' document changes, i.e. about twice a year.

If changes to 'stdhdg.xml' cannot be reliably coordinated with a re-sort, you can have a recurring (say daily) job that runs the following transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:param name="vHeaderLoc" select="'file:///C:/temp/deleteMe/stdhdg.xml'"/>

 <xsl:variable name="vHeaderDoc" select=
 "document($vHeaderLoc)"/>

 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <xsl:template match=
   "part/@hash
          [not(.
              = 
               string(document('file:///C:/temp/deleteMe/stdhdg.xml'))
              )
          ]">
  <xsl:attribute name="hash">
   <xsl:value-of select="string($vHeaderDoc)"/>
  </xsl:attribute>
 </xsl:template>

 <xsl:template match=
   "/*/text[not(/*/@hash
                = string(document('file:///C:/temp/deleteMe/stdhdg.xml'))
                )
            ]">
  <text>
   <xsl:apply-templates select="*">
    <xsl:sort data-type="number"
     select="$vHeaderDoc/*/hdg[@name=name(current())]"/>
   </xsl:apply-templates>
  </text>
 </xsl:template>
</xsl:stylesheet>

When the main content XML document (note the top element now has a hash attribute) is:

<part hash="010203040506">
    <meta>
        <place_id>12345</place_id>
        <place_name>London</place_name>
        <country_id>GB</country_id>
        <country_name>United Kingdom</country_name>
    </meta>
    <text>
        <docs>some blurb</docs>
        <airport>some blurb LGW LHR</airport>
        <trains>some blurb</trains>
        <hotels>some blurb</hotels>
        <health>some blurb</health>
        <attractions>some blurb</attractions>
    </text>
</part>

and the stdhdg.xml file is:

<hdgs>
    <hdg name="docs">10</hdg>
    <hdg name="airport">30</hdg>
    <hdg name="trains">20</hdg>
    <hdg name="hotels">40</hdg>
    <hdg name="health">60</hdg>
    <hdg name="attractions">50</hdg>
</hdgs>

then the transformation above produces a newly-sorted main content that has the latest hash:

<part hash="103020406050">
   <meta>
      <place_id>12345</place_id>
      <place_name>London</place_name>
      <country_id>GB</country_id>
      <country_name>United Kingdom</country_name>
   </meta>
   <text>
      <docs>some blurb</docs>
      <trains>some blurb</trains>
      <airport>some blurb LGW LHR</airport>
      <hotels>some blurb</hotels>
      <attractions>some blurb</attractions>
      <health>some blurb</health>
   </text>
</part>

Do note:

  1. The top element of the main content document now has a hash attribute, whose value is the concatenation of the sort keys in the stdhdg.xml document.

  2. The format of the stdhdg.xml file is also slightly changed (each sort key is now the text content of its hdg element rather than a newsort attribute), so that the concatenation of the keys can be easily produced as the string value of the document.

  3. The daily-run transformation is the identity transformation if the hash saved in the main content is the same as the sort-keys concatenation in stdhdg.xml.

  4. If the old hash does not match the sort keys in stdhdg.xml, it is updated to the new hash and the sections are re-sorted.
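
Since the question targets Saxon 9 and XSLT 2.0, the patterns need not repeat the document() call: XSLT 2.0 allows references to global variables inside match patterns. A sketch of the same daily-run transformation under that assumption, keeping the element-content form of stdhdg.xml and the example path used above (the names vNewHash and kHdg are mine, and the code is untested):

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:param name="vHeaderLoc" select="'file:///C:/temp/deleteMe/stdhdg.xml'"/>
 <xsl:variable name="vHeaderDoc" select="document($vHeaderLoc)"/>

 <!-- Concatenated sort keys, e.g. 103020406050 for the sample file -->
 <xsl:variable name="vNewHash" select="string($vHeaderDoc)"/>

 <!-- Index headings by name so each sort key is a lookup, not a scan -->
 <xsl:key name="kHdg" match="hdg" use="@name"/>

 <!-- Identity rule: copy everything that is not overridden below -->
 <xsl:template match="node()|@*">
   <xsl:copy>
     <xsl:apply-templates select="node()|@*"/>
   </xsl:copy>
 </xsl:template>

 <!-- Stale hash: replace it with the current one -->
 <xsl:template match="part/@hash[not(. = $vNewHash)]">
   <xsl:attribute name="hash" select="$vNewHash"/>
 </xsl:template>

 <!-- Stale hash: re-sort the sections by their key in stdhdg.xml -->
 <xsl:template match="/*/text[not(/*/@hash = $vNewHash)]">
   <text>
     <xsl:apply-templates select="*">
       <xsl:sort data-type="number"
        select="key('kHdg', name(), $vHeaderDoc)"/>
     </xsl:apply-templates>
   </text>
 </xsl:template>
</xsl:stylesheet>
```

If stdhdg.xml has to keep the original attribute form, string-join($vHeaderDoc/*/hdg/@newsort, '') should give the same concatenation, with the sort key then read from key('kHdg', name(), $vHeaderDoc)/@newsort.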
