Alphanumeric sort on mixed string value

Developer https://www.devze.com 2023-01-18 22:10 Source: web
Given XML snippet of:

<forms>
<FORM lob="BO" form_name="AI OM 10"/>
<FORM lob="BO" form_name="CL BP 03 01"/>
<FORM lob="BO" form_name="AI OM 107"/>
<FORM lob="BO" form_name="CL BP 00 02"/>
<FORM lob="BO" form_name="123 DDE"/>
<FORM lob="BO" form_name="CL BP 00 02"/>
<FORM lob="BO" form_name="AI OM 98"/>
</forms>

I need to sort the FORM nodes alphabetically by form_name, so that all forms whose form_name contains 'AI OM' are grouped together, and then numerically by the integers within each group (and likewise for the other forms).

The form_name format is open season: letters and numbers can appear in any order:

XX ## ##

XX XX ##

XX XX ###

XX XX ## ##

XX ###

XX XXXX

## XXX

XXX###

What I THINK needs to happen is that the string needs to be split into its alphabetic and numeric parts. The numeric part could probably be sorted with any spaces removed, I suppose.

I am at a loss as to how to split the string and then cover all the sorting/grouping combinations given that there are no rules around the 'form_name' format.

We are using XSLT 2.0. Thanks.


This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

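 <!-- Characters stripped to build the alphabetic key: digits and spaces -->
 <xsl:variable name="vDigits" select="'0123456789 '"/>
 <!-- Characters stripped to build the numeric key: uppercase letters and spaces -->
 <xsl:variable name="vAlpha" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ '"/>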

 <xsl:template match="/*">
  <forms>
   <xsl:for-each select="FORM">
    <xsl:sort select="translate(@form_name,$vDigits,'')"/>
    <xsl:sort select="translate(@form_name,$vAlpha,'')"
        data-type="number"/>
    <xsl:copy-of select="."/>
   </xsl:for-each>
  </forms>
 </xsl:template>
</xsl:stylesheet>

when applied on the provided XML document:

<forms>
    <FORM lob="BO" form_name="AI OM 10"/>
    <FORM lob="BO" form_name="CL BP 03 01"/>
    <FORM lob="BO" form_name="AI OM 107"/>
    <FORM lob="BO" form_name="CL BP 00 02"/>
    <FORM lob="BO" form_name="123 DDE"/>
    <FORM lob="BO" form_name="CL BP 00 02"/>
    <FORM lob="BO" form_name="AI OM 98"/>
</forms>

produces the wanted, correct result:

<forms>
    <FORM lob="BO" form_name="AI OM 10"/>
    <FORM lob="BO" form_name="AI OM 98"/>
    <FORM lob="BO" form_name="AI OM 107"/>
    <FORM lob="BO" form_name="CL BP 00 02"/>
    <FORM lob="BO" form_name="CL BP 00 02"/>
    <FORM lob="BO" form_name="CL BP 03 01"/>
    <FORM lob="BO" form_name="123 DDE"/>
</forms>

Do note:

  1. Two <xsl:sort> instructions implement the two-phase sorting: the first orders by the alphabetic key, the second by the numeric key.

  2. The XPath translate() function is used to produce either the alpha-only sort-key or the digits-only sort-key.
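
To see what the two keys look like for each form, a small diagnostic stylesheet can print them side by side. This is only an illustrative sketch, not part of the answer above: it reuses the same vDigits and vAlpha variables and assumes the same input document.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>

 <!-- Same strip lists as in the transformation above -->
 <xsl:variable name="vDigits" select="'0123456789 '"/>
 <xsl:variable name="vAlpha" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ '"/>

 <xsl:template match="/*">
  <xsl:for-each select="FORM">
   <!-- One line per form: original name | alpha-only key | digits-only key -->
   <xsl:value-of select="@form_name"/>
   <xsl:text> | </xsl:text>
   <xsl:value-of select="translate(@form_name,$vDigits,'')"/>
   <xsl:text> | </xsl:text>
   <xsl:value-of select="translate(@form_name,$vAlpha,'')"/>
   <xsl:text>&#10;</xsl:text>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>
```

For the sample document this prints, among others, "AI OM 10 | AIOM | 10" and "CL BP 03 01 | CLBP | 0301". The second line shows that several number groups are concatenated into a single number (301), so names such as "XX 1 20" and "XX 12 0" would receive the same numeric key. Note also that vAlpha lists only uppercase letters, so a name containing lowercase letters would leave them in the numeric key, which is then not a valid number.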


This stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="forms">
        <xsl:apply-templates>
            <xsl:sort select="normalize-space(
                                translate(@form_name,
                                          '0123456789',
                                          ''))"/>
            <xsl:sort select="substring-before(
                                concat(
                                  normalize-space(
                                    translate(@form_name,
                                              translate(@form_name,
                                                        '0123456789 ',
                                                        ''),
                                              '')),
                                  ' '),' ')" data-type="number"/>
            <xsl:sort select="substring-after(
                                normalize-space(
                                  translate(@form_name,
                                            translate(@form_name,
                                                      '0123456789 ',
                                                      ''),
                                            '')),
                                  ' ')" data-type="number"/>
        </xsl:apply-templates>
    </xsl:template>
</xsl:stylesheet>

Output:

<FORM lob="BO" form_name="AI OM 10"></FORM>
<FORM lob="BO" form_name="AI OM 98"></FORM>
<FORM lob="BO" form_name="AI OM 107"></FORM>
<FORM lob="BO" form_name="CL BP 00 02"></FORM>
<FORM lob="BO" form_name="CL BP 00 02"></FORM>
<FORM lob="BO" form_name="CL BP 03 01"></FORM>
<FORM lob="BO" form_name="123 DDE"></FORM>

XSLT 2.0 solution: this stylesheet

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xsl:output method="xml" indent="yes"/>
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="forms">
        <xsl:apply-templates>
            <xsl:sort select="string-join(tokenize(@form_name,' ')
                                            [not(. castable as xs:integer)],
                                          ' ')"/>
            <xsl:sort select="xs:integer(tokenize(@form_name,' ')
                                            [. castable as xs:integer][1])"/>
            <xsl:sort select="xs:integer(tokenize(@form_name,' ')
                                            [. castable as xs:integer][2])"/>
        </xsl:apply-templates>
    </xsl:template>
</xsl:stylesheet>


It should be noted that the marked answer doesn't work in all cases.

Input:

<forms>
  <FORM lob="BO" form_name="AA 11 AB"/>
  <FORM lob="BO" form_name="AA AZ 01"/>
</forms>

Expected Output:

<forms>
  <FORM lob="BO" form_name="AA AZ 01"/>
  <FORM lob="BO" form_name="AA 11 AB"/>
</forms>

Actual Output:

<forms>
  <FORM lob="BO" form_name="AA 11 AB"/>
  <FORM lob="BO" form_name="AA AZ 01"/>
</forms>

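If letters are allowed after numbers, you cannot simply strip the digits out when building the first sort key: the letters on either side of the number then run together. With translate(), for example, "AA 11 AB" yields the key "AAAB", which sorts before the "AAAZ" produced by "AA AZ 01".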
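
One possible way around this in XSLT 2.0 is to keep letters and digits in their original positions and build a single key in which every run of digits is zero-padded to a fixed width. The following is a minimal sketch under stated assumptions, not a tested replacement for the answers above: the function my:natural-key and its namespace URI are hypothetical names, numbers are assumed to have at most 10 digits, and the '~' prefix is an arbitrary marker chosen so that, under the codepoint collation, a number sorts after any ASCII letter in the same position (the ordering the examples in this thread expect).

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns:my="urn:example:natural-sort"
 exclude-result-prefixes="xs my">
    <xsl:output method="xml" indent="yes"/>

    <!-- Hypothetical helper: returns the name with each digit run
         replaced by '~' plus the number zero-padded to 10 digits -->
    <xsl:function name="my:natural-key" as="xs:string">
        <xsl:param name="s" as="xs:string?"/>
        <xsl:variable name="parts" as="xs:string*">
            <xsl:analyze-string select="string($s)" regex="[0-9]+">
                <xsl:matching-substring>
                    <!-- Digit run: pad so string order equals numeric order -->
                    <xsl:sequence select="concat('~',
                        format-number(number(.), '0000000000'))"/>
                </xsl:matching-substring>
                <xsl:non-matching-substring>
                    <!-- Letters and spaces are kept unchanged -->
                    <xsl:sequence select="."/>
                </xsl:non-matching-substring>
            </xsl:analyze-string>
        </xsl:variable>
        <xsl:sequence select="string-join($parts, '')"/>
    </xsl:function>

    <xsl:template match="forms">
        <xsl:copy>
            <xsl:for-each select="FORM">
                <!-- Codepoint collation makes '~' sort after A-Z and a-z -->
                <xsl:sort select="my:natural-key(@form_name)"
                    collation="http://www.w3.org/2005/xpath-functions/collation/codepoint"/>
                <xsl:copy-of select="."/>
            </xsl:for-each>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

Under these assumptions "AA 11 AB" gets the key "AA ~0000000011 AB" and "AA AZ 01" gets "AA AZ ~0000000001", so "AA AZ 01" should sort first, as in the expected output above. Names without a separating space, such as "XXX###", are handled the same way because the split is on digit runs rather than on spaces. Whether numbers should sort before or after letters is a design choice; dropping the '~' marker would put them first.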
