
Transform XHTML (well-formed XML) unordered-list data into grouped and sorted XML

Developer https://www.devze.com 2023-04-06 08:15 Source: web

I need to transform an XHTML document (well-formed XML) into a standard XML document.

Input:

<?xml version="1.0" encoding="iso-8859-1"?>

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
  <head>
    <title>HTML Document Title</title>
  </head>
  <body>
    <h1>Welcome</h1>
    <div class="container">
      <ul>
        <li>
          <a href="a.html" title="abcdef AAA">New York</a>
        </li>
        <li>
          <a href="b.html" title="abcdef AAA">Los Angles</a>
        </li>
        <li>
          <a href="c.html" title="abcdef AAA">Alaska</a>
        </li>
        <li>
          <a href="d.html" title="abcdef BBB">Florida</a>
        </li>
        <li>
          <a href="e.html" title="zyxwvu AAA"><em>California</em></a>
        </li>
      </ul>
    </div>
  </body>
</html>

Note: I noticed that the DOCTYPE declaration and simple comments cause the XSLT parse to fail, so I remove them manually before running the transformation. To match the input elements, I currently use the 'xhtml:' prefix, as suggested in the post: Can I parse an HTML using XSLT?.

Group the <a> elements by the second part of their title attribute value (e.g. AAA, BBB). Within each group, group again by the first part of the title value (e.g. abcdef / zyxwvu) or by the presence of an <em> tag. The output should use four element names in total: <root>, <element>, <abcdef> and <zyxwvu>. This is intended.

Expected Output:

<root>
    <element title="hard-coded title" href="hard-coded url">
        <element title="AAA" href="AAA.html">
            <abcdef>
                <element title="Alaska" href="c.html"/>
                <element title="Los Angles" href="b.html"/>
                <element title="New York" href="a.html"/>
            </abcdef>
            <zyxwvu>
                <element title="California" href="e.html"/>
            </zyxwvu>
        </element>
        <element title="BBB" href="BBB.html">
            <abcdef>
                <element title="Florida" href="d.html"/>
            </abcdef>
        </element>
    </element>
</root>

I would appreciate a solution in both XSLT 1.0 and XSLT 2.0.


This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:x="http://www.w3.org/1999/xhtml"
 exclude-result-prefixes="x">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:key name="kaByTail" match="x:a"
  use="substring-after(@title, ' ')"/>

 <xsl:key name="kaByHeadAndTail" match="x:a"
  use="concat(substring-before(@title, ' '),
              '+',
              substring-after(@title, ' ')
              )"/>
 <xsl:variable name="vAncors" select="//x:a"/>

 <xsl:template match="/">
  <root>
    <element title="hard-coded title" href="hard-coded url">
     <xsl:for-each select=
      "$vAncors
         [generate-id()
         =
          generate-id(key('kaByTail',
                           substring-after(@title, ' ')
                          )
                           [1]
                      )
         ]">
         <xsl:variable name="vKey"
              select="substring-after(@title, ' ')"/>

         <xsl:variable name="vGroup" select=
         "key('kaByTail', $vKey)"/>

        <element title="{$vKey}" href="{$vKey}.html">

         <xsl:for-each select=
         "$vGroup
            [generate-id()
            =
             generate-id(key('kaByHeadAndTail',
                             concat(substring-before(@title, ' '),
                                   '+',
                                    $vKey
                                    )
                            )
                             [1]
                         )
             ]

         ">
          <xsl:variable name="vKey2"
               select="substring-before(@title, ' ')"/>

          <xsl:element name="{$vKey2}">
           <xsl:for-each select=
            "key('kaByHeadAndTail',
                 concat($vKey2,'+',$vKey)
                 )">
            <xsl:sort/>
            <element title="{.}" href="{@href}"/>
           </xsl:for-each>
          </xsl:element>
         </xsl:for-each>
        </element>
     </xsl:for-each>
    </element>
  </root>
 </xsl:template>
</xsl:stylesheet>

when applied to the provided XML document:

<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
  <head>
    <title>HTML Document Title</title>
  </head>
  <body>
    <h1>Welcome</h1>
    <div class="container">
      <ul>
        <li>
          <a href="a.html" title="abcdef AAA">New York</a>
        </li>
        <li>
          <a href="b.html" title="abcdef AAA">Los Angles</a>
        </li>
        <li>
          <a href="c.html" title="abcdef AAA">Alaska</a>
        </li>
        <li>
          <a href="d.html" title="abcdef BBB">Florida</a>
        </li>
        <li>
          <a href="e.html" title="zyxwvu AAA"><em>California</em></a>
        </li>
      </ul>
    </div>
  </body>
</html>

produces the wanted, correct result:

<root>
   <element title="hard-coded title" href="hard-coded url">
      <element title="AAA" href="AAA.html">
         <abcdef>
            <element title="Alaska" href="c.html"/>
            <element title="Los Angles" href="b.html"/>
            <element title="New York" href="a.html"/>
         </abcdef>
         <zyxwvu>
            <element title="California" href="e.html"/>
         </zyxwvu>
      </element>
      <element title="BBB" href="BBB.html">
         <abcdef>
            <element title="Florida" href="d.html"/>
         </abcdef>
      </element>
   </element>
</root>

Explanation: Nested Muenchian grouping, using first a single and then a composite grouping key.
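
The question also asks for XSLT 2.0, which the transformation above does not cover. The following is a minimal sketch of how the same two-level grouping could be written with xsl:for-each-group, not a tested drop-in replacement. It assumes an XSLT 2.0 processor (for example Saxon) and the same input with the DOCTYPE removed; the prefix x and the hard-coded attribute values are carried over from the 1.0 stylesheet.

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:x="http://www.w3.org/1999/xhtml"
 exclude-result-prefixes="x">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <root>
   <element title="hard-coded title" href="hard-coded url">
    <!-- Outer groups: the part of @title after the space (AAA, BBB, ...) -->
    <xsl:for-each-group select="//x:a"
         group-by="substring-after(@title, ' ')">
     <element title="{current-grouping-key()}"
              href="{current-grouping-key()}.html">
      <!-- Inner groups: the part of @title before the space (abcdef, zyxwvu, ...) -->
      <xsl:for-each-group select="current-group()"
           group-by="substring-before(@title, ' ')">
       <!-- The inner grouping key becomes the wrapper element name -->
       <xsl:element name="{current-grouping-key()}">
        <xsl:for-each select="current-group()">
         <!-- Sort the links of one inner group by their text -->
         <xsl:sort select="."/>
         <element title="{.}" href="{@href}"/>
        </xsl:for-each>
       </xsl:element>
      </xsl:for-each-group>
     </element>
    </xsl:for-each-group>
   </element>
  </root>
 </xsl:template>
</xsl:stylesheet>
```

Here xsl:for-each-group takes over the work of both keys and the generate-id() comparisons: groups are emitted in order of first appearance, and current-grouping-key() supplies the value that the 1.0 version stores in $vKey and $vKey2. As in the 1.0 version, the first part of each title must be a valid XML element name, because it is used as one.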
