I need to transform an XHTML document (well-formed XML) to a standard XML document.
Input:
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<title>HTML Document Title</title>
</head>
<body>
<h1>Welcome</h1>
<div class="container">
<ul>
<li>
<a href="a.html" title="abcdef AAA">New York</a>
</li>
<li>
<a href="b.html" title="abcdef AAA">Los Angles</a>
</li>
<li>
<a href="c.html" title="abcdef AAA">Alaska</a>
</li>
<li>
<a href="d.html" title="abcdef BBB">Florida</a>
</li>
<li>
<a href="e.html" title="zyxwvu AAA"><em>California</em></a>
</li>
</ul>
</div>
</body>
</html>
Note: I noticed that the DOCTYPE declaration and simple comments cause failures during XSLT parsing, so I remove them manually before the XSL parse. To parse the output properly, I am currently using the 'xhtml:' prefix, as described in the post: Can I parse an HTML using XSLT?.
Group the <a> elements by the second part of their title attribute value (the substring after the space), e.g. AAA, BBB, etc. Within each group, group further by the first part of the title attribute value (e.g. abcdef / zyxwvu) or by the presence of an <em> tag. The output should contain four element names in total: <root>, <element>, <abcdef> and <zyxwvu>. This is desired.
Expected Output:
<root>
<element title="hard-coded title" href="hard-coded url">
<element title="AAA" href="AAA.html">
<abcdef>
<element title="Alaska" href="c.html"/>
<element title="Los Angles" href="b.html"/>
<element title="New York" href="a.html"/>
</abcdef>
<zyxwvu>
<element title="California" href="e.html"/>
</zyxwvu>
</element>
<element title="BBB" href="BBB.html">
<abcdef>
<element title="Florida" href="d.html"/>
</abcdef>
</element>
</element>
</root>
I would appreciate a solution in both XSLT 1.0 and XSLT 2.0.
This transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:x="http://www.w3.org/1999/xhtml"
exclude-result-prefixes="x">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:key name="kaByTail" match="x:a"
use="substring-after(@title, ' ')"/>
<xsl:key name="kaByHeadAndTail" match="x:a"
use="concat(substring-before(@title, ' '),
'+',
substring-after(@title, ' ')
)"/>
<xsl:variable name="vAncors" select="//x:a"/>
<xsl:template match="/">
<root>
<element title="hard-coded title" href="hard-coded url">
<xsl:for-each select=
"$vAncors
[generate-id()
=
generate-id(key('kaByTail',
substring-after(@title, ' ')
)
[1]
)
]">
<xsl:variable name="vKey"
select="substring-after(@title, ' ')"/>
<xsl:variable name="vGroup" select=
"key('kaByTail', $vKey)"/>
<element title="{$vKey}" href="{$vKey}.html">
<xsl:for-each select=
"$vGroup
[generate-id()
=
generate-id(key('kaByHeadAndTail',
concat(substring-before(@title, ' '),
'+',
$vKey
)
)
[1]
)
]
">
<xsl:variable name="vKey2"
select="substring-before(@title, ' ')"/>
<xsl:element name="{$vKey2}">
<xsl:for-each select=
"key('kaByHeadAndTail',
concat($vKey2,'+',$vKey)
)">
<xsl:sort/>
<element title="{.}" href="{@href}"/>
</xsl:for-each>
</xsl:element>
</xsl:for-each>
</element>
</xsl:for-each>
</element>
</root>
</xsl:template>
</xsl:stylesheet>
when applied on the provided XML document:
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<title>HTML Document Title</title>
</head>
<body>
<h1>Welcome</h1>
<div class="container">
<ul>
<li>
<a href="a.html" title="abcdef AAA">New York</a>
</li>
<li>
<a href="b.html" title="abcdef AAA">Los Angles</a>
</li>
<li>
<a href="c.html" title="abcdef AAA">Alaska</a>
</li>
<li>
<a href="d.html" title="abcdef BBB">Florida</a>
</li>
<li>
<a href="e.html" title="zyxwvu AAA"><em>California</em></a>
</li>
</ul>
</div>
</body>
</html>
produces the wanted, correct result:
<root>
<element title="hard-coded title" href="hard-coded url">
<element title="AAA" href="AAA.html">
<abcdef>
<element title="Alaska" href="c.html"/>
<element title="Los Angles" href="b.html"/>
<element title="New York" href="a.html"/>
</abcdef>
<zyxwvu>
<element title="California" href="e.html"/>
</zyxwvu>
</element>
<element title="BBB" href="BBB.html">
<abcdef>
<element title="Florida" href="d.html"/>
</abcdef>
</element>
</element>
</root>
Explanation: Nested Muenchian grouping, using first a single and then a composite grouping key.
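The question also asks for an XSLT 2.0 version. As a minimal sketch, assuming an XSLT 2.0 processor (Saxon, for example) and the same DOCTYPE-free input as above, the two keys could be replaced by nested xsl:for-each-group instructions. This is an illustration of the same grouping logic rather than a tested drop-in replacement:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:x="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="x">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="/">
    <root>
      <element title="hard-coded title" href="hard-coded url">
        <!-- Outer groups: the part of @title after the first space (AAA, BBB) -->
        <xsl:for-each-group select="//x:a"
                            group-by="substring-after(@title, ' ')">
          <element title="{current-grouping-key()}"
                   href="{current-grouping-key()}.html">
            <!-- Inner groups: the part of @title before the first space -->
            <xsl:for-each-group select="current-group()"
                                group-by="substring-before(@title, ' ')">
              <xsl:element name="{current-grouping-key()}">
                <!-- Emit the members of the inner group, sorted by their text -->
                <xsl:for-each select="current-group()">
                  <xsl:sort select="."/>
                  <element title="{.}" href="{@href}"/>
                </xsl:for-each>
              </xsl:element>
            </xsl:for-each-group>
          </element>
        </xsl:for-each-group>
      </element>
    </root>
  </xsl:template>
</xsl:stylesheet>
```

Groups are produced in order of first appearance, so this should yield the same grouping as the XSLT 1.0 result above. As in the 1.0 version, the inner element name is taken directly from the title attribute, so the first part of each title must be a valid XML element name.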