So in this grotty extruded typesetting product, I sometimes see links and email addresses that have been split apart. Example:
<p>Here is some random text with an email address
<Link>example</Link><Link>@example.com</Link> and here
is more random text with a url
<Link>http://www.</Link><Link>example.com</Link> near the end of the sentence.</p>
Desired output:
<p>Here is some random text with an email address
<email>example@example.com</email> and here is more random text
with a url <ext-link ext-link-type="uri" xlink:href="http://www.example.com/">
http://www.example.com/</ext-link> near the end of the sentence.</p>
Whitespace between the elements does not appear to occur, which is one blessing.
I can tell I need to use an xsl:for-each-group within the p template, but I can't quite see how to put the combined text from the group through the contains() function so as to distinguish emails from URLs. Help?
If you use group-adjacent, you can simply string-join the current-group(), as in:
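<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xsd"
    version="2.0">

  <!-- identity template: copy anything not matched more specifically,
       so other inline markup inside <p> is not stripped -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="p">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- each run of adjacent <Link> elements becomes one group -->
      <xsl:for-each-group select="node()" group-adjacent="boolean(self::Link)">
        <xsl:choose>
          <xsl:when test="current-grouping-key()">
            <!-- concatenate the string values of all <Link>s in the run -->
            <xsl:variable name="link-text" as="xsd:string" select="string-join(current-group(), '')"/>
            <xsl:choose>
              <xsl:when test="matches($link-text, '^https?://')">
                <ext-link ext-link-type="uri" xlink:href="{$link-text}">
                  <xsl:value-of select="$link-text"/>
                </ext-link>
              </xsl:when>
              <xsl:otherwise>
                <email><xsl:value-of select="$link-text"/></email>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:when>
          <xsl:otherwise>
            <!-- text nodes and any other markup go through the identity template -->
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>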
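group-adjacent works much like Python's itertools.groupby with a key function: consecutive items with the same key end up in one group. Below is a rough Python sketch of the template's logic, using only the standard library, for checking the idea outside an XSLT processor. It assumes that ElementTree's text/tail model is flattened into an XPath-like list of child nodes. The helper names child_nodes, is_link and merge_links are made up for illustration.

```python
import itertools
import re
import xml.etree.ElementTree as ET


def child_nodes(elem):
    """Flatten an ElementTree element into an XPath-like list of child nodes:
    plain strings stand in for text nodes, Elements for element nodes."""
    nodes = [elem.text] if elem.text else []
    for child in elem:
        nodes.append(child)
        if child.tail:
            nodes.append(child.tail)
    return nodes


def is_link(node):
    return isinstance(node, ET.Element) and node.tag == "Link"


def merge_links(p):
    """Rough analogue of group-adjacent="boolean(self::Link)": each run of
    adjacent Link elements is joined and classified; other nodes pass through."""
    out = []
    for key, group in itertools.groupby(child_nodes(p), key=is_link):
        if key:
            # like string-join(current-group(), '')
            text = "".join("".join(link.itertext()) for link in group)
            # like matches($link-text, '^https?://')
            kind = "uri" if re.match(r"https?://", text) else "email"
            out.append((kind, text))
        else:
            out.extend(group)
    return out


p = ET.fromstring(
    "<p>Here is an email address <Link>example</Link><Link>@example.com</Link>"
    " and a url <Link>http://www.</Link><Link>example.com</Link> near the end.</p>"
)
print(merge_links(p))
```

As in the stylesheet, any text node between two <Link> elements, even whitespace, ends the run. That is why it matters that no whitespace occurs between the elements.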
Here is an XSLT 1.0 solution based on the identity template, with special treatment for <Link> elements.
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xlink="http://www.w3.org/1999/xlink">

  <!-- identity template: copy everything as-is by default -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*" />
    </xsl:copy>
  </xsl:template>

  <!-- only the first <Link> of a run produces output; it collects the
       text of the rest of its run, and those <Link>s output nothing -->
  <xsl:template match="Link">
    <xsl:if test="not(preceding-sibling::node()[1][self::Link])">
      <xsl:variable name="link">
        <xsl:copy-of select="
          text()
          |
          following-sibling::Link[
            preceding-sibling::node()[1][self::Link]
            and
            generate-id(current())
            =
            generate-id(
              preceding-sibling::Link[
                not(preceding-sibling::node()[1][self::Link])
              ][1]
            )
          ]/text()
        " />
      </xsl:variable>
      <xsl:choose>
        <xsl:when test="contains($link, '://')">
          <ext-link ext-link-type="uri" xlink:href="{$link}" />
        </xsl:when>
        <xsl:when test="contains($link, '@')">
          <email>
            <xsl:value-of select="$link" />
          </email>
        </xsl:when>
        <xsl:otherwise>
          <link type="unknown">
            <xsl:value-of select="$link" />
          </link>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
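The xsl:choose in the XSLT 1.0 solution is order-sensitive, like an if/elif chain. It checks '://' before '@', so a URL with user info such as ftp://user@example.com still counts as a URI. Anything matching neither test falls through to <link type="unknown">. The XSLT 2.0 answer, by contrast, treats anything not starting with http:// or https:// as an email address. Here is a minimal Python rendering of the 1.0 decision; the function name classify is hypothetical.

```python
def classify(link_text):
    """Sketch of the XSLT 1.0 xsl:choose: tests run in order, first match wins."""
    if "://" in link_text:   # contains($link, '://')
        return "uri"
    if "@" in link_text:     # contains($link, '@')
        return "email"
    return "unknown"         # the <link type="unknown"> fallback


for text in ["http://www.example.com", "example@example.com",
             "ftp://user@example.com", "www.example.com"]:
    print(text, "->", classify(text))
```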
I know the XPath expressions used here are quite hairy monsters, but selecting adjacent siblings is not easy in XPath 1.0 (if someone has a better idea of how to do it in XPath 1.0, go ahead and tell me).
not(preceding-sibling::node()[1][self::Link])
means "the immediately preceding node must not be a <Link>", i.e. only <Link> elements that are "first in a row" are selected.
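Here is the same "first in a row" test as a Python sketch, for anyone who wants to check it outside an XSLT processor. It assumes a hypothetical model in which the children of <p> are a list of (kind, text) tuples in document order. first_in_row is an illustrative name, not part of any library.

```python
def is_link(node):
    return node[0] == "Link"


def first_in_row(nodes):
    # Mirrors not(preceding-sibling::node()[1][self::Link]) applied to Links:
    # a Link qualifies if it is the first node or the node before it is not a Link.
    return [i for i, node in enumerate(nodes)
            if is_link(node) and (i == 0 or not is_link(nodes[i - 1]))]


nodes = [
    ("text", "an email address "),
    ("Link", "example"),
    ("Link", "@example.com"),
    ("text", " and a url "),
    ("Link", "http://www."),
    ("Link", "example.com"),
    ("text", " near the end."),
]
print(first_in_row(nodes))  # [1, 4]
```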
following-sibling::Link[ preceding-sibling::node()[1][self::Link] and generate-id(current()) = generate-id( preceding-sibling::Link[ not(preceding-sibling::node()[1][self::Link]) ][1] ) ]
means: from all following-sibling <Link>s, choose the ones that
- immediately follow a <Link> (i.e. they are not "first in a row"), and
- whose closest preceding <Link> that is itself "first in a row" is the current() node. The current() node is always a <Link> that's "first in a row". The comparison goes through generate-id() because XPath 1.0 has no node-identity operator.
If that makes sense.
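Here is a Python sketch of that predicate, using the same hypothetical (kind, text) tuple model as above, with illustrative helper names. It is deliberately quadratic so that each step lines up with the XPath: for every following <Link> that is not first in a row, it scans backwards to find the closest first-in-a-row <Link>. It then keeps the node only if that head is the current one, which is the generate-id() comparison.

```python
def is_link(node):
    return node[0] == "Link"


def is_first_in_row(nodes, i):
    # not(preceding-sibling::node()[1][self::Link]), restricted to Links
    return is_link(nodes[i]) and (i == 0 or not is_link(nodes[i - 1]))


def run_text(nodes, start):
    """Text of the Link run headed by nodes[start], selected the way the
    XSLT 1.0 predicate does it."""
    parts = [nodes[start][1]]
    for j in range(start + 1, len(nodes)):
        # keep only following Links that are NOT first in a row ...
        if not is_link(nodes[j]) or is_first_in_row(nodes, j):
            continue
        # ... and whose closest preceding first-in-a-row Link is the head,
        # i.e. generate-id(current()) = generate-id(preceding-sibling::Link[...][1])
        head = next(k for k in range(j - 1, -1, -1) if is_first_in_row(nodes, k))
        if head == start:
            parts.append(nodes[j][1])
    return "".join(parts)


nodes = [
    ("text", "an email address "),
    ("Link", "example"),
    ("Link", "@example.com"),
    ("text", " and a url "),
    ("Link", "http://www."),
    ("Link", "example.com"),
    ("text", " near the end."),
]
print(run_text(nodes, 1))  # example@example.com
print(run_text(nodes, 4))  # http://www.example.com
```

In ordinary procedural code you would simply stop at the first non-<Link> node. The backwards scan is only there because XPath 1.0 has to express "belongs to the same run" this way.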
Applied to your input, I get:
<p>Here is some random text with an email address
<email>example@example.com</email> and here
is more random text with a url
<ext-link ext-link-type="uri" xlink:href="http://www.example.com" /> near the end of the sentence.</p>