Using grouping to pull together text and then test it

Developer https://www.devze.com 2022-12-18 19:32 Source: Internet
So in this grotty extruded typesetting product, I sometimes see links and email addresses that have been split apart. Example:

<p>Here is some random text with an email address 
<Link>example</Link><Link>@example.com</Link> and here 
is more random text with a url 
<Link>http://www.</Link><Link>example.com</Link> near the end of the sentence.</p>

Desired output:

<p>Here is some random text with an email address 
<email>example@example.com</email> and here is more random text 
with a url <ext-link ext-link-type="uri" xlink:href="http://www.example.com/">
http://www.example.com/</ext-link> near the end of the sentence.</p>

Whitespace between the elements does not appear to occur, which is one blessing.

I can tell I need to use an xsl:for-each-group within the p template, but I can't quite see how to put the combined text from the group through the contains() function so as to distinguish emails from URLs. Help?


If you use group-adjacent, you can simply string-join the current-group(), as in:

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xsd"
  version="2.0">

  <xsl:template match="p">
    <xsl:copy>
      <xsl:for-each-group select="node()" group-adjacent="boolean(self::Link)">
        <xsl:choose>
          <xsl:when test="current-grouping-key()">
            <xsl:variable name="link-text" as="xsd:string" select="string-join(current-group(), '')"/>
            <xsl:choose>
              <xsl:when test="matches($link-text, '^https?://')">
                <ext-link ext-link-type="uri" xlink:href="{$link-text}">
                  <xsl:value-of select="$link-text"/>
                </ext-link>
              </xsl:when>
              <xsl:otherwise>
                <email><xsl:value-of select="$link-text"/></email>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:when>
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
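
The question asked specifically about running the combined text through contains(). As a minimal sketch of that variant (not part of the answer above): once the group is joined into a variable, contains() can test it like any other string. The identity template and the fallback that keeps unrecognised runs as a single <Link> are assumptions added for illustration.

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  version="2.0">

  <!-- Identity template: copy everything that is not handled below. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="p">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Adjacent nodes share a group when they agree on "is a Link". -->
      <xsl:for-each-group select="node()" group-adjacent="boolean(self::Link)">
        <xsl:choose>
          <xsl:when test="current-grouping-key()">
            <!-- Join the text of every Link in the run into one string. -->
            <xsl:variable name="link-text" select="string-join(current-group(), '')"/>
            <xsl:choose>
              <!-- Test for a URL first, since a URL may also contain '@'. -->
              <xsl:when test="contains($link-text, '://')">
                <ext-link ext-link-type="uri" xlink:href="{$link-text}">
                  <xsl:value-of select="$link-text"/>
                </ext-link>
              </xsl:when>
              <xsl:when test="contains($link-text, '@')">
                <email><xsl:value-of select="$link-text"/></email>
              </xsl:when>
              <xsl:otherwise>
                <!-- Assumption: unrecognised runs are kept as one Link. -->
                <Link><xsl:value-of select="$link-text"/></Link>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:when>
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

This needs an XSLT 2.0 processor. Note that contains() is a cruder test than matches() with an anchored pattern: a string with '://' anywhere in it is treated as a URL.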


Here is an XSLT 1.0 solution based on the identity template, with special treatment for <Link> elements. Both templates belong inside an <xsl:stylesheet> element that declares the xlink namespace prefix.

<xsl:template match="node()|@*">
  <xsl:copy>
    <xsl:apply-templates select="node()|@*" />
  </xsl:copy>
</xsl:template>

<xsl:template match="Link">
  <xsl:if test="not(preceding-sibling::node()[1][self::Link])">
    <xsl:variable name="link">
      <xsl:copy-of select="
        text()
        | 
        following-sibling::Link[
          preceding-sibling::node()[1][self::Link]
          and
          generate-id(current())
          =
          generate-id(
            preceding-sibling::Link[
              not(preceding-sibling::node()[1][self::Link])
            ][1]
          )
        ]/text()
      " />
    </xsl:variable>
    <xsl:choose>
      <xsl:when test="contains($link, '://')">
        <ext-link ext-link-type="uri" xlink:href="{$link}" />
      </xsl:when>
      <xsl:when test="contains($link, '@')">
        <email>
          <xsl:value-of select="$link" />
        </email>
      </xsl:when>
      <xsl:otherwise>
        <link type="unknown">
          <xsl:value-of select="$link" />
        </link>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:if>
</xsl:template>

I know the XPath expressions used are quite hairy monsters, but selecting adjacent siblings is not easy in XPath 1.0 (if someone has a better idea of how to do it in XPath 1.0, go ahead and tell me).

not(preceding-sibling::node()[1][self::Link])

means "the immediately preceding node must not be a <Link>", i.e. it selects only <Link> elements that are "first in a row".

following-sibling::Link[
  preceding-sibling::node()[1][self::Link]
  and
  generate-id(current())
  =
  generate-id(
    preceding-sibling::Link[
      not(preceding-sibling::node()[1][self::Link])
    ][1]
  )
]

means

  • from all following-sibling <Link>s, choose the ones that
    • immediately follow a <Link> (i.e. they are not "first in a row"), and
    • whose closest preceding "first in a row" <Link> has the same ID as the current() node (which is always a <Link> that is "first in a row")

If that makes sense.

Applied to your input, I get:

<p>Here is some random text with an email address
<email>example@example.com</email> and here
is more random text with a url
<ext-link ext-link-type="uri" xlink:href="http://www.example.com" /> near the end of the sentence.</p>
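
For comparison, here is a sketch of one alternative for XSLT 1.0 that avoids the generate-id() comparison: the head of each run walks to its immediate next sibling in a separate mode, and stops as soon as that sibling is not a <Link>. This is a swapped-in technique (sibling recursion), not the approach above. The mode name "chain" is hypothetical, and unlike the stylesheet above this sketch also writes the URL as the content of <ext-link>, as the desired output in the question shows.

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  version="1.0">

  <!-- Identity template: copy everything not handled below. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*" />
    </xsl:copy>
  </xsl:template>

  <!-- A Link directly after another Link is emitted by the head of its run,
       so it produces nothing here. The predicate gives this template a
       higher default priority than the plain "Link" template below. -->
  <xsl:template match="Link[preceding-sibling::node()[1][self::Link]]" />

  <!-- Head of a run: collect the text of the whole run, then classify it. -->
  <xsl:template match="Link">
    <xsl:variable name="link">
      <xsl:apply-templates select="." mode="chain" />
    </xsl:variable>
    <xsl:choose>
      <xsl:when test="contains($link, '://')">
        <ext-link ext-link-type="uri" xlink:href="{$link}">
          <xsl:value-of select="$link" />
        </ext-link>
      </xsl:when>
      <xsl:when test="contains($link, '@')">
        <email><xsl:value-of select="$link" /></email>
      </xsl:when>
      <xsl:otherwise>
        <link type="unknown"><xsl:value-of select="$link" /></link>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Mode "chain" (hypothetical name): output this Link's text, then
       recurse into the next node only if it is also a Link. -->
  <xsl:template match="Link" mode="chain">
    <xsl:value-of select="." />
    <xsl:apply-templates select="following-sibling::node()[1][self::Link]" mode="chain" />
  </xsl:template>

</xsl:stylesheet>
```

The recursion ends by itself: following-sibling::node()[1][self::Link] selects nothing when the next node is text or another element, so no explicit stop condition is needed.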