Wrapping multiple sequences of list elements using XSL transformation (XML to XML)

Developer https://www.devze.com 2023-03-10 21:09 Source: Internet
I have a long input XML document (about 3k lines) that generally looks like this:

<chapter someAttributes="someValues">
    <title>someTitle</title>

    <p>multiple paragraphs</p>
    <p>...</p>

    <li>
        <p>- some text</p>
    </li>
    <li>
        <p>- some other text</p>
    </li>
    <!-- another li elements -->

    <p>multiple other paragraphs</p>
    <p>...</p>

    <li>
        <p>1. some text</p>
    </li>
    <li>
        <p>2. some other text</p>
    </li>
    <!-- another li elements -->

    <p>multiple other paragraphs</p>
    <p>...</p>

    <!-- there are other elements such as table, illustration, ul etc. -->  
</chapter>

What I want is to wrap every scattered sequence of li elements (that is, every run sitting between paragraphs, tables, illustrations, etc.) in an ol or ul element, depending on its content, and return the wrapped XML.

  • if the first character of the paragraph is -, the sequence should become a ul with a mark="DASH" attribute
  • if the paragraphs start with 1., 2., 3., etc., the sequence should become an ol with numeration="ARABIC"

For example (it's just one sequence):

<ul mark="DASH">
    <li>
        <p> some text</p>
    </li>
    <li>
        <p> some other text</p>
    </li>
</ul>

As you can see, I also need to cut the "mark character(s)" (that is, - or 1., 2., 3., etc.) from all paragraphs.

The real input XML is more complicated than what I described (nested sequences, inner sequences in table elements), but I am looking for an idea, especially for how to catch and process a particular sequence with these semantics.

The output XML must keep exactly the same ordering, just with the li elements wrapped. XSLT 2.0/EXSLT are available if needed.


Here is an XSLT 2.0 stylesheet:

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">

  <xsl:output indent="yes"/>

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="chapter">
    <xsl:copy>
      <xsl:for-each-group select="*" group-adjacent="boolean(self::li)">
        <xsl:choose>
          <xsl:when test="current-grouping-key() and ./p[1][starts-with(., '-')]">
            <ul mark="DASH">
              <xsl:apply-templates select="current-group()"/>
            </ul>
          </xsl:when>
          <xsl:when test="current-grouping-key() and ./p[1][matches(., '^[0-9]+\.')]">
            <ol numeration="arabic">
              <xsl:apply-templates select="current-group()"/>
            </ol>
          </xsl:when>
          <xsl:otherwise>
            <xsl:copy-of select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="li/p/text()[1]">
    <xsl:value-of select="replace(., '^(-|[0-9]+\.)', '')"/>
  </xsl:template>

</xsl:stylesheet>

When I use Saxon 9.3 with that stylesheet and the sample input

<chapter someAttributes="someValues">
    <title>someTitle</title>

    <p>multiple paragraphs</p>
    <p>...</p>

    <li>
        <p>- some text</p>
    </li>
    <li>
        <p>- some other text</p>
    </li>
    <!-- another li elements -->

    <p>multiple other paragraphs</p>
    <p>...</p>

    <li>
        <p>1. some text</p>
    </li>
    <li>
        <p>2. some other text</p>
    </li>
    <!-- another li elements -->

    <p>multiple other paragraphs</p>
    <p>...</p>

    <!-- there are other elements such as table, illustration, ul etc. -->  
</chapter>

I get the following output:

<?xml version="1.0" encoding="UTF-8"?>
<chapter>
   <title>someTitle</title>
   <p>multiple paragraphs</p>
   <p>...</p>
   <ul mark="DASH">
      <li>
        <p> some text</p>
      </li>
      <li>
        <p> some other text</p>
      </li>
   </ul>
   <p>multiple other paragraphs</p>
   <p>...</p>
   <ol numeration="arabic">
      <li>
        <p> some text</p>
      </li>
      <li>
        <p> some other text</p>
      </li>
   </ol>
   <p>multiple other paragraphs</p>
   <p>...</p>
</chapter>
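
A few things in the result above differ from what the question asks for: the chapter template copies the element without its attributes (the output shows a bare <chapter>), the attribute value is "arabic" rather than "ARABIC", and a dash run directly followed by a numbered run would land in a single group because the grouping key is only "li or not li". The following is a minimal sketch of one way to address those points, not a tested replacement. It assumes a composite grouping key computed by a hypothetical helper function f:kind in a made-up namespace (urn:example:list-kind); neither name comes from the original answer.

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:f="urn:example:list-kind"
  exclude-result-prefixes="xs f"
  version="2.0">

  <xsl:output indent="yes"/>

  <!-- Hypothetical helper: classify an element as 'dash', 'arabic' or 'other'.
       Only li elements whose first paragraph carries a marker get a list kind. -->
  <xsl:function name="f:kind" as="xs:string">
    <xsl:param name="e" as="element()"/>
    <xsl:variable name="t" select="normalize-space($e/p[1])"/>
    <xsl:sequence select="
      if (not($e/self::li)) then 'other'
      else if (starts-with($t, '-')) then 'dash'
      else if (matches($t, '^[0-9]+\.')) then 'arabic'
      else 'other'"/>
  </xsl:function>

  <!-- identity -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="chapter">
    <xsl:copy>
      <!-- keep the chapter's own attributes -->
      <xsl:apply-templates select="@*"/>
      <!-- adjacent elements of the same kind form one group -->
      <xsl:for-each-group select="*" group-adjacent="f:kind(.)">
        <xsl:choose>
          <xsl:when test="current-grouping-key() = 'dash'">
            <ul mark="DASH">
              <xsl:apply-templates select="current-group()"/>
            </ul>
          </xsl:when>
          <xsl:when test="current-grouping-key() = 'arabic'">
            <ol numeration="ARABIC">
              <xsl:apply-templates select="current-group()"/>
            </ol>
          </xsl:when>
          <xsl:otherwise>
            <!-- apply-templates instead of copy-of, so nested content
                 still passes through the other templates -->
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <!-- strip the leading marker (and surrounding whitespace) from the first paragraph -->
  <xsl:template match="li/p[1]/text()[1]">
    <xsl:value-of select="replace(., '^\s*(-|[0-9]+\.)\s*', '')"/>
  </xsl:template>

</xsl:stylesheet>
```

As in the stylesheet above, select="*" means comments and text nodes that are direct children of chapter are not copied. The design trade-off of the composite key is that an unmarked li in the middle of a run splits that run into separate groups. To cover li sequences inside tables or other containers, the match="chapter" pattern would have to be widened (for instance to elements that have li children), which is left as an assumption here.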


Here's a fully template-driven (functional-style) solution that avoids procedural constructs such as xsl:for-each-group and xsl:if.

XSLT 2.0 tested under Saxon-B 9.0.0.1J

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output indent="yes" method="html"/>

    <xsl:strip-space elements="*"/>

    <!-- identity -->
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <!-- override dash list elements -->
    <xsl:template match="li[(name(preceding-sibling::*[position()=1]) 
        != name(current())) 
        and matches(.,'^-')]">

        <ul mark="DASH">
            <li><xsl:apply-templates/></li>
            <!-- apply recursive template for adjacent nodes -->
            <xsl:apply-templates select="following-sibling::*[1][name()
                =name(current())]" mode="next"/>
        </ul>
    </xsl:template>

    <!-- override numeration list elements -->
    <xsl:template match="li[(name(preceding-sibling::*[position()=1]) 
        != name(current())) 
        and matches(.,'^[0-9]+\.')]">
        <ol numeration="ARABIC">
            <li><xsl:apply-templates/></li>
            <xsl:apply-templates select="following-sibling::*[1][name()
                =name(current())]" mode="next"/>
        </ol>
    </xsl:template>

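    <!-- recursive template for adjacent nodes -->
    <xsl:template match="*" mode="next">
        <li><xsl:apply-templates/></li>
        <xsl:apply-templates select="following-sibling::*[1][name()
            =name(current())]" mode="next"/>
    </xsl:template>

    <!-- suppress marked li elements that follow another li in the default mode:
         the "next" mode above has already emitted them inside the wrapper,
         so the identity template must not copy them a second time -->
    <xsl:template match="li[name(preceding-sibling::*[position()=1]) 
        = name(current()) and matches(.,'^(-|[0-9]+\.)')]"/>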

    <!-- remove marks/numeration from first text node -->
    <xsl:template match="li/p/text()[1]">
        <xsl:value-of select="replace(., '^(-|[0-9]+\.)\s+', '')"/>
    </xsl:template>

</xsl:stylesheet>

Applied to your input, it produces:

<chapter someAttributes="someValues">
   <title>someTitle</title>
   <p>multiple paragraphs</p>
   <p>...</p>
   <ul mark="DASH">
      <li>
         <p>some text</p>
      </li>
      <li>
         <p>some other text</p>
      </li>
   </ul>
   <!-- another li elements -->
   <p>multiple other paragraphs</p>
   <p>...</p>
   <ol numeration="ARABIC">
      <li>
         <p>some text</p>
      </li>
      <li>
         <p>some other text</p>
      </li>
   </ol>
   <!-- another li elements -->
   <p>multiple other paragraphs</p>
   <p>...</p>
   <!-- there are other elements such as table, illustration, ul etc. -->
</chapter>