I have a long input XML document (about 3k lines), which generally looks like this:
<chapter someAttributes="someValues">
  <title>someTitle</title>
  <p>multiple paragraphs</p>
  <p>...</p>
  <li>
    <p>- some text</p>
  </li>
  <li>
    <p>- some other text</p>
  </li>
  <!-- another li elements -->
  <p>multiple other paragraphs</p>
  <p>...</p>
  <li>
    <p>1. some text</p>
  </li>
  <li>
    <p>2. some other text</p>
  </li>
  <!-- another li elements -->
  <p>multiple other paragraphs</p>
  <p>...</p>
  <!-- there are other elements such as table, illustration, ul etc. -->
</chapter>
What I want is to wrap every scattered sequence of li elements (scattered meaning between paragraphs, tables, illustrations, etc.) in an ol or ul element, depending on the following rules, and return the wrapped XML:
- if the first character of the paragraph is -, then it should be ul with a mark="DASH" attribute
- if the paragraph starts with 1., 2., 3., etc., then I want ol with a numeration="ARABIC" attribute
For example (it's just one sequence):
<ul mark="DASH">
  <li>
    <p> some text</p>
  </li>
  <li>
    <p> some other text</p>
  </li>
</ul>
As you can see, I also need to cut the "mark character(s)" from all paragraphs, that is, - or 1., 2., 3., etc.
The real input XML is more complicated than described here (nested sequences, sequences inside table elements), but I am looking for a general idea, especially for how to catch and process a particular sequence with such semantics.
I want output XML in exactly the same order, just with the li elements wrapped. XSLT 2.0/EXSLT are available if needed.
Here is an XSLT 2.0 stylesheet:
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">

  <xsl:output indent="yes"/>

  <!-- identity template: copy everything else unchanged -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- split the child elements of chapter into runs of li and runs of everything else -->
  <xsl:template match="chapter">
    <xsl:copy>
      <xsl:for-each-group select="*" group-adjacent="boolean(self::li)">
        <xsl:choose>
          <!-- li run whose first li starts with "-" -->
          <xsl:when test="current-grouping-key() and ./p[1][starts-with(., '-')]">
            <ul mark="DASH">
              <xsl:apply-templates select="current-group()"/>
            </ul>
          </xsl:when>
          <!-- li run whose first li starts with "1.", "2.", ... (anchored, multi-digit allowed) -->
          <xsl:when test="current-grouping-key() and ./p[1][matches(., '^[0-9]+\.')]">
            <ol numeration="arabic">
              <xsl:apply-templates select="current-group()"/>
            </ol>
          </xsl:when>
          <!-- everything else is copied unchanged -->
          <xsl:otherwise>
            <xsl:copy-of select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <!-- remove the leading mark from the first text node of each list paragraph -->
  <xsl:template match="li/p/text()[1]">
    <xsl:value-of select="replace(., '^(-|[0-9]+\.)', '')"/>
  </xsl:template>

</xsl:stylesheet>
When I use Saxon 9.3 with that stylesheet and the sample input
<chapter someAttributes="someValues">
  <title>someTitle</title>
  <p>multiple paragraphs</p>
  <p>...</p>
  <li>
    <p>- some text</p>
  </li>
  <li>
    <p>- some other text</p>
  </li>
  <!-- another li elements -->
  <p>multiple other paragraphs</p>
  <p>...</p>
  <li>
    <p>1. some text</p>
  </li>
  <li>
    <p>2. some other text</p>
  </li>
  <!-- another li elements -->
  <p>multiple other paragraphs</p>
  <p>...</p>
  <!-- there are other elements such as table, illustration, ul etc. -->
</chapter>
I get the following output:
<?xml version="1.0" encoding="UTF-8"?>
<chapter>
  <title>someTitle</title>
  <p>multiple paragraphs</p>
  <p>...</p>
  <ul mark="DASH">
    <li>
      <p> some text</p>
    </li>
    <li>
      <p> some other text</p>
    </li>
  </ul>
  <p>multiple other paragraphs</p>
  <p>...</p>
  <ol numeration="arabic">
    <li>
      <p> some text</p>
    </li>
    <li>
      <p> some other text</p>
    </li>
  </ol>
  <p>multiple other paragraphs</p>
  <p>...</p>
</chapter>
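The stylesheet above only looks at li elements that are direct children of chapter, and, as its output shows, it drops the chapter attributes and the comments. Below is an untested sketch of a more general XSLT 2.0 variant. It assumes that any element with loose li children (for example a table cell or a nested li) is a list container, except ul and ol elements that already exist in the input; that whitespace-only text directly inside such a container is insignificant; and that a dash run directly followed by a numbered run should become two separate lists. The grouping key values 'dash', 'arabic', 'plain' and 'other' are illustrative names only.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <!-- identity: copy every node and attribute not handled below -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- any element with loose li children (chapter, a table cell, a nested li, ...),
       except lists that are already wrapped in the input -->
  <xsl:template match="*[li][not(self::ul or self::ol)]">
    <xsl:copy>
      <!-- keep the container's own attributes -->
      <xsl:apply-templates select="@*"/>
      <!-- assumption: whitespace-only text is insignificant, so it is skipped
           and cannot split a run of li; the key separates dash runs,
           numbered runs, unmarked li runs and everything else -->
      <xsl:for-each-group select="node() except text()[not(normalize-space())]"
          group-adjacent="if (not(self::li)) then 'other'
                          else if (matches(p[1], '^\s*-')) then 'dash'
                          else if (matches(p[1], '^\s*[0-9]+\.')) then 'arabic'
                          else 'plain'">
        <xsl:choose>
          <xsl:when test="current-grouping-key() = 'dash'">
            <ul mark="DASH">
              <xsl:apply-templates select="current-group()"/>
            </ul>
          </xsl:when>
          <xsl:when test="current-grouping-key() = 'arabic'">
            <ol numeration="ARABIC">
              <xsl:apply-templates select="current-group()"/>
            </ol>
          </xsl:when>
          <xsl:otherwise>
            <!-- apply-templates rather than copy-of, so nested containers are processed too -->
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <!-- strip the mark ("-", "1.", "12.", ...) and the spaces around it -->
  <xsl:template match="li/p[1]/text()[1]">
    <xsl:value-of select="replace(., '^\s*(-|[0-9]+\.)\s*', '')"/>
  </xsl:template>

</xsl:stylesheet>
```

On the sample input this should give the same structure as the output shown for the second answer below: someAttributes and the comments are kept, the mark and the following space are removed, and the ordered list gets numeration="ARABIC".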
Here's a purely functional solution, without any procedural constructs like xsl:for-each-group and xsl:if; it relies only on template matching and recursion over adjacent siblings.
XSLT 2.0, tested under Saxon-B 9.0.0.1J:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <xsl:output indent="yes" method="xml" omit-xml-declaration="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- identity -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- override dash list elements: first li of a run -->
  <xsl:template match="li[(name(preceding-sibling::*[position()=1]) != name(current()))
                          and matches(., '^-')]">
    <ul mark="DASH">
      <li><xsl:apply-templates/></li>
      <!-- apply recursive template for adjacent nodes -->
      <xsl:apply-templates select="following-sibling::*[1][name() = name(current())]" mode="next"/>
    </ul>
  </xsl:template>

  <!-- override numeration list elements: first li of a run -->
  <xsl:template match="li[(name(preceding-sibling::*[position()=1]) != name(current()))
                          and matches(., '^[0-9]+\.')]">
    <ol numeration="ARABIC">
      <li><xsl:apply-templates/></li>
      <xsl:apply-templates select="following-sibling::*[1][name() = name(current())]" mode="next"/>
    </ol>
  </xsl:template>

  <!-- the other li of a run were already output by the "next" chain: suppress them here -->
  <xsl:template match="li[name(preceding-sibling::*[position()=1]) = name(current())]"/>

  <!-- recursive template for adjacent nodes -->
  <xsl:template match="*" mode="next">
    <li><xsl:apply-templates/></li>
    <xsl:apply-templates select="following-sibling::*[1][name() = name(current())]" mode="next"/>
  </xsl:template>

  <!-- remove marks/numeration from first text node -->
  <xsl:template match="li/p/text()[1]">
    <xsl:value-of select="replace(., '^(-|[0-9]+\.)\s+', '')"/>
  </xsl:template>

</xsl:stylesheet>
Applied to your input, it produces:
<chapter someAttributes="someValues">
  <title>someTitle</title>
  <p>multiple paragraphs</p>
  <p>...</p>
  <ul mark="DASH">
    <li>
      <p>some text</p>
    </li>
    <li>
      <p>some other text</p>
    </li>
  </ul>
  <!-- another li elements -->
  <p>multiple other paragraphs</p>
  <p>...</p>
  <ol numeration="ARABIC">
    <li>
      <p>some text</p>
    </li>
    <li>
      <p>some other text</p>
    </li>
  </ol>
  <!-- another li elements -->
  <p>multiple other paragraphs</p>
  <p>...</p>
  <!-- there are other elements such as table, illustration, ul etc. -->
</chapter>
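Both answers' stylesheets treat a run of adjacent li elements as one list, so a dash run directly followed by a numbered run (with no paragraph in between) ends up inside a single list. The following untested sketch keeps the sibling-recursion idea but stops the chain as soon as the mark changes, by passing the mark's regular expression down as a parameter (the parameter name mark is a hypothetical choice). It assumes the mark is the first non-space character(s) of the li's text.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <!-- identity -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- first li of a dash run: the previous element sibling is not a dash li -->
  <xsl:template match="li[matches(normalize-space(.), '^-')]
                         [not(preceding-sibling::*[1][self::li][matches(normalize-space(.), '^-')])]">
    <ul mark="DASH">
      <xsl:apply-templates select="." mode="next">
        <xsl:with-param name="mark" select="'^-'"/>
      </xsl:apply-templates>
    </ul>
  </xsl:template>

  <!-- first li of a numbered run: the previous element sibling is not a numbered li -->
  <xsl:template match="li[matches(normalize-space(.), '^[0-9]+\.')]
                         [not(preceding-sibling::*[1][self::li][matches(normalize-space(.), '^[0-9]+\.')])]">
    <ol numeration="ARABIC">
      <xsl:apply-templates select="." mode="next">
        <xsl:with-param name="mark" select="'^[0-9]+\.'"/>
      </xsl:apply-templates>
    </ol>
  </xsl:template>

  <!-- later li of a run were already output by the chain: suppress them here -->
  <xsl:template match="li[matches(normalize-space(.), '^-')]
                         [preceding-sibling::*[1][self::li][matches(normalize-space(.), '^-')]]
                     | li[matches(normalize-space(.), '^[0-9]+\.')]
                         [preceding-sibling::*[1][self::li][matches(normalize-space(.), '^[0-9]+\.')]]"/>

  <!-- output this li, then continue only if the next element sibling is an li with the same mark -->
  <xsl:template match="li" mode="next">
    <xsl:param name="mark"/>
    <xsl:copy>
      <xsl:apply-templates select="@*, node()"/>
    </xsl:copy>
    <xsl:apply-templates mode="next"
        select="following-sibling::*[1][self::li][matches(normalize-space(.), $mark)]">
      <xsl:with-param name="mark" select="$mark"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- strip the mark and the spaces around it from the first text node of the first paragraph -->
  <xsl:template match="li/p[1]/text()[1]">
    <xsl:value-of select="replace(., '^\s*(-|[0-9]+\.)\s*', '')"/>
  </xsl:template>

</xsl:stylesheet>
```

Because the chain only follows element siblings, a comment placed between two li of the same run is copied after the list rather than inside it.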