I have an XML/TEI document like this:
<p> In trattoria scoprii che c'era <del rend="tratto a matita">anche</del> Mirella,
non la non vedevo da almeno sei anni.
La spianata dava infatti l'impressione di fango secco, <del rend="matita">divorato
dalle rughe</del><add place="margine sinistro" rend="matita">attraversato da
lunghe ferite nere</add>. Lontano si vedeva una montagna di creta dello
stesso colore della mota. </p>
I am using this stylesheet to remove whitespace, both between elements and inside text nodes:
<xsl:strip-space elements="*"/>

<xsl:template match="/">
  <xsl:apply-templates />
</xsl:template>

<xsl:template match="*">
  <xsl:copy>
    <xsl:for-each select="@*">
      <xsl:attribute name="{name()}">
        <xsl:value-of select="normalize-space()"/>
      </xsl:attribute>
    </xsl:for-each>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>

<xsl:template match="text()">
  <xsl:value-of select="normalize-space()"/>
</xsl:template>
Everything works except that normalize-space() also removes leading and trailing whitespace, which produces unwanted output such as:
c'era<del rend="tratto a matita">anche</del>Mirella
I can't exclude mixed content from the processing, because my main goal is to collapse whitespace such as line breaks, tabs and indentation INSIDE, say, the <p> element.
Is there a way/function/trick to collapse runs of whitespace into a single space without removing the leading and trailing whitespace?
I don't think there is a built-in function that does this directly, but XPath 2.0 and later provide a fairly complete regular-expression language, with a replace() function that you should be able to use for this. (There is a more readable introduction at xml.com.)
I think all you need to do is replace:
select="normalize-space()"
with
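select="replace(., '\s+', ' ')"
but I've not tested this. The pattern \s+ matches any run of one or more whitespace characters (space, tab, line feed, carriage return), so even a single line break or tab becomes a space, while a leading or trailing run is reduced to one space rather than dropped.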
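Put together, a complete stylesheet might look like the following untested sketch. It assumes an XSLT 2.0 (or later) processor such as Saxon, because replace() does not exist in XSLT 1.0. It keeps the question's attribute handling (adding a namespace attribute so prefixed attributes are recreated safely) and uses the pattern \s+, so a single line break or tab is also turned into a space.

```xml
<!-- Sketch, assuming an XSLT 2.0+ processor: replace() is not available in XSLT 1.0 -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Drops whitespace-only text nodes (indentation between elements).
       Caveat: this also drops a lone space between two inline elements,
       e.g. the space in <del>a</del> <add>b</add>. -->
  <xsl:strip-space elements="*"/>

  <!-- Copy each element; attribute values are normalized as in the question.
       The namespace AVT keeps prefixed attributes (e.g. xml:id) valid. -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:for-each select="@*">
        <xsl:attribute name="{name()}" namespace="{namespace-uri()}"
                       select="normalize-space()"/>
      </xsl:for-each>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- Collapse every run of whitespace (space, tab, CR, LF) to one space.
       Unlike normalize-space(), a leading or trailing run becomes one
       space instead of being trimmed away. -->
  <xsl:template match="text()">
    <xsl:value-of select="replace(., '\s+', ' ')"/>
  </xsl:template>

</xsl:stylesheet>
```

On the sample paragraph this should yield c'era <del rend="tratto a matita">anche</del> Mirella, with the spaces around the element kept. Note that one space also survives just after <p> and just before </p>, since the first and last text nodes of the paragraph start and end with whitespace.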
Edit: Fixed the first argument in replace, as noted by Mycol below.
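If only an XSLT 1.0 processor is available (libxslt, for instance, implements XSLT 1.0 only), replace() cannot be used. One possible workaround, sketched here and not tested against the original data, keeps normalize-space() and re-adds a single space when the text node originally started or ended with whitespace. It would take the place of the match="text()" template in the question's stylesheet; the variable names first, last and ws are illustrative only.

```xml
<!-- Sketch for XSLT 1.0 (no replace()): normalize the text, then restore
     one leading/trailing space if the original node had whitespace there -->
<xsl:template match="text()">
  <!-- first and last character of this text node -->
  <xsl:variable name="first" select="substring(., 1, 1)"/>
  <xsl:variable name="last" select="substring(., string-length(.))"/>
  <!-- the whitespace characters normalize-space() strips: space, tab, LF, CR -->
  <xsl:variable name="ws" select="' &#9;&#10;&#13;'"/>

  <xsl:if test="$first != '' and contains($ws, $first)">
    <xsl:text> </xsl:text>
  </xsl:if>
  <xsl:value-of select="normalize-space()"/>
  <!-- skip the trailing space for whitespace-only nodes, so they yield
       a single space rather than two -->
  <xsl:if test="normalize-space() != '' and $last != '' and contains($ws, $last)">
    <xsl:text> </xsl:text>
  </xsl:if>
</xsl:template>
```

The character references in $ws matter: a literal tab or newline inside an attribute value would be turned into a plain space by the XML parser, whereas &#9;, &#10; and &#13; survive as the actual characters.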