
How to "collapse" but not "normalize" whitespace in XSLT

Developer https://www.devze.com 2023-01-02 16:56 Source: Internet
I have a TEI XML document like this:

 <p> In trattoria scoprii che c'era <del rend="tratto a matita">anche</del> Mirella,
                non la non vedevo da almeno sei anni. 
                La spianata dava infatti l'impressione di fango secco, <del rend="matita">divorato
                    dalle rughe</del><add place="margine sinistro" rend="matita">attraversato da
                    lunghe ferite nere</add>. Lontano si vedeva una montagna di creta dello
                stesso colore della mota. </p>

I am using this stylesheet to remove whitespace, both between elements and inside text nodes.

    <xsl:strip-space elements="*"/>

    <xsl:template match="/">
        <xsl:apply-templates />
    </xsl:template>

    <xsl:template match="*">
        <xsl:copy>
            <xsl:for-each select="@*">
                <xsl:attribute name="{name()}">
                    <xsl:value-of select="normalize-space()"/>
                </xsl:attribute>
            </xsl:for-each>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="text()">
        <xsl:value-of select="normalize-space()"/>
    </xsl:template>

Everything works except that normalize-space() also removes leading and trailing whitespace, so I get undesired output such as

c'era<del rend="tratto a matita">anche</del>Mirella

I can't exclude mixed content from the whitespace removal, because my main need is to collapse whitespace such as line breaks, tabs, and indentation INSIDE, say, the <p> element.

Is there a way/function/trick to collapse multiple whitespace characters into a single space without removing the leading and trailing whitespace?


I don't think there is a built-in function that does this directly, but XPath 2.0 (at least) has a fairly complete regular expression language and a replace() function that you should be able to persuade to do what you want. (There is a more readable introduction at xml.com.)

I think all you need to do is replace:

select="normalize-space()"

with

select="replace(., '(\s\s+)', ' ')"

but I've not tested this.

Edit: Fixed the first argument in replace, as noted by Mycol below.
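
For context, here is a minimal sketch of how the suggestion could fit into the stylesheet from the question. It assumes an XSLT 2.0 processor (for example Saxon), since replace() is not available in XSLT 1.0. It also swaps in the pattern '\s+' instead of '(\s\s+)': this is my assumption about what is wanted, so that a lone tab or line break between two words also becomes a space. Like the original, it is untested against the full TEI file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- As in the question: drop text nodes that contain only whitespace. -->
    <xsl:strip-space elements="*"/>

    <!-- Copy every element and normalize its attribute values. -->
    <xsl:template match="*">
        <xsl:copy>
            <xsl:for-each select="@*">
                <xsl:attribute name="{name()}">
                    <xsl:value-of select="normalize-space()"/>
                </xsl:attribute>
            </xsl:for-each>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>

    <!-- Collapse each run of whitespace to one space (assumed pattern '\s+').
         Unlike normalize-space(), this does not trim the ends of the node. -->
    <xsl:template match="text()">
        <xsl:value-of select="replace(., '\s+', ' ')"/>
    </xsl:template>

</xsl:stylesheet>
```

With the sample paragraph, this should keep the spaces around the inline elements, as in `c'era <del rend="tratto a matita">anche</del> Mirella,`, while the line breaks and indentation inside `<p>` are reduced to single spaces. Note that xsl:strip-space still removes whitespace-only text nodes, so a space that sits alone between two adjacent elements would be lost.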
