When editing rich text content, our CMS generates XML files with duplicate <br/> tags. I'd like to remove them so that the output can be read by another application that does not tolerate those duplicates.
Example input:
<p>
Lorem ipsum...<br />
<br />
..dolor sit
</p>
This should produce something like:
<p>
Lorem ipsum...<br />
..dolor sit
</p>
I am already using XSLT to manipulate the output in some other ways, and have found some examples using regexps and PHP that do the same thing. I just think it would be better to do this with XSLT, given the speed of the engine in our CMS (Roxen).
Thanks in advance!
Building off @Nic's answer, you could use
<xsl:template match='br[preceding-sibling::node()[1][self::br]]'/>
I've just changed * to node().
This would solve the problem of conflating two <br/>s that have text in between. However, it would also stop removing duplicate <br/>s that have only a whitespace text node in between.
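To make the difference concrete, here is a small made-up input (the cases wrapper and the text are only for illustration), with comments on what the original * pattern from @Nic's answer and the node() pattern above would each do. This assumes the parser keeps whitespace-only text nodes:

```xml
<cases>
  <!-- Adjacent: both patterns remove the second br. -->
  <p>one<br/><br/>two</p>

  <!-- Text in between: the * pattern wrongly removes the second br;
       the node() pattern keeps both. -->
  <p>one<br/>middle<br/>two</p>

  <!-- Only a line break in between: the * pattern removes the second br;
       the node() pattern keeps both, because the node directly before
       the second br is a whitespace-only text node. -->
  <p>one<br/>
<br/>two</p>
</cases>
```

The third case is the one the question is about, so neither pattern is quite right on its own.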
To solve that...
Deprecated
At first I had suggested you could strip whitespace-only nodes from p elements in the input doc, by putting this at the top level of your XSLT:
<xsl:strip-space elements="p"/>
But @Alejandro pointed out that this could easily cause you to lose important spaces, as in <p><em>bar</em> <em>baz</em></p>.
So instead, use this modified match pattern:
<xsl:template match='br[preceding-sibling::node()
[not(self::text() and normalize-space(.) = "")][1]
[self::br]]'/>
Kind of ugly but it should work. This will match and suppress "any br for which the preceding sibling node that is not a whitespace-only text node is also a br." :-)
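For completeness, here is how that pattern could sit in a complete stylesheet, next to an identity template so that everything else is copied through unchanged. This is only a sketch: it assumes XSLT 1.0 and that br and p are in no namespace (for namespaced XHTML you would need to declare a prefix and use it in the match pattern).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity transform: copy every node and attribute that no
       more specific template handles. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Suppress a br whose nearest preceding sibling, ignoring
       whitespace-only text nodes, is also a br. The empty template
       body means nothing is written to the output for it. -->
  <xsl:template match="br[preceding-sibling::node()
                          [not(self::text() and normalize-space(.) = '')][1]
                          [self::br]]"/>

</xsl:stylesheet>
```

Run against the example input from the question, this should drop the second <br />. The whitespace text node that sat between the two is left in place, so the output may still contain an extra line break in the source text. Note that the order of the predicates matters: the whitespace filter has to come before [1], otherwise [1] picks the whitespace node itself.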
Given that the match pattern is so complex, you may prefer to move some of that logic into the template body, as follows. I guess this is more a matter of personal taste and style:
<xsl:template match="br">
  <xsl:if test="not(preceding-sibling::node()
                    [not(self::text() and normalize-space(.) = '')][1]
                    [self::br])">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:if>
</xsl:template>
Here we use a copy of the identity transform when the <br /> is not one we want to suppress. I don't think <br /> can take child elements or text, but it doesn't hurt to be safe.
(Updated the above. I had forgotten to finish that sample code last time I saved edits.)
Using an identity transform to leave everything else alone, you could simply suppress every <br/> that is directly preceded by another one. Obviously, you can then just fit the template into your existing XSLT.
<xsl:template match='node()|@*'>
  <xsl:copy>
    <xsl:apply-templates select='node()|@*'/>
  </xsl:copy>
</xsl:template>
<xsl:template match='br[(preceding-sibling::*)[1][self::br]]'/>
The empty template will simply suppress that <br/>.
Update: As @LarsH points out, that template is too liberal in its matching and probably should be something like:
<xsl:template match='br[preceding-sibling::node()
[not(self::text() and normalize-space(.) = "")][1][self::br]]'/>