XSLT: remove duplicate br-tags from running text

Developer (https://www.devze.com), 2023-01-23 01:24. Source: web
When editing rich text content, our CMS generates XML-files with duplicate <br/>-tags. I'd like to remove them in order to generate output that can be read by another application that does not appreciate the occurrence of those duplicates.

Example input:

<p>
   Lorem ipsum...<br />
   <br />
   ..dolor sit
</p>

Would generate something like this:

<p>
   Lorem ipsum...<br />
   ..dolor sit
</p>

I am already using XSLT to manipulate the output in some other ways, and have found some examples of regexps and PHP that do the same thing. I just think it would be better if I could do this with XSLT, due to the speed of the engine in our CMS (Roxen).

Thanks in advance!


Building off @Nic's answer, you could use

<xsl:template match='br[preceding-sibling::node()[1][self::br]]'/>

I've just changed * to node(). This solves the problem of conflating two <br/>s that have text in between. However, it also stops removing duplicate <br/>s when there is only a whitespace text node in between, which is exactly the case in the example input.
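To make the two failure modes concrete, here is a small test input. It is a hypothetical sample, not taken from the question; the doc wrapper element and the text content are made up for illustration.

```xml
<!-- Hypothetical test input; the doc wrapper and the text are illustrative only. -->
<doc>
  <!-- Text between the breaks: both br elements should survive.
       A test on preceding-sibling::* skips the text node "two",
       sees the first br, and wrongly drops the second one. -->
  <p>one<br/>two<br/>three</p>

  <!-- Only whitespace between the breaks: the second br is a duplicate.
       A test on preceding-sibling::node()[1] sees the whitespace
       text node rather than the first br, so it keeps the duplicate. -->
  <p>one<br/>
     <br/>two</p>
</doc>
```

The second paragraph is the shape the CMS produces in the question, so it is the case that matters most here.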

To solve that...

Deprecated

At first I had suggested you could strip whitespace-only nodes from p elements in the input doc, by putting this at the top level of your XSLT:

<xsl:strip-space  elements="p"/>

But @Alejandro pointed out that this could easily cause you to lose important spaces, as in <p><em>bar</em> <em>baz</em></p>.

So instead, use this modified match pattern:

<xsl:template match='br[preceding-sibling::node()
                        [not(self::text() and normalize-space(.) = "")][1]
                        [self::br]]'/>

Kind of ugly but it should work. This will match and suppress "any br for which the preceding sibling node that is not a whitespace-only text node is also a br." :-)
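For reference, here is a minimal sketch of a complete stylesheet that combines this pattern with an identity transform, so it can be run on its own. It assumes XSLT 1.0 and that the br elements are in no namespace, as in the example input; for namespaced XHTML the pattern would need a prefix bound to that namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity transform: by default, copy every node and attribute unchanged. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: suppress a br whose nearest preceding sibling,
       ignoring whitespace-only text nodes, is also a br.
       The whitespace filter must come before [1], so that [1] picks
       the nearest sibling that survives the filter. -->
  <xsl:template match='br[preceding-sibling::node()
                          [not(self::text() and normalize-space(.) = "")][1]
                          [self::br]]'/>

</xsl:stylesheet>
```

Note that only the duplicate br is dropped. The whitespace-only text node that sat between the two breaks is still copied, so the output may contain a blank line where the second br used to be.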

Given that the match pattern is so complex, you may prefer to move some of that logic into the template body, as follows. I guess this is more a matter of personal taste and style:

<xsl:template match="br">
   <xsl:if test="not(preceding-sibling::node()
                        [not(self::text() and normalize-space(.) = '')][1]
                        [self::br])">
      <xsl:copy>
          <xsl:apply-templates select="@*|node()" />
      </xsl:copy>
   </xsl:if>
</xsl:template>

Here we use a copy of the identity transform when the <br /> is not one we want to suppress. I don't think <br /> can take child elements or text, but it doesn't hurt to be safe.

(Updated the above. I had forgotten to finish that sample code last time I saved edits.)


Using an identity transform to leave everything else alone, you could simply suppress every <br/> that is directly preceded by another one. Obviously, you can then just fit the template into your existing XSLT.

<xsl:template match='node()|@*'>
    <xsl:copy>
        <xsl:apply-templates select='node()|@*'/>
    </xsl:copy>
</xsl:template>

<xsl:template match='br[preceding-sibling::*[1][self::br]]'/>

The empty template will simply suppress that <br/>.

Update: As @LarsH points out, that template is too liberal in its matching and probably should be something like:

<xsl:template match='br[preceding-sibling::node()
    [not(self::text() and normalize-space(.) = "")][1][self::br]]'/>