XSLT: remove duplicate br-tags from running text

Posted on 开发者 (https://www.devze.com), 2023-01-23 01:24. Source: the web.

Related topics: xml xslt

When editing rich text content, our CMS generates XML files with duplicate <br />-tags. I'd like to remove them in order to generate output that can be read by another application that does not appreciate the occurrence of those duplicates.

Example input:

<p>
   Lorem ipsum...<br />
   <br />
   ..dolor sit
</p>

Would generate something like this:

<p>
   Lorem ipsum...<br />
   ..dolor sit
</p>

I am already using XSLT to manipulate the output in some other ways, and have found some examples of regexps and PHP that do the same thing. I just think it would be better if I could do this with XSLT due to the speed of the engine in our CMS (Roxen).

Thanks in advance!

Building off @Nic's answer, you could use

<xsl:template match='br[preceding-sibling::node()[1][self::br]]'/>

I've just changed * to node(). This would solve the problem of conflating two <br />s that have text in between. However, it would also stop removing duplicate <br />s when there is only a whitespace text node in between.
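To make the difference concrete, here is a small sample input covering the three situations. The doc wrapper and the text content are hypothetical; the comments describe what an element-only test (preceding-sibling::*[1]) and the node() test (preceding-sibling::node()[1]) would each do, assuming the processor keeps whitespace-only text nodes from the source.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sample input; only p and br come from the question. -->
<doc>
  <!-- Case 1: the two br elements are directly adjacent.
       Both tests see a br as the nearest preceding sibling,
       so the second br is suppressed either way. -->
  <p>one<br/><br/>two</p>

  <!-- Case 2: real text sits between the two br elements.
       preceding-sibling::*[1] skips the text and finds the first br,
       so the second br is wrongly suppressed.
       preceding-sibling::node()[1] finds the text node, so it is kept. -->
  <p>one<br/>two<br/>three</p>

  <!-- Case 3: only whitespace sits between the two br elements.
       preceding-sibling::*[1] still finds the first br (suppressed).
       preceding-sibling::node()[1] finds the whitespace text node,
       so the duplicate br survives. -->
  <p>one<br/>
     <br/>two</p>
</doc>
```

Case 3 is the one the question's input actually looks like, since the CMS output puts each br on its own line.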

To solve that...

Deprecated

At first I had suggested you could strip whitespace-only nodes from p elements in the input doc, by putting this at the top level of your XSLT:

<xsl:strip-space elements="p"/>

But @Alejandro pointed out that this could easily cause you to lose significant spaces, as in bar baz when the two words sit in separate inline elements with only whitespace between them.
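A minimal sketch of that pitfall, assuming an XSLT 1.0 processor. The b and i elements and the sample paragraph in the comment are hypothetical, chosen only to show two inline elements separated by a single space.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Removes whitespace-only text nodes that are direct children of p
       before any template runs. -->
  <xsl:strip-space elements="p"/>

  <!-- Identity template: copy everything else unchanged. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical input:   <p>foo <b>bar</b> <i>baz</i></p>
       Resulting p element:  <p>foo <b>bar</b><i>baz</i></p>
       The single space between </b> and <i> is a whitespace-only text
       node, so it is stripped and the words run together as "barbaz".
       The space after "foo" survives because that text node also
       contains non-whitespace characters. -->

</xsl:stylesheet>
```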

So instead, use this modified match pattern:

<xsl:template match='br[preceding-sibling::node()
                        [not(self::text() and normalize-space(.) = "")][1]
                        [self::br]]'/>

Kind of ugly but it should work. This will match and suppress "any br for which the preceding sibling node that is not a whitespace-only text node is also a br." :-)
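For reference, here is how that pattern might sit in a complete stylesheet. This is a sketch that assumes XSLT 1.0 and pairs the pattern with a plain identity template so that everything else is copied through unchanged; in practice you would merge the empty template into your existing XSLT instead.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy every node and attribute unchanged. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: drop a br whose nearest preceding sibling,
       ignoring whitespace-only text nodes, is also a br.
       The filter comes before [1] so that whitespace nodes are
       skipped first and only then is the nearest remaining node picked. -->
  <xsl:template match="br[preceding-sibling::node()
                          [not(self::text() and normalize-space(.) = '')][1]
                          [self::br]]"/>

</xsl:stylesheet>
```

Run against the example input from the question, this drops the second br. The whitespace that separated the two br elements is left in place, so a blank line may remain inside the p in the output. The empty template wins over the identity template because a pattern with a predicate has a higher default priority than node().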

Given that the match pattern is so complex, you may prefer to move some of that logic into the template body, as follows. I guess this is more a matter of personal taste and style:

<xsl:template match="br">
   <xsl:if test="not(preceding-sibling::node()
                        [not(self::text() and normalize-space(.) = '')][1]
                        [self::br])">
      <xsl:copy>
          <xsl:apply-templates select="@*|node()" />
      </xsl:copy>
   </xsl:if>
</xsl:template>

Here we use a copy of the identity transform when the <br /> is not one we want to suppress. I don't think <br /> can take child elements or text, but it doesn't hurt to be safe.

(Updated the above. I had forgotten to finish that sample code last time I saved edits.)

Using an identity transform to leave everything else alone, you could simply suppress every <br /> that is directly preceded by another one. You can then fit the template into your existing XSLT.

<xsl:template match='node()|@*'>
    <xsl:copy>
        <xsl:apply-templates select='node()|@*'/>
    </xsl:copy>
</xsl:template>

<xsl:template match='br[preceding-sibling::*[1][self::br]]'/>

The empty template will simply suppress that <br />.

Update: As @LarsH points out, that template is too liberal in its matching and probably should be something like:

<xsl:template match='br[preceding-sibling::node()
    [not(self::text() and normalize-space(.) = "")][1][self::br]]'/>
