开发者

How to recursively delete some xml elements using XSLT

开发者 https://www.devze.com 2022-12-26 15:46 出处:网络
So I got this situation which sucks. I have an XML like this <table border="1" cols="200 100pt 200">

So I got this situation which sucks. I have an XML like this



<table border="1" cols="200 100pt 200">
  <tr>
    <td>isbn</td>
    <td>title</td>
    <td>price</td>
  </tr>
  <tr>
    <td />
    <td />
    <td>
      <span type="champsimple" id="9b297fb5-d12b-46b1-8899-487a2df0104e" categorieid="a1c70692-0427-425b-983c-1a08b6585364" champcoderef="01f12b93-b4c5-401b-9da1-c9385d77e43f">
        [prénom]
      </span>
      &l开发者_如何学编程t;span type="champsimple" id="e103a6a5-d1be-4c34-8a54-d234179fb4ea" categorieid="a1c70692-0427-425b-983c-1a08b6585364" champcoderef="01f12b93-b4c5-401b-9da1-c9385d77e43f">[nom]</span>
      <span></span>
    </td>
  </tr>
  <tr></tr>
  <tr>
    <td></td>
    <td>Phill It in</td>
  </tr>
  <tr>
    <table id="cas1">
      <tr>
        <td ></td>
        <td >foo</td>
      </tr>
      <tr>
        <td >bar</td>
        <td >boo</td>
      </tr>
    </table>
  </tr>
  <tr>
    <table id="cas2">
      <tr>
        <td ></td>
        <td >foo</td>
      </tr>
      <tr>
        <td ></td>
        <td >boo</td>
      </tr>
    </table>
  </tr>
  <tr>
    <table id="cas3">
      <tr>
        <td >bar</td>
        <td ></td>
      </tr>
      <tr>
        <td >foo</td>
        <td >boo</td>
      </tr>
    </table>
  </tr>
  <tr>
    <table id="cas4">
      <tr>
        <td />
        <td />
      </tr>
      <tr>
        <td>foo</td>
        <td>boo</td>
      </tr>
    </table>
  </tr>
  <table id="cas4">
    <tr>
      <td />
      <td />
    </tr>
    <tr>
      <td>foo</td>
      <td>boo</td>
    </tr>
  </table>
  <tr>
    <td />
    <td />
  </tr>
</table>


Now the question is how would I recursively delete all empty td, tr and table elements?

Now I use this XSLT



<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*" />

  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="td[not(node())]" />
  <xsl:template match="tr[not(node())]" />
  <xsl:template match="table[not(node())]" />

</xsl:stylesheet>


But it doesn't do very well. After I delete td, a tr becomes empty but it doesn't handle that. Too bad. See the table element with "cas4".



<table border="1" cols="200 100pt 200">
  <tr>
    <td>isbn</td>
    <td>title</td>
    <td>price</td>
  </tr>
  <tr>
    <td>
      <span type="champsimple" id="9b297fb5-d12b-46b1-8899-487a2df0104e" categorieid="a1c70692-0427-425b-983c-1a08b6585364" champcoderef="01f12b93-b4c5-401b-9da1-c9385d77e43f">
        [prénom]
      </span>
      <span type="champsimple" id="e103a6a5-d1be-4c34-8a54-d234179fb4ea" categorieid="a1c70692-0427-425b-983c-1a08b6585364" champcoderef="01f12b93-b4c5-401b-9da1-c9385d77e43f">[nom]</span>
      <span />
    </td>
  </tr>
  <tr>
    <td>Phill It in</td>
  </tr>
  <tr>
    <table id="cas1">
      <tr>
        <td>foo</td>
      </tr>
      <tr>
        <td>bar</td>
        <td>boo</td>
      </tr>
    </table>
  </tr>
  <tr>
    <table id="cas2">
      <tr>
        <td>foo</td>
      </tr>
      <tr>
        <td>boo</td>
      </tr>
    </table>
  </tr>
  <tr>
    <table id="cas3">
      <tr>
        <td>bar</td>
      </tr>
      <tr>
        <td>foo</td>
        <td>boo</td>
      </tr>
    </table>
  </tr>
  <tr>
    <table id="cas4">
      <tr />
      <tr>
        <td>foo</td>
        <td>boo</td>
      </tr>
    </table>
  </tr>
  <table id="cas4">
    <tr />
    <tr>
      <td>foo</td>
      <td>boo</td>
    </tr>
  </table>
  <tr />
</table>


How would you solve this problem?


It sounds like your definition of empty is "contains no text or only whitespace". Is this the case? If so, the following transformation should do the trick:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> 
  <xsl:output omit-xml-declaration="yes" indent="yes"/> 
  <xsl:strip-space elements="*" /> 

  <xsl:template match="node()|@*"> 
    <xsl:copy> 
      <xsl:apply-templates select="node()|@*"/> 
    </xsl:copy> 
  </xsl:template> 

  <xsl:template match="td[not(normalize-space(.))]" /> 
  <xsl:template match="tr[not(normalize-space(.))]" /> 
  <xsl:template match="table[not(normalize-space(.))]" /> 
</xsl:stylesheet> 


There is your solution:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>

    <xsl:template match="node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()" />
        </xsl:copy>
    </xsl:template>

    <xsl:template match="@* | text()">
        <xsl:copy />
    </xsl:template>

    <xsl:template match="table | tr | td">

        <!-- result of the transformation of descendants -->
        <xsl:variable name="content">
            <xsl:apply-templates select="node()" />
        </xsl:variable>

        <!-- if there are any children left then copy myself -->
        <xsl:if test="count($content/node()) > 0">
            <xsl:copy>
                <xsl:apply-templates select="@*" />
                <xsl:copy-of select="$content" />
            </xsl:copy>
        </xsl:if>

    </xsl:template>

</xsl:stylesheet>

The idea is simple. I will do the transformation for my descendants first and then I will look if there is anyone left. If so I will copy myself and the result of the transformation.

If you want to preserve the table structure and remove only empty rows - elements <tr> that contains only empty elements <td>, than just create similar template for <tr> with different condition and ignore elements <td>.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>

    <xsl:template match="node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()" />
        </xsl:copy>
    </xsl:template>

    <xsl:template match="@* | text()">
        <xsl:copy />
    </xsl:template>

    <xsl:template match="table">

        <!-- result of the transformation of descendants -->
        <xsl:variable name="content">
            <xsl:apply-templates select="node()" />
        </xsl:variable>

        <!-- if there are any children left then copy myself -->
        <xsl:if test="count($content/node()) > 0">
            <xsl:copy>
                <xsl:apply-templates select="@*" />
                <xsl:copy-of select="$content" />
            </xsl:copy>
        </xsl:if>

    </xsl:template>

    <xsl:template match="tr">

        <!-- result of the transformation of descendants -->
        <xsl:variable name="content">
            <xsl:apply-templates select="node()" />
        </xsl:variable>

        <!-- number of non-empty td elements -->
        <xsl:variable name="cellCount">
            <xsl:value-of select="count($content/td[node()])" />
        </xsl:variable>

        <!-- number of other elements -->
        <xsl:variable name="elementCount">
            <xsl:value-of select="count($content/node()[name() != 'td'])" />
        </xsl:variable>

        <xsl:if test="$cellCount > 0 or $elementCount > 0">
            <xsl:copy>                  
                <xsl:apply-templates select="@*" />
                <xsl:copy-of select="$content" />
            </xsl:copy>
        </xsl:if>

    </xsl:template>

</xsl:stylesheet>

Well, actually the last if should be like this:

<xsl:choose>
    <!-- if there are cells then copy the content -->
    <xsl:when test="$cellCount > 0">
        <xsl:copy>
            <xsl:apply-templates select="@*" />
            <xsl:copy-of select="$content" />
        </xsl:copy>
    </xsl:when>

    <!-- if there are only other elements copy them -->
    <xsl:when test="$elementCount > 0">
        <xsl:copy>
            <xsl:apply-templates select="@*" />
            <xsl:copy-of select="$content/node()[name() != 'td']" />
        </xsl:copy>
    </xsl:when>
</xsl:choose>

That is because of the situation when <tr> contains empty elements <td> and another elements. Then you want to delete the <td>s and leave only the rest.


You could also filter out any table that only contains <tr> with empty <td>, and any <tr> with only empty <tr> (in addition to your other filters), using something like this (not tested):

<xsl:template match="tr[not(td/node())]" /> 
<xsl:template match="table[not(tr/td/node())]" /> 
0

精彩评论

暂无评论...
验证码 换一张
取 消