A tricky XSLT transformation

Developer (开发者) https://www.devze.com 2023-04-07 13:20 Source: web

I have some loosely structured XHTML data that I need to convert to better-structured XML.

Here's an example:

<tbody>
<tr>
    <td class="header"><img src="http://www.abc.com/images/icon_apples.gif"/><img src="http://www.abc.com/images/flag/portugal.gif" alt="Portugal"/> First Grade</td>
</tr>
<tr>
    <td>Green</td>
    <td>Round shaped</td>
    <td>Tasty</td>
</tr>
<tr>
    <td>Red</td>
    <td>Round shaped</td>
    <td>Bitter</td>
</tr>
<tr>
    <td>Pink</td>
    <td>Round shaped</td>
    <td>Tasty</td>
</tr>
<tr>
    <td class="header"><img src="http://www.abc.com/images/icon_strawberries.gif"/><img src="http://www.abc.com/images/flag/usa.gif" alt="USA"/> Fifth Grade</td>
</tr>
<tr>
    <td>Red</td>
    <td>Heart shaped</td>
    <td>Super tasty</td>
</tr>
<tr>
    <td class="header"><img src="http://www.abc.com/images/icon_bananas.gif"/><img src="http://www.abc.com/images/flag/congo.gif" alt="Congo"/> Third Grade</td>
</tr>
<tr>
    <td>Yellow</td>
    <td>Smile shaped</td>
    <td>Fairly tasty</td>
</tr>
<tr>
    <td>Brown</td>
    <td>Smile shaped</td>
    <td>Too sweet</td>
</tr>
</tbody>

I am trying to achieve the following structure:

<data>
    <entry>
        <type>Apples</type>
        <country>Portugal</country>
        <rank>First Grade</rank>
        <color>Green</color>
        <shape>Round shaped</shape>
        <taste>Tasty</taste>
    </entry>
    <entry>
        <type>Apples</type>
        <country>Portugal</country>
        <rank>First Grade</rank>
        <color>Red</color>
        <shape>Round shaped</shape>
        <taste>Bitter</taste>
    </entry>
    <entry>
        <type>Apples</type>
        <country>Portugal</country>
        <rank>First Grade</rank>
        <color>Pink</color>
        <shape>Round shaped</shape>
        <taste>Tasty</taste>
    </entry>
    <entry>
        <type>Strawberries</type>
        <country>USA</country>
        <rank>Fifth Grade</rank>
        <color>Red</color>
        <shape>Heart shaped</shape>
        <taste>Super tasty</taste>
    </entry>
    <entry>
        <type>Bananas</type>
        <country>Congo</country>
        <rank>Third Grade</rank>
        <color>Yellow</color>
        <shape>Smile shaped</shape>
        <taste>Fairly tasty</taste>
    </entry>
    <entry>
        <type>Bananas</type>
        <country>Congo</country>
        <rank>Third Grade</rank>
        <color>Brown</color>
        <shape>Smile shaped</shape>
        <taste>Too sweet</taste>
    </entry>
</data>

First, I need to extract the fruit type from tbody/tr/td/img[1]/@src, then the country from the tbody/tr/td/img[2]/@alt attribute, and finally the grade from the text of tbody/tr/td itself.

Next, I need to output every item under each category as its own entry, repeating those three values in each one (as shown above).

But as you can see, the data I was given is very loosely structured. A category is simply a header td, and all the items in that category follow it. To make things worse, the number of items under each category in my datasets varies between 1 and 100.

I've tried a few approaches but just can't seem to get it. Any help is greatly appreciated. I know that XSLT 2.0 introduces xsl:for-each-group, but I am limited to XSLT 1.0.


In this case, you are not actually grouping elements. It is more like ungrouping them.
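For comparison only, since the question rules out XSLT 2.0: with the xsl:for-each-group mentioned there, each header row can start a group, and each group is then flattened back into entries. This is a sketch that assumes an XSLT 2.0 processor (such as Saxon) and assumes the first row of the tbody is always a header row:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" indent="yes"/>

   <xsl:template match="/tbody">
      <data>
         <!-- A new group starts at every header row and runs until the next one -->
         <xsl:for-each-group select="tr" group-starting-with="tr[td[@class='header']]">
            <!-- The header cell sits in the first row of the group -->
            <xsl:variable name="header" select="current-group()[1]/td"/>
            <!-- Every other row of the group is a detail row: one entry each -->
            <xsl:for-each select="current-group()[position() gt 1]">
               <entry>
                  <type><xsl:value-of select="substring-before(substring-after($header/img[1]/@src, 'images/icon_'), '.gif')"/></type>
                  <country><xsl:value-of select="$header/img[2]/@alt"/></country>
                  <rank><xsl:value-of select="normalize-space($header)"/></rank>
                  <color><xsl:value-of select="td[1]"/></color>
                  <shape><xsl:value-of select="td[2]"/></shape>
                  <taste><xsl:value-of select="td[3]"/></taste>
               </entry>
            </xsl:for-each>
         </xsl:for-each-group>
      </data>
   </xsl:template>
</xsl:stylesheet>
```

Under XSLT 1.0 there is no grouping construct like this, which is why the key-based approach is used instead.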

One way to do this is to use an xsl:key to look up the "header" row for each of the detail rows.

<xsl:key name="fruity" 
   match="tr[not(td[@class='header'])]" 
   use="generate-id(preceding-sibling::tr[td[@class='header']][1])"/>

That is, each detail row is indexed by the generated ID of the nearest header row that precedes it.
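To see what the key actually indexes, a small throwaway stylesheet can report, for each header row, how many detail rows the key returns. This is a debugging sketch; the groups/group element names and the header/rows attributes are made up for illustration:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" indent="yes"/>

   <!-- Same key as above: detail rows indexed by their nearest preceding header row -->
   <xsl:key name="fruity"
      match="tr[not(td[@class='header'])]"
      use="generate-id(preceding-sibling::tr[td[@class='header']][1])"/>

   <xsl:template match="/tbody">
      <groups>
         <xsl:for-each select="tr[td[@class='header']]">
            <!-- generate-id() with no argument gives the ID of the current header row -->
            <group header="{normalize-space(td)}"
                   rows="{count(key('fruity', generate-id()))}"/>
         </xsl:for-each>
      </groups>
   </xsl:template>
</xsl:stylesheet>
```

On the sample input this should report 3 rows for First Grade, 1 for Fifth Grade and 2 for Third Grade, because preceding-sibling is a reverse axis and [1] therefore picks the closest header above each detail row.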

You can then match all your header cells like so:

<xsl:apply-templates select="tr/td[@class='header']"/>

Within the matching template, you can then extract the type, country and rank. To get the associated detail rows, it is simply a case of looking up the key with the ID of the cell's parent row:

<xsl:apply-templates select="key('fruity', generate-id(..))"/>
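The extraction expressions can also be checked on their own. The sketch below (with hypothetical headers/header output names) splits the type extraction into its two string steps and writes one element per header cell:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" indent="yes"/>

   <xsl:template match="/tbody">
      <headers>
         <xsl:apply-templates select="tr/td[@class='header']"/>
      </headers>
   </xsl:template>

   <xsl:template match="td">
      <!-- img[1]/@src is e.g. 'http://www.abc.com/images/icon_apples.gif' -->
      <xsl:variable name="file" select="substring-after(img[1]/@src, 'images/icon_')"/>
      <!-- $file is now 'apples.gif'; cutting at '.gif' leaves the fruit type -->
      <header type="{substring-before($file, '.gif')}"
              country="{img[2]/@alt}"
              rank="{normalize-space(text())}"/>
   </xsl:template>
</xsl:stylesheet>
```

For the first header cell this gives type="apples", country="Portugal" and rank="First Grade". text() works here because the only text node in a header cell is the one after the two images.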

Here is the complete XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" indent="yes"/>

   <xsl:key name="fruity" 
      match="tr[not(td[@class='header'])]" 
      use="generate-id(preceding-sibling::tr[td[@class='header']][1])"/>

   <xsl:template match="/tbody">
      <data>
         <!-- Match header rows -->
         <xsl:apply-templates select="tr/td[@class='header']"/>
      </data>
   </xsl:template>

   <xsl:template match="td">
      <!-- Match associated detail rows -->
      <xsl:apply-templates select="key('fruity', generate-id(..))">
         <!-- Extract relevant parameters from the td cell -->
         <xsl:with-param name="type" select="substring-before(substring-after(img[1]/@src, 'images/icon_'), '.gif')"/>
         <xsl:with-param name="country" select="img[2]/@alt"/>
         <xsl:with-param name="rank" select="normalize-space(text())"/>
      </xsl:apply-templates>
   </xsl:template>

   <xsl:template match="tr">
      <xsl:param name="type"/>
      <xsl:param name="country"/>
      <xsl:param name="rank"/>
      <entry>
         <type>
            <xsl:value-of select="$type"/>
         </type>
         <country>
            <xsl:value-of select="$country"/>
         </country>
         <rank>
            <xsl:value-of select="$rank"/>
         </rank>
         <color>
            <xsl:value-of select="td[1]"/>
         </color>
         <shape>
            <xsl:value-of select="td[2]"/>
         </shape>
         <taste>
            <xsl:value-of select="td[3]"/>
         </taste>
      </entry>
   </xsl:template>
</xsl:stylesheet>
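One caveat in case the real data is genuine XHTML: if the document declares the default namespace http://www.w3.org/1999/xhtml (the sample here does not), unprefixed names such as tbody, tr and td in an XSLT 1.0 stylesheet will not match anything. A sketch of the same stylesheet adapted for that case, assuming the table sits somewhere inside a full XHTML page; the h prefix is an arbitrary choice:

```xml
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:h="http://www.w3.org/1999/xhtml"
   exclude-result-prefixes="h">
   <xsl:output method="xml" indent="yes"/>

   <!-- Same key as before; every XHTML element name now carries the h: prefix -->
   <xsl:key name="fruity"
      match="h:tr[not(h:td[@class='header'])]"
      use="generate-id(preceding-sibling::h:tr[h:td[@class='header']][1])"/>

   <!-- Start from the root: in a full page, tbody is nested inside html/body/table -->
   <xsl:template match="/">
      <data>
         <xsl:apply-templates select="//h:tbody/h:tr/h:td[@class='header']"/>
      </data>
   </xsl:template>

   <xsl:template match="h:td">
      <xsl:apply-templates select="key('fruity', generate-id(..))">
         <xsl:with-param name="type" select="substring-before(substring-after(h:img[1]/@src, 'images/icon_'), '.gif')"/>
         <xsl:with-param name="country" select="h:img[2]/@alt"/>
         <xsl:with-param name="rank" select="normalize-space(text())"/>
      </xsl:apply-templates>
   </xsl:template>

   <xsl:template match="h:tr">
      <xsl:param name="type"/>
      <xsl:param name="country"/>
      <xsl:param name="rank"/>
      <!-- Output elements stay in no namespace, as in the original output -->
      <entry>
         <type><xsl:value-of select="$type"/></type>
         <country><xsl:value-of select="$country"/></country>
         <rank><xsl:value-of select="$rank"/></rank>
         <color><xsl:value-of select="h:td[1]"/></color>
         <shape><xsl:value-of select="h:td[2]"/></shape>
         <taste><xsl:value-of select="h:td[3]"/></taste>
      </entry>
   </xsl:template>
</xsl:stylesheet>
```

Attributes such as class, src and alt are in no namespace even in XHTML, so they stay unprefixed.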

When applied to your input document, the following output is generated:

<data>
   <entry>
      <type>apples</type>
      <country>Portugal</country>
      <rank>First Grade</rank>
      <color>Green</color>
      <shape>Round shaped</shape>
      <taste>Tasty</taste>
   </entry>
   <entry>
      <type>apples</type>
      <country>Portugal</country>
      <rank>First Grade</rank>
      <color>Red</color>
      <shape>Round shaped</shape>
      <taste>Bitter</taste>
   </entry>
   <entry>
      <type>apples</type>
      <country>Portugal</country>
      <rank>First Grade</rank>
      <color>Pink</color>
      <shape>Round shaped</shape>
      <taste>Tasty</taste>
   </entry>
   <entry>
      <type>strawberries</type>
      <country>USA</country>
      <rank>Fifth Grade</rank>
      <color>Red</color>
      <shape>Heart shaped</shape>
      <taste>Super tasty</taste>
   </entry>
   <entry>
      <type>bananas</type>
      <country>Congo</country>
      <rank>Third Grade</rank>
      <color>Yellow</color>
      <shape>Smile shaped</shape>
      <taste>Fairly tasty</taste>
   </entry>
   <entry>
      <type>bananas</type>
      <country>Congo</country>
      <rank>Third Grade</rank>
      <color>Brown</color>
      <shape>Smile shaped</shape>
      <taste>Too sweet</taste>
   </entry>
</data>
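Note that the generated type values are lower case (apples), while the desired output in the question uses Apples. XSLT 1.0 has no upper-case function, but translate() can upper-case just the first letter. A sketch of a drop-in replacement for the td template above; the lower, upper and raw variable names are illustrative, and the mapping only covers ASCII letters:

```xml
<!-- Top-level variables (direct children of xsl:stylesheet) for ASCII case mapping -->
<xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
<xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

<!-- Replaces the existing td template -->
<xsl:template match="td">
   <!-- 'apples', 'strawberries' or 'bananas', taken from the icon file name -->
   <xsl:variable name="raw"
      select="substring-before(substring-after(img[1]/@src, 'images/icon_'), '.gif')"/>
   <xsl:apply-templates select="key('fruity', generate-id(..))">
      <!-- Upper-case only the first character, so 'apples' becomes 'Apples' -->
      <xsl:with-param name="type"
         select="concat(translate(substring($raw, 1, 1), $lower, $upper), substring($raw, 2))"/>
      <xsl:with-param name="country" select="img[2]/@alt"/>
      <xsl:with-param name="rank" select="normalize-space(text())"/>
   </xsl:apply-templates>
</xsl:template>
```

The tr template and the key stay exactly as they are.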