I have loosely structured XHTML data and I need to convert it to better-structured XML.
Here's the example:
<tbody>
<tr>
<td class="header"><img src="http://www.abc.com/images/icon_apples.gif"/><img src="http://www.abc.com/images/flag/portugal.gif" alt="Portugal"/> First Grade</td>
</tr>
<tr>
<td>Green</td>
<td>Round shaped</td>
<td>Tasty</td>
</tr>
<tr>
<td>Red</td>
<td>Round shaped</td>
<td>Bitter</td>
</tr>
<tr>
<td>Pink</td>
<td>Round shaped</td>
<td>Tasty</td>
</tr>
<tr>
<td class="header"><img src="http://www.abc.com/images/icon_strawberries.gif"/><img src="http://www.abc.com/images/flag/usa.gif" alt="USA"/> Fifth Grade</td>
</tr>
<tr>
<td>Red</td>
<td>Heart shaped</td>
<td>Super tasty</td>
</tr>
<tr>
<td class="header"><img src="http://www.abc.com/images/icon_bananas.gif"/><img src="http://www.abc.com/images/flag/congo.gif" alt="Congo"/> Third Grade</td>
</tr>
<tr>
<td>Yellow</td>
<td>Smile shaped</td>
<td>Fairly tasty</td>
</tr>
<tr>
<td>Brown</td>
<td>Smile shaped</td>
<td>Too sweet</td>
</tr>
</tbody>
I am trying to achieve the following structure:
<data>
<entry>
<type>Apples</type>
<country>Portugal</country>
<rank>First Grade</rank>
<color>Green</color>
<shape>Round shaped</shape>
<taste>Tasty</taste>
</entry>
<entry>
<type>Apples</type>
<country>Portugal</country>
<rank>First Grade</rank>
<color>Red</color>
<shape>Round shaped</shape>
<taste>Bitter</taste>
</entry>
<entry>
<type>Apples</type>
<country>Portugal</country>
<rank>First Grade</rank>
<color>Pink</color>
<shape>Round shaped</shape>
<taste>Tasty</taste>
</entry>
<entry>
<type>Strawberries</type>
<country>USA</country>
<rank>Fifth Grade</rank>
<color>Red</color>
<shape>Heart shaped</shape>
<taste>Super tasty</taste>
</entry>
<entry>
<type>Bananas</type>
<country>Congo</country>
<rank>Third Grade</rank>
<color>Yellow</color>
<shape>Smile shaped</shape>
<taste>Fairly tasty</taste>
</entry>
<entry>
<type>Bananas</type>
<country>Congo</country>
<rank>Third Grade</rank>
<color>Brown</color>
<shape>Smile shaped</shape>
<taste>Too sweet</taste>
</entry>
</data>
First, I need to extract the fruit type from tbody/tr/td/img[1]/@src, then the country from the tbody/tr/td/img[2]/@alt attribute, and finally the grade from the text of tbody/tr/td itself.
Next, I need to populate all the entries under each category while including those values (as shown above).
But as you can see, the data I was given is very loosely structured. A category is simply a td, and all the items in that category follow it in sibling rows. To make things worse, in my datasets the number of items under each category varies between 1 and 100.
I've tried a few approaches but just can't seem to get it. Any help is greatly appreciated. I know that XSLT 2.0 introduces xsl:for-each-group, but I am limited to XSLT 1.0.
In this case, you are not actually grouping elements. It is more like ungrouping them.
One way to do this is to use an xsl:key to look up the "header" row for each of the detail rows.
<xsl:key name="fruity"
match="tr[not(td[@class='header'])]"
use="generate-id(preceding-sibling::tr[td[@class='header']][1])"/>
That is, each detail row is indexed by the generated ID of its nearest preceding header row.
Next, you can match all your header rows like so:
<xsl:apply-templates select="tr/td[@class='header']"/>
Within the matching template, you could then extract the type, country and rank. Then to get the associated detail rows, it is a simple case of looking at the key for the parent row:
<xsl:apply-templates select="key('fruity', generate-id(..))">
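To make it clearer what the key lookup selects, here is a rough sketch of the same selection written without xsl:key. It is meant as an alternative to the "td" template in the complete stylesheet, not as a standalone document, and the variable name header is introduced here purely for illustration. It picks the later detail rows whose nearest preceding header row is the current one.

```xml
<!-- Sketch: alternative to the "td" template that does not use xsl:key. -->
<xsl:template match="td[@class='header']">
  <!-- ID of the header row that owns this cell ("header" is an illustrative name) -->
  <xsl:variable name="header" select="generate-id(..)"/>
  <!-- Later detail rows whose nearest preceding header row is this one -->
  <xsl:apply-templates
      select="../following-sibling::tr[not(td[@class='header'])]
              [generate-id(preceding-sibling::tr[td[@class='header']][1]) = $header]">
    <xsl:with-param name="type" select="substring-before(substring-after(img[1]/@src, 'images/icon_'), '.gif')"/>
    <xsl:with-param name="country" select="img[2]/@alt"/>
    <xsl:with-param name="rank" select="normalize-space(text())"/>
  </xsl:apply-templates>
</xsl:template>
```

This form rescans the sibling rows for every header, so the key-based version is likely the better choice for larger inputs. The sketch is mainly useful for seeing what the key is indexing.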
Here is the overall XSLT:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:key name="fruity"
match="tr[not(td[@class='header'])]"
use="generate-id(preceding-sibling::tr[td[@class='header']][1])"/>
<xsl:template match="/tbody">
<data>
<!-- Match header rows -->
<xsl:apply-templates select="tr/td[@class='header']"/>
</data>
</xsl:template>
<xsl:template match="td">
<!-- Match associated detail rows -->
<xsl:apply-templates select="key('fruity', generate-id(..))">
<!-- Extract relevant parameters from the td cell -->
<xsl:with-param name="type" select="substring-before(substring-after(img[1]/@src, 'images/icon_'), '.gif')"/>
<xsl:with-param name="country" select="img[2]/@alt"/>
<xsl:with-param name="rank" select="normalize-space(text())"/>
</xsl:apply-templates>
</xsl:template>
<xsl:template match="tr">
<xsl:param name="type"/>
<xsl:param name="country"/>
<xsl:param name="rank"/>
<entry>
<type>
<xsl:value-of select="$type"/>
</type>
<country>
<xsl:value-of select="$country"/>
</country>
<rank>
<xsl:value-of select="$rank"/>
</rank>
<color>
<xsl:value-of select="td[1]"/>
</color>
<shape>
<xsl:value-of select="td[2]"/>
</shape>
<taste>
<xsl:value-of select="td[3]"/>
</taste>
</entry>
</xsl:template>
</xsl:stylesheet>
When applied to your input document, the following output is generated:
<data>
<entry>
<type>apples</type>
<country>Portugal</country>
<rank>First Grade</rank>
<color>Green</color>
<shape>Round shaped</shape>
<taste>Tasty</taste>
</entry>
<entry>
<type>apples</type>
<country>Portugal</country>
<rank>First Grade</rank>
<color>Red</color>
<shape>Round shaped</shape>
<taste>Bitter</taste>
</entry>
<entry>
<type>apples</type>
<country>Portugal</country>
<rank>First Grade</rank>
<color>Pink</color>
<shape>Round shaped</shape>
<taste>Tasty</taste>
</entry>
<entry>
<type>strawberries</type>
<country>USA</country>
<rank>Fifth Grade</rank>
<color>Red</color>
<shape>Heart shaped</shape>
<taste>Super tasty</taste>
</entry>
<entry>
<type>bananas</type>
<country>Congo</country>
<rank>Third Grade</rank>
<color>Yellow</color>
<shape>Smile shaped</shape>
<taste>Fairly tasty</taste>
</entry>
<entry>
<type>bananas</type>
<country>Congo</country>
<rank>Third Grade</rank>
<color>Brown</color>
<shape>Smile shaped</shape>
<taste>Too sweet</taste>
</entry>
</data>
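Note that the type comes out in lowercase ("apples"), because it is taken straight from the image filename, whereas the requested output has "Apples". XSLT 1.0 has no upper-case() function, but translate() can be used to capitalise the first letter. The following is a sketch of lines that could go inside the "td" template, assuming the filenames only use unaccented Latin letters; the variable names lower, upper, name and type are illustrative.

```xml
<!-- Sketch: place inside the "td" template, before xsl:apply-templates. -->
<xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
<xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
<!-- Same extraction as before: the part of the filename between "icon_" and ".gif" -->
<xsl:variable name="name" select="substring-before(substring-after(img[1]/@src, 'images/icon_'), '.gif')"/>
<!-- Upper-case the first character and append the rest unchanged -->
<xsl:variable name="type" select="concat(translate(substring($name, 1, 1), $lower, $upper), substring($name, 2))"/>
<!-- Then pass it on with: <xsl:with-param name="type" select="$type"/> -->
```

With this in place, "apples" becomes "Apples", and the other parameters are unaffected.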