I need to scrape data from a table on a website and create an XML document that will be used by an app.
The table looks like this:
<table id="results" class="results">
<thead>
<tr>
<th scope="col" class="resRoute">Route</th>
<th scope="col" class="resDir">To</th>
<th scope="col" class="resDue sorted">Time</th>
</tr>
</thead>
<tbody>
<tr>
<td class="resRoute">263</td>
<td class="resDir">Route Name</td>
<td class="resDue">1 min</td>
</tr>
<tr>
<td class="resRoute">17</td>
<td class="resDir">Route Name</td>
<td class="resDue">2 min</td>
</tr>
</tbody>
</table>
And I would like to create an XML feed that looks like this:
<train>
<route>263</route>
<direction>Route Name</direction>
<due>2 Min</due>
</train>
<train>
<route>17</route>
<direction>Route Name</direction>
<due>12 Min</due>
</train>
Run it through an XSLT transform:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:for-each select="table/tbody/tr">
<train>
<route><xsl:value-of select="td[@class='resRoute']" /></route>
<direction><xsl:value-of select="td[@class='resDir']" /></direction>
<due><xsl:value-of select="td[@class='resDue']" /></due>
</train>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
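A minimal sketch of applying this stylesheet from PHP, assuming the xsl extension (XSLTProcessor) is enabled and the table markup is well-formed XML. Real pages often are not, in which case DOMDocument::loadHTML() and an adjusted select path would be needed. The variable names are illustrative only:

```php
<?php
// Assumption: the table has already been fetched and parses as XML.
$tableXml = <<<'XML'
<table id="results" class="results">
<tbody>
<tr>
<td class="resRoute">263</td>
<td class="resDir">Route Name</td>
<td class="resDue">1 min</td>
</tr>
<tr>
<td class="resRoute">17</td>
<td class="resDir">Route Name</td>
<td class="resDue">2 min</td>
</tr>
</tbody>
</table>
XML;

// The stylesheet from this answer, unchanged.
$stylesheet = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:for-each select="table/tbody/tr">
<train>
<route><xsl:value-of select="td[@class='resRoute']" /></route>
<direction><xsl:value-of select="td[@class='resDir']" /></direction>
<due><xsl:value-of select="td[@class='resDue']" /></due>
</train>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
XSL;

$source = new DOMDocument();
$source->loadXML($tableXml);

$xsl = new DOMDocument();
$xsl->loadXML($stylesheet);

$processor = new XSLTProcessor();
$processor->importStylesheet($xsl);

// transformToXml() returns the result as a string, or false on failure.
$feed = $processor->transformToXml($source);
echo $feed;
```

Note that the stylesheet emits one train element per row with no enclosing root element, so the result is an XML fragment rather than a complete document. Wrapping the for-each in a single parent element would fix that if the app needs one.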
Hack hackedy hack hack hack!
$html = '<table id="results" class="results">
<thead>
<tr>
<th scope="col" class="resRoute">Route</th>
<th scope="col" class="resDir">To</th>
<th scope="col" class="resDue sorted">Time</th>
</tr>
</thead>
<tbody>
<tr>
<td class="resRoute">263</td>
<td class="resDir">Route Name</td>
<td class="resDue">1 min</td>
</tr>
<tr>
<td class="resRoute">17</td>
<td class="resDir">Route Name</td>
<td class="resDue">2 min</td>
</tr>
</tbody>
</table>
';
// Keep only the markup after <tbody>, so the header row is skipped.
$body = explode('<tbody>', $html);
$xml = simplexml_load_string("<?xml version='1.0' encoding='utf-8'?><xml />");
// Split on <tr>; the first chunk precedes the first row, so drop it.
foreach (array_slice(explode('<tr>', end($body)), 1) as $row)
{
    // Pull each cell's text out by its class name.
    preg_match('/resRoute">([0-9]+)<\/td>/', $row, $ids);
    preg_match('/resDir">([^<]+)<\/td>/', $row, $dir);
    preg_match('/resDue">([^<]+)<\/td>/', $row, $due);
    // Add one <train> element per table row.
    $node = $xml->addChild('train');
    $node->addChild('route', $ids[1]);
    $node->addChild('direction', $dir[1]);
    $node->addChild('due', $due[1]);
}
header('Content-Type: text/xml');
echo $xml->asXML();
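If the string splitting and regexes turn out to be too brittle, a less hacky variant is to parse the markup with DOMDocument and query it with DOMXPath. This is a swapped-in technique, not the code above, and only a sketch: the root element name trains and the $fields map are made up for illustration, and a trimmed copy of the table stands in for the fetched page.

```php
<?php
// Assumption: $html holds the fetched page; a trimmed table stands in here.
$html = '<table id="results" class="results">
<tbody>
<tr><td class="resRoute">263</td><td class="resDir">Route Name</td><td class="resDue">1 min</td></tr>
<tr><td class="resRoute">17</td><td class="resDir">Route Name</td><td class="resDue">2 min</td></tr>
</tbody>
</table>';

// loadHTML() tolerates real-world markup; collect its warnings quietly.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$feed = new DOMDocument('1.0', 'utf-8');
$root = $feed->appendChild($feed->createElement('trains')); // hypothetical root name

// Map each output element to the class of the cell it is read from.
$fields = array('route' => 'resRoute', 'direction' => 'resDir', 'due' => 'resDue');

foreach ($xpath->query('//table[@id="results"]/tbody/tr') as $tr) {
    $train = $root->appendChild($feed->createElement('train'));
    foreach ($fields as $name => $class) {
        // Relative query, evaluated against the current row.
        $cell = $xpath->query('td[@class="' . $class . '"]', $tr)->item(0);
        $text = $cell ? trim($cell->textContent) : '';
        // Text nodes are escaped on output, so "&" in a route name is safe.
        $train->appendChild($feed->createElement($name))
              ->appendChild($feed->createTextNode($text));
    }
}

$result = $feed->saveXML();
echo $result;
```

The exact class match (@class="resRoute") works for the body cells shown in the question; a cell carrying several classes would need contains() instead.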