Hey guys, a German weather website provides a weather widget for website owners. The widget itself handles German umlauts like ä, ö and ü just fine. However, it is badly designed, so I'm using cURL and XPath to query the information it provides instead. The widget is a set of tables and divs with inline styles, and I'm using XPath to get just the values inside the table's td cells.
Everything works fine except the German umlauts (ä, ö, ü). My website uses UTF-8 encoding, so the umlauts should display correctly (and they do on the rest of the page). Even when I embed the weather widget on my website the normal way, it shows them correctly.
However, as soon as I use cURL to get the values inside the table, the umlauts break and are turned into weird characters.
<?php
$url = 'http://www.weatherxyz.com/hptool/wordpress_v1.php?cid=43Xv1a0&l=de';

// Fetch the widget HTML as a string instead of printing it
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_HEADER, false);
$str = curl_exec($curl);
curl_close($curl);

if ($str === false) {
    die('Could not fetch the weather widget');
}

// Parse the markup and print the text of every table cell
$dom = new DOMDocument;
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);
$tds = $xpath->query('//div/table/tr/td');
foreach ($tds as $cell) {
    echo $cell->textContent;
}
?>
Do you guys have any idea how I can make this work?
Looks like you're not alone in griping about DOMDocument not understanding different encodings. The poster in question offers SmartDOMDocument to work around some of its poor implementation.
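One way to sidestep DOMDocument's charset guessing, sketched under the assumption that the widget really serves UTF-8: `DOMDocument::loadHTML()` treats markup that declares no charset as ISO-8859-1, so each two-byte UTF-8 umlaut is read as two Latin-1 characters (ü becomes Ã¼). Converting every non-ASCII character to a numeric entity first makes the input pure ASCII, which parses the same under any assumed encoding. This is not SmartDOMDocument's code; `extractCells()` is a hypothetical helper name, and it needs the `mbstring` and `dom` extensions. The older idiom `mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8')` does the same thing but is deprecated as of PHP 8.2.

```php
<?php
// Minimal sketch, assuming the fetched HTML is UTF-8.
// extractCells() is a hypothetical helper name, not part of the DOM API.

/**
 * Return the text of every //div/table/tr/td cell without mangling umlauts.
 *
 * Every character from U+0080 upwards is turned into a numeric entity
 * (ü -> &#252;) so that the input handed to loadHTML() is plain ASCII;
 * the parser then decodes the entities to the correct characters.
 */
function extractCells(string $html): array
{
    $ascii = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $dom = new DOMDocument();
    // Widget markup is rarely valid HTML; collect parser warnings
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($ascii);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $cells = [];
    foreach ($xpath->query('//div/table/tr/td') as $cell) {
        // textContent is always returned as UTF-8
        $cells[] = $cell->textContent;
    }
    return $cells;
}
```

In the question's script, the `DOMDocument`/`DOMXPath` lines could be replaced by `foreach (extractCells($str) as $value) { echo $value; }`. If the page turns out not to be UTF-8, it has to be converted to UTF-8 before this step.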
Check the page's encoding and, if it isn't UTF-8 already, re-encode the content to UTF-8.
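A sketch of the check-and-re-encode approach, assuming the server announces the charset in its `Content-Type` header (for example `text/html; charset=ISO-8859-1`); a charset declared only in a `<meta>` tag is not inspected here. `bodyToUtf8()` is a hypothetical helper name, and the code assumes PHP 8 with the `mbstring` extension.

```php
<?php
// Sketch, assuming PHP 8 and the mbstring extension.
// bodyToUtf8() is a hypothetical helper, not part of cURL or DOM.

/**
 * Re-encode an HTTP response body to UTF-8 using the charset named in the
 * Content-Type header. Without a usable charset the body is returned as-is,
 * i.e. it is assumed to be UTF-8 already.
 */
function bodyToUtf8(string $body, ?string $contentType): string
{
    if ($contentType === null
        || !preg_match('/charset\s*=\s*"?([\w.:-]+)"?/i', $contentType, $m)) {
        return $body; // no charset announced
    }

    $charset = strtoupper($m[1]);
    if ($charset === 'UTF-8' || $charset === 'UTF8') {
        return $body; // already UTF-8, nothing to do
    }

    try {
        return mb_convert_encoding($body, 'UTF-8', $charset);
    } catch (ValueError $e) {
        // PHP 8 throws ValueError for charset names mbstring doesn't know
        return $body;
    }
}
```

With the question's cURL handle, the header value can be read after `curl_exec()` via `curl_getinfo($curl, CURLINFO_CONTENT_TYPE) ?: null` and passed in together with `$str`. The UTF-8 result still has to reach DOMDocument in a form it won't misread, because `loadHTML()` assumes ISO-8859-1 when the markup declares no charset.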