Strange character in XML document

Developer (devze.com) https://www.devze.com 2023-01-24 06:22 Source: web
I have a strange character showing up in my RSS feed. In Firefox, it looks like a box with a number in each of its four corners: in some cases 0-0-9-4, in others 0-0-9-2.

These are appearing where smart quotes should be.

I'm familiar with the black diamond with the question mark, but this is a new one.


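The 0-0-9-4 indicates that the character was U+0094. That is a C1 control character, not a printable one, so your browser has no glyph for it and draws the code point in a box instead. Either whatever is producing the feed is inserting characters your browser cannot display, or the character encoding specified in the header doesn't match the stream contents. Note that in Windows-1252, bytes 0x92 and 0x94 are the right single and right double quotation marks (’ and ”), which fits with the boxes appearing where smart quotes should be.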
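
One way to see why the same byte gives a box in one place and a smart quote in another is to decode it under both character sets. This PHP sketch is illustrative only: `decodeByte()` and `describe()` are hypothetical helpers, and the lookup table covers just the Windows-1252 punctuation bytes relevant here, not the whole code page.

```php
<?php
// Partial Windows-1252 table: only the punctuation bytes relevant to this thread.
const CP1252_PUNCT = [
    0x85 => 0x2026, // horizontal ellipsis
    0x91 => 0x2018, // left single quotation mark
    0x92 => 0x2019, // right single quotation mark
    0x93 => 0x201C, // left double quotation mark
    0x94 => 0x201D, // right double quotation mark
    0x95 => 0x2022, // bullet
    0x96 => 0x2013, // en dash
    0x97 => 0x2014, // em dash
];

// Hypothetical helper: the Unicode code point one byte decodes to under a charset.
function decodeByte(int $byte, string $charset): int
{
    if ($charset === 'ISO-8859-1') {
        // ISO-8859-1 maps each byte to the code point with the same value,
        // so 0x80-0x9F become C1 control characters.
        return $byte;
    }
    if ($charset === 'Windows-1252') {
        // Identity is correct for ASCII and 0xA0-0xFF; the other 0x80-0x9F
        // bytes are simply not covered by this partial table.
        return array_key_exists($byte, CP1252_PUNCT) ? CP1252_PUNCT[$byte] : $byte;
    }
    throw new InvalidArgumentException("unsupported charset: $charset");
}

// Hypothetical helper: format a code point the way Unicode charts write it.
function describe(int $codePoint): string
{
    return sprintf('U+%04X', $codePoint);
}

foreach ([0x92, 0x94] as $byte) {
    printf(
        "byte 0x%02X: ISO-8859-1 -> %s, Windows-1252 -> %s\n",
        $byte,
        describe(decodeByte($byte, 'ISO-8859-1')),
        describe(decodeByte($byte, 'Windows-1252'))
    );
}
// byte 0x92: ISO-8859-1 -> U+0092, Windows-1252 -> U+2019
// byte 0x94: ISO-8859-1 -> U+0094, Windows-1252 -> U+201D
```

Read as ISO-8859-1 (or any decoder that maps the value straight to a code point), 0x94 becomes U+0094, a control character with no glyph, which matches the 0094 box. Read as Windows-1252, it is U+201D, the closing smart quote.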


Ah, okay. You pointed me in the right direction. What was showing up were Windows entities. People put content into our database through a complex series of steps, converting from Word to InDesign to GoLive (yes, it is painful).

Anyway, what the database was emitting were entities like '&#146;', which I guess mean something to Windows but nothing to my browser in either ISO-8859-1 or UTF-8, so no amount of changing my page encoding could fix that nonsense. Oddly, though, it just appeared correctly here, so I don't know what I'm doing wrong.
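
A possible explanation for the entity displaying correctly on an ordinary web page is that HTML parsers remap numeric references in the 128-159 range to the Windows-1252 characters, whereas an XML parser (and so a feed read as XML) takes them literally as C1 control characters such as U+0092 and U+0094. To confirm that this is what the database contains, one could scan the stored text for such references. This is a hypothetical diagnostic, assuming the text holds decimal or hexadecimal references like `&#146;`; `findC1References()` is not an existing PHP function.

```php
<?php
// Hypothetical diagnostic: find numeric character references whose value is in
// 128-159 (0x80-0x9F). An XML parser reads these as C1 control characters.
function findC1References(string $text): array
{
    $found = [];
    // Match decimal (&#146;) and hexadecimal (&#x92;) references.
    preg_match_all('/&#(?:([0-9]+)|[xX]([0-9a-fA-F]+));/', $text, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        // Group 1 is an empty string for the hex form, so fall back to group 2.
        $codePoint = ($m[1] !== '') ? (int) $m[1] : (int) hexdec($m[2]);
        if ($codePoint >= 0x80 && $codePoint <= 0x9F) {
            $found[] = sprintf('%s -> U+%04X', $m[0], $codePoint);
        }
    }
    return $found;
}

print_r(findC1References('It&#146;s a &#x94;test&#8221;'));
```

References like `&#8221;` are left out of the report because they already name the intended Unicode character.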

So anyway, I fixed it by running everything through this PHP function before displaying it.

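function fixChars($text) {

    // Replace Windows-1252 numeric character references (curly single and
    // double quotes, bullet, em dash, ellipsis) with plain ASCII equivalents.
    $text = str_replace(
        array('&#145;', '&#146;', '&#147;', '&#148;', '&#149;', '&#151;', '&#133;'),
        array("'", "'", '"', '"', '-', '--', '...'),
        $text
    );

    return $text;

}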

So, now things seem fine.
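
If keeping the curly quotes matters, an alternative to flattening them to ASCII is to rewrite each Windows-1252 reference as the Unicode reference it stands for, so `&#146;` becomes `&#8217;`. This is a sketch assuming the database emits decimal references; `fixCp1252References()` is a hypothetical name, hexadecimal forms like `&#x92;` are not handled, and the table follows the standard Windows-1252 mapping for 128-159, leaving the five undefined values untouched.

```php
<?php
// Standard Windows-1252 mapping for values 128-159 (decimal) to Unicode code points.
// 129, 141, 143, 144 and 157 are undefined in Windows-1252 and therefore absent.
const CP1252_TO_UNICODE = [
    128 => 0x20AC, 130 => 0x201A, 131 => 0x0192, 132 => 0x201E, 133 => 0x2026,
    134 => 0x2020, 135 => 0x2021, 136 => 0x02C6, 137 => 0x2030, 138 => 0x0160,
    139 => 0x2039, 140 => 0x0152, 142 => 0x017D, 145 => 0x2018, 146 => 0x2019,
    147 => 0x201C, 148 => 0x201D, 149 => 0x2022, 150 => 0x2013, 151 => 0x2014,
    152 => 0x02DC, 153 => 0x2122, 154 => 0x0161, 155 => 0x203A, 156 => 0x0153,
    158 => 0x017E, 159 => 0x0178,
];

// Hypothetical helper: rewrite decimal references such as &#146; as the Unicode
// reference they were meant to be (&#8217;). Anything not in the table is left as-is.
function fixCp1252References(string $text): string
{
    return preg_replace_callback(
        '/&#([0-9]+);/',
        function (array $m): string {
            $n = (int) $m[1];
            if (array_key_exists($n, CP1252_TO_UNICODE)) {
                return '&#' . CP1252_TO_UNICODE[$n] . ';';
            }
            return $m[0];
        },
        $text
    );
}

echo fixCp1252References('It&#146;s &#147;fine&#148; now'), "\n";
// It&#8217;s &#8220;fine&#8221; now
```

Because the output is plain ASCII naming real Unicode characters, it should render the same in an HTML page or an XML feed, whichever encoding the document declares.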

Thanks for the direction, all.
