How can I prevent strange characters when pulling the Atom feed from a WordPress 3.0 blog?

Developer (https://www.devze.com), 2023-01-17 22:59, source: Internet
I have an Atom feed on a WordPress blog here: http://blogs.legalview.info/auto-accidents/feed/atom

When I download the text of the file and display it on my site, I get strange characters like the accented 'A' here:

Recent studies are showing that car accident -related fatalities have declined almost 10% since 2008. The reason for this

I am using the following code in my C# web application to download the feed:

        WebClient client = new WebClient();
        client.Headers.Add(@"Accept-Language: en-US,en          
                           Accept-Charset: utf-8");
        string xml_text = client.DownloadString(_atom_url);

And xml_text.Contains("Â") returns true, but if I download the feed in my browser, no such Â exists. I'm pretty sure this is a character set issue, but I can't figure out why. By examining client.ResponseHeaders, I can see that it is in fact downloading text in UTF-8, and the response from my .NET site is UTF-8 as well, so I can't figure out why the weirdness appears.


I get ...fatalities when I force my browser to interpret the feed as ISO-8859-1 instead of UTF-8 (which definitely is the correct character set for the feed.)

I'm pretty sure either your WebClient somehow defaults to ISO-8859-1, or the output encoding on your site is ISO-8859-1, which obviously garbles the UTF-8 input.
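To illustrate how this kind of garbling arises, here is a minimal sketch. It assumes the offending character in the feed is a non-breaking space (U+00A0), which is a guess and not something the question confirms; the class and method names are hypothetical. In UTF-8 that character is the two bytes C2 A0, and reading those bytes as ISO-8859-1 turns C2 into "Â".

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Encodes the text as UTF-8, then decodes the bytes with the wrong
    // charset (ISO-8859-1) to reproduce the stray "Â".
    public static string MisdecodeAsLatin1(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("ISO-8859-1").GetString(utf8Bytes);
    }

    public static void Main()
    {
        // Assumption: the feed contains a non-breaking space (U+00A0) here.
        string original = "accident\u00A0-related";
        string garbled = MisdecodeAsLatin1(original);

        // U+00C2 is "Â"; it shows up only in the mis-decoded string.
        Console.WriteLine(garbled.Contains("\u00C2"));  // prints True
        Console.WriteLine(original.Contains("\u00C2")); // prints False
    }
}
```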

Maybe start checking your site's output first. If that definitely is UTF-8, take a look at the WebClient.
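If the WebClient turns out to be the culprit, one thing to try is setting its Encoding property before the download, or fetching the raw bytes and decoding them yourself. The following is only a sketch under that assumption; FeedDownloader and its method names are made up for illustration, and it has not been run against the feed in the question.

```csharp
using System.Net;
using System.Text;

public static class FeedDownloader
{
    // Downloads the resource as text, telling WebClient to decode it as UTF-8.
    public static string DownloadAsUtf8(string url)
    {
        using (var client = new WebClient())
        {
            client.Encoding = Encoding.UTF8;
            return client.DownloadString(url);
        }
    }

    // Alternative: fetch the raw bytes and decode them explicitly,
    // so no default encoding is involved at all.
    public static string DownloadBytesAsUtf8(string url)
    {
        using (var client = new WebClient())
        {
            byte[] raw = client.DownloadData(url);
            return DecodeUtf8(raw);
        }
    }

    // Pure decoding step, separated out so it can be checked without a network call.
    public static string DecodeUtf8(byte[] raw)
    {
        return Encoding.UTF8.GetString(raw);
    }
}
```

With the code from the question, this would amount to replacing the DownloadString call with something like FeedDownloader.DownloadAsUtf8(_atom_url).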
