How do I handle this response from YQL

Developer https://www.devze.com 2023-03-05 03:45 Source: Internet
In a request to YQL (select * from html where url="...") I got the following response:

callback({
    "query":
        {"count":"1","created":"2011-05-09T23:29:05Z","lang":"en-US"
     }, "results": ["<body>... we\ufffdll call Mr ...</body>"]
}

This is from the YQL console page. When I type that sequence into Firebug (even on YQL's page) I get:

... we�ll call Mr ...

What am I doing wrong? Is YQL's site in a bad encoding? Is there some way to convert symbols like this to their ASCII equivalent?

BTW, this isn't my site, so I can't change the meta charset on it.


  • It seems like that character (a question mark in a solid black diamond, U+FFFD) is exactly what you should be seeing for \ufffd: http://www.fileformat.info/info/unicode/char/fffd/browsertest.htm

  • The comment on that character's page says:

    used to replace an incoming character whose value is unknown or unrepresentable in Unicode
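To make that concrete, here is a minimal Python sketch of how a U+FFFD can end up in the output. It assumes (and this is only a guess, since the page URL wasn't posted) that the scraped page is served as Windows-1252 and gets decoded as UTF-8 somewhere along the way. The byte string below is made up for illustration.

```python
# Hypothetical raw bytes: "we’ll call Mr" as a Windows-1252 page might send it.
# In Windows-1252, byte 0x92 is the right single quotation mark (U+2019).
raw = b"we\x92ll call Mr"

# Decoded as UTF-8, a lone 0x92 is not a valid sequence, so the decoder
# substitutes the replacement character U+FFFD.
as_utf8 = raw.decode("utf-8", errors="replace")

# Decoded with the charset assumed for the page, the apostrophe survives.
as_cp1252 = raw.decode("cp1252")

print(ascii(as_utf8))    # 'we\ufffdll call Mr'
print(ascii(as_cp1252))  # 'we\u2019ll call Mr'
```

Note that once the character has become U+FFFD the original byte is gone, so there is nothing left to convert to an ASCII equivalent afterwards. The fix has to happen at the point where the page is decoded.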

Maybe the answers to these might help get a better answer:

  1. What character are you expecting at that place?
  2. Can you post the URL that you're scraping?
  3. Is that the character on that page also or is it getting mangled when picked up by YQL?

Update

You might want to check out the charset option in the where clause of your YQL query. I'm not entirely sure what it does, but it looks like it forces the YQL engine to use the specified charset when parsing the page. Perhaps setting it to UTF-8 (or to whatever charset the page is actually written in) will solve your problem.

For example,

select * from html where url = 'http://google.com' and charset='utf-8'
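If you are building the request yourself rather than using the console, a rough sketch of URL-encoding such a query could look like the following. The endpoint is the public YQL URL as I remember it (treat it as an assumption; the service may no longer be available), and build_yql_url is just a hypothetical helper name. Nothing is sent over the network here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: the historical public YQL endpoint; it may no longer respond.
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"

def build_yql_url(page_url, charset="utf-8", callback=None):
    """Hypothetical helper: build a request URL for the query shown above."""
    # Note: a page_url containing a single quote would break this naive quoting.
    query = "select * from html where url = '{}' and charset='{}'".format(
        page_url, charset
    )
    params = {"q": query, "format": "json"}
    if callback:
        # JSONP wrapper name, as in the callback({...}) response in the question.
        params["callback"] = callback
    return YQL_ENDPOINT + "?" + urlencode(params)

url = build_yql_url("http://google.com", charset="utf-8", callback="callback")

# Round-trip the query string to check that the YQL statement survived encoding.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0])
```

Swapping the charset argument (for example to windows-1252) is then a one-word change, which makes it easy to try a few values against the page in question.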