开发者

Working around unexpected behavior in yaml for Ruby -- interned unicode strings

开发者 https://www.devze.com 2023-01-15 14:33 出处:网络
(1.9 on Windows) Reproducing: require \'yaml\' s = YAML::load(\"\\xEC\\x86\\x8C\\xEB\\x85\\x80\\xEC\\x8B\\x9C\\xEB\\x8C\\x80\")

(1.9 on Windows)

Reproducing:

require 'yaml'

s = YAML::load("\xEC\x86\x8C\xEB\x85\x80\xEC\x8B\x9C\xEB\x8C\x80")
   # => "∞åîδàÇ∞ï£δîÇ" or "소녀시대", depending on your terminal's unicode support
s_interned = s.intern

s_interned.class    # => Symbol

s_yamld = s_interned.to_yaml
   # =>开发者_JS百科; "--- \":\\xEC\\x86\\x8C\\xEB\\x85\\x80\\xEC\\x8B\\x9C\\xEB\\x8C\\x80\"\n"
unyamld = YAML::load(s_yamld)
   # => ":∞åîδàÇ∞ï£δîÇ" or ":소녀시대"

unyamld.class       # => String
                    # => expected: Symbol

And once again:

YAML::load(s_interned.to_yaml).class   # => String

Here's how a "normal" symbol behaves:

YAML::load(:foo.to_yaml).class         # => Symbol

Normal symbols behave fine, but symbols with unicode characters don't seem to. They get interpreted as strings with a colon as their first character.

I'm pretty sure this script was working last night. But I woke up this morning and everything is gone wrong.

Does anyone know how I can resolve this or get around this?

I've tried using some clever regular expression/sub hacks to get around this and "reconvert", but they've all proven inelegant or have made the situation worse.


I'm new to 1.9 as well but it seems you have to add the encoding to the top of the file sometimes. Something like:

# encoding: utf-8

Again... no idea when or why. Still have to learn how it works in 1.9. I found some more background information here: "Ruby 1.9 Common Problems Pt. 1: Encoding".

0

精彩评论

暂无评论...
验证码 换一张
取 消