YAML Encoding of Malformed String, Model Serialization Issues

Developer https://www.devze.com 2022-12-08 22:12 Source: web
I've isolated a problem with Ruby on Rails where a model with a serialized column is not properly loading data that has been saved to it.

What goes in is a Hash, and what comes out is a YAML string that can't be parsed due to formatting issues. I'd expect that a serializer can properly store and retrieve anything you give it, so something appears to have gone wrong.

The troublesome string in question is formatted something like this:

message_text = <<END

  X
X
END

yaml = message_text.to_yaml

puts yaml
# =>
# --- |
#
#   X
# X

puts YAML.load(yaml)
# => ArgumentError: syntax error on line 3, col 0: ‘X’

The combination of a leading blank line, an indented second line, and a non-indented third line causes the parser to fail. Omitting either the blank line or the indentation appears to remedy the problem, but this does seem to be a bug in the serialization process. Since it requires such a specific set of circumstances, I'm willing to bet this is an edge case that isn't properly handled.
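A quick way to check whether a given Ruby installation is affected is a dump/load round trip. The following is only a minimal sketch: the helper name yaml_round_trips? is made up for illustration, and the result depends on which YAML engine your Ruby loads.

```ruby
require 'yaml'

# Hypothetical helper (not part of any library): returns true when +value+
# comes back unchanged after being dumped to YAML and loaded again.
def yaml_round_trips?(value)
  YAML.load(YAML.dump(value)) == value
rescue StandardError
  # Syck reports the unparseable document as an ArgumentError; other
  # engines raise their own parse errors. Either way the round trip failed.
  false
end

# The string from the question: blank line, indented line, unindented line.
message_text = "\n  X\nX\n"
puts yaml_round_trips?(message_text)
```

On an affected Syck build this should print false; an engine that emits a correctly indented or quoted scalar should print true.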

The YAML module that ships with Ruby and is used by Rails looks to delegate a large portion of the processing to Syck, yet does provide Syck with some hints as to how to encode the data it is sending.

In yaml/rubytypes.rb there's the String#to_yaml definition:

class String
  def to_yaml( opts = {} )
    YAML::quick_emit( is_complex_yaml? ? self : nil, opts ) do |out|
      if is_binary_data?
        out.scalar( "tag:yaml.org,2002:binary", [self].pack("m"), :literal )
      elsif to_yaml_properties.empty?
        out.scalar( taguri, self, self =~ /^:/ ? :quote2 : to_yaml_style )
      else
        out.map( taguri, to_yaml_style ) do |map|
          map.add( 'str', "#{self}" )
          to_yaml_properties.each do |m|
            map.add( m, instance_variable_get( m ) )
          end
        end
      end
    end
  end
end

There appears to be a check there for strings that start with ':' and could be mistaken for a Symbol when de-serializing; the :quote2 option should tell the emitter to quote the string during encoding. Adjusting this regular expression to catch the conditions described above does not appear to have any effect on the output, so I'm hoping someone more familiar with the YAML implementation can advise.
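For reference, the triggering shape described in the question can be written as a plain predicate. This is a sketch only: the method name is hypothetical, and it encodes the conditions as observed in the question, not a confirmed description of what Syck mishandles internally.

```ruby
# Hypothetical predicate: true when the text starts with at least one blank
# line, its first non-blank line is indented, and some later non-blank line
# is indented less than that first one.
def leading_blank_then_indented?(text)
  lines = text.lines.map(&:chomp)
  first_content = lines.index { |line| !line.strip.empty? }
  # No content at all, or no leading blank line: not the shape in question.
  return false if first_content.nil? || first_content.zero?

  indent = lines[first_content][/\A[ \t]*/].length
  return false if indent.zero?

  lines[(first_content + 1)..-1].any? do |line|
    !line.strip.empty? && line[/\A[ \t]*/].length < indent
  end
end

puts leading_blank_then_indented?("\n  X\nX\n")
puts leading_blank_then_indented?("  X\nX\n")
```

A check like this could be used to route such strings to a different encoding path, assuming the observed conditions are in fact the complete trigger.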


Yep, that looks like a bug in the C Syck library. I checked it using the PHP Syck bindings (v0.9.3, http://pecl.php.net/package/syck) and the same bug is present, which indicates that the bug is in the library itself rather than in the Ruby YAML library or the ruby-syck bindings:

// phptestsyck.php
<?php
$message_text = "

  X
X
";

syck_load(syck_dump($message_text));
?>

Running this from the command line gives the same kind of syntax error, raised as a SyckException:

$ php phptestsyck.php 
PHP Fatal error:  Uncaught exception 'SyckException' with message 'syntax error on line 5, col 0: 'X'' in /.../phptestsyck.php:8
Stack trace:
#0 /.../phptestsyck.php(8): syck_load('--- %YAML:1.0 >...')
#1 {main}
  thrown in /.../phptestsyck.php on line 8

So, I suppose you could try to fix Syck itself. It appears that the library hasn't been updated since v0.55 in May of 2005 (http://rubyforge.org/projects/syck/), though.

Alternatively, there is a pure-Ruby YAML parser called RbYAML (http://rbyaml.rubyforge.org/), which originated with JRuby and doesn't appear to have this bug:

>> require 'rbyaml'
=> true
>> message_text = <<END

  X
X
END
=> "\n  X\nX\n"
>> yaml = RbYAML.dump(message_text)
=> "--- \"\\n  X\\nX\\n\"\n"
>> RbYAML.load(yaml)
=> "\n  X\nX\n"
>> 

Finally, have you considered another serialization format altogether? Ruby's Marshal library doesn't have this bug either and is faster than YAML (see http://significantbits.wordpress.com/2008/01/29/yaml-vs-marshal-performance/):

>> message_text = <<END

  X
X
END
=> "\n  X\nX\n"
>> marshal = Marshal.dump(message_text)
=> "\004\b\"\f\n  X\nX\n"
>> Marshal.load(marshal)
=> "\n  X\nX\n"


You have to give up the convenient ActiveRecord::Base serialize method to do so, but otherwise it's not hard to use your own serialization scheme. For example, to serialize a field called 'person_data':

class Person < ActiveRecord::Base
  def person_data
    self[:person_data] ? Marshal.load(self[:person_data]) : nil
  end

  def person_data=(x)
    self[:person_data] = Marshal.dump(x)
  end
end

## Use Person#person_data as normal and it is transparently marshalled
p = Person.find 1
p.person_data = {:color => "blue", :food => "vegetarian"}

(See this ruby forum thread for more)
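One thing to keep in mind with the Marshal approach: Marshal.dump returns a binary string, which may not survive storage in a text column unchanged. A possible variation, sketched here in plain Ruby under that assumption, is to Base64-encode the marshalled bytes before saving. The module name MarshalText is hypothetical and not part of Rails or Ruby.

```ruby
# Hypothetical helper module: marshals a value and Base64-encodes the
# result so it can be stored safely as plain ASCII text.
module MarshalText
  # Marshal the value, then Base64-encode it without line breaks ('m0').
  def self.dump(value)
    [Marshal.dump(value)].pack('m0')
  end

  # Decode the Base64 text and unmarshal it; nil or empty input gives nil.
  def self.load(text)
    return nil if text.nil? || text.empty?
    Marshal.load(text.unpack('m0').first)
  end
end

encoded  = MarshalText.dump({ :color => "blue", :food => "vegetarian" })
restored = MarshalText.load(encoded)
puts restored == { :color => "blue", :food => "vegetarian" }
```

The person_data accessors could then call MarshalText.load and MarshalText.dump instead of Marshal directly. As with any use of Marshal.load, this should only be applied to data your own application wrote.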
