I have written some Ruby code to import the Google n-gram data into a hash table, mapping word unigrams to their respective counts. I'm using symbols as opposed to strings for the keys. I've been running this code on a Linux box for a while now with no problems. Running it on my Mac this morning yielded a symbol table overflow runtime error after loading about 2 million key-value pairs. I don't understand what is causing this error. Anyone have suggestions on what might be the cause? I'm running Ruby 1.9.1 under OS X 10.5.8.
While using Symbol for keys instead of String is generally more efficient, the amount of efficiency gained is proportionate to the level of duplication involved. Since your keys are by definition unique, you should probably just use String keys to avoid jamming the Symbol table full of entries.
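As a rough sketch of what String keys could look like here: the loader below assumes one "word<TAB>count" pair per line, which is a guess at the input format and not necessarily the layout of the actual n-gram files. The method name load_unigram_counts and the sample data are made up for illustration.

```ruby
require "stringio"

# Hypothetical loader: assumes each line is "word<TAB>count".
# Words stay plain Strings, so nothing is added to the symbol table.
def load_unigram_counts(io)
  counts = Hash.new(0)
  io.each_line do |line|
    word, count = line.chomp.split("\t")
    next if word.nil? || count.nil? # skip blank or malformed lines
    counts[word] += count.to_i
  end
  counts
end

# Tiny in-memory sample standing in for a real data file.
sample = StringIO.new("the\t100\ncat\t7\nthe\t5\n")
counts = load_unigram_counts(sample)
```

With a real file you would pass a File object instead of the StringIO, e.g. inside a File.open block.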
Is the difference 64-bit vs. 32-bit Ruby? I suspect this because of your observation:
yielded a symbol table overflow runtime error after loading about 2 million key-value pairs
If that is the case, there is nothing you can do about it short of using a native 64-bit build of Ruby, assuming strings are not an option because of your application design. Otherwise, you'll have to go with strings. Conversion is easy:
:symbol.to_s == "symbol"
"symbol".to_sym == :symbol
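If you already have a symbol-keyed hash in memory, something along these lines should turn it into a string-keyed one. The hash contents here are invented sample data, not real n-gram counts.

```ruby
# Invented sample data standing in for the symbol-keyed hash.
symbol_counts = { :the => 100, :cat => 7 }

# Build a new hash whose keys are the String form of each symbol.
string_counts = {}
symbol_counts.each do |word, count|
  string_counts[word.to_s] = count
end
```

Note that this only helps for lookups from here on; symbols that were already created stay in the symbol table, so for the import itself it is better to never call to_sym on the words in the first place.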