
How to save a dictionary with UTF-8 characters in its keys to a file with cPickle in Python?

Developer https://www.devze.com 2023-02-15 17:49 Source: Web
I want to know how to save a dictionary containing UTF-8 characters as its keys to a file in Python with cPickle. This dictionary is very large, and I've heard that cPickle is much faster than pickle. I also suspect that having UTF-8 encoded keys is problematic. Any other fast solutions are also welcome. Here is what I do, and below is the error message:

import codecs
import cPickle
from collections import defaultdict

unique_ngrams_dict = defaultdict(lambda: 0)  # just to show how I defined my dict

dict_file = codecs.open('ngram_dict', 'w', 'utf-8')
cPickle.dump(unique_ngrams_dict, dict_file)
dict_file.close()

error message:

Traceback (most recent call last):
  File "Generate_NGram.py", line 81, in <module>
    save_ngram_dict(unique_ngrams_dict)
  File "Generate_NGram.py", line 70, in save_ngram_dict
    cPickle.dump(unique_ngrams_dict,dict_file)
  File "/usr/lib/python2.6/copy_reg.py", line 70, in _reduce_ex
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle function objects

Thanks.


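  1. Pickle is a binary format, so you shouldn't open the file with any codecs; just open it in binary mode (and use 'rb' when loading):

    open('ngram_dict', 'wb')
    

    That is not the reason it's failing, but it is unnecessary, and passing pickle output through a text encoder can break as soon as the data contains non-ASCII bytes.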

  2. The actual problem is that the object you are trying to save contains a function reference (the default factory lambda: 0), and pickle cannot serialize lambdas: functions are pickled by name, so only functions defined at module level can be saved.

    You have three options:

    1. Use a regular dict and its .get method with a default argument, e.g. d.get(key, 0).
    2. Set

      unique_ngrams_dict.default_factory = None
      

      before pickling and set it back to

      unique_ngrams_dict.default_factory = lambda: 0
      

      after unpickling.

    3. Define a class like:

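      class NgramDefault(object):
          def __call__(self):
              return 0
      

      and use NgramDefault() as the default factory instead of lambda: 0. The class must be defined at module level so that pickle can find it by name when unpickling.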
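A minimal sketch of the three options above, assuming Python 3 rather than the question's Python 2.6: cPickle no longer exists there, and the standard pickle module uses its C implementation automatically when available. The helpers save and load, the sample keys and the temporary path are hypothetical. The sketch first confirms that a defaultdict with a lambda factory cannot be pickled, then round-trips a dictionary with non-ASCII keys through a binary file for each option.

```python
import os
import pickle
import tempfile
from collections import defaultdict

# Assumes Python 3, where pickle uses the C accelerator (the old cPickle)
# automatically. Helper names and sample data below are hypothetical.


class NgramDefault(object):
    """Picklable default factory (option 3). It must live at module level
    so that pickle can find it by name when loading."""

    def __call__(self):
        return 0


def save(obj, path):
    # Pickle output is bytes: open the file in binary mode, no codecs.
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load(path):
    with open(path, "rb") as f:
        return pickle.load(f)


keys = [u"naïve café", u"日本語 テキスト", u"plain ascii"]
path = os.path.join(tempfile.mkdtemp(), "ngram_dict")

# The case from the question: a lambda default factory cannot be pickled.
broken = defaultdict(lambda: 0)
broken[keys[0]] += 1
try:
    pickle.dumps(broken)
    lambda_failed = False
except (pickle.PicklingError, AttributeError, TypeError):
    lambda_failed = True

# Option 1: a regular dict, using .get with a default value.
plain = {}
for k in keys:
    plain[k] = plain.get(k, 0) + 1
save(plain, path)
plain_loaded = load(path)

# Option 2: remove the factory before pickling, put it back afterwards.
counts = defaultdict(lambda: 0)
for k in keys:
    counts[k] += 1
counts.default_factory = None
save(counts, path)
counts.default_factory = lambda: 0         # restore on the original
counts_loaded = load(path)
counts_loaded.default_factory = lambda: 0  # and on the loaded copy

# Option 3: a picklable callable object as the factory.
counts3 = defaultdict(NgramDefault())
for k in keys:
    counts3[k] += 1
save(counts3, path)
counts3_loaded = load(path)
counts3_loaded[u"unseen"] += 1  # the factory survived the round trip

print(lambda_failed, plain_loaded == plain,
      dict(counts_loaded) == dict(counts), counts3_loaded[u"unseen"])
# prints: True True True 1
```

For option 3, defaultdict(int) is a shorter alternative: int() returns 0, and the built-in int is pickled by name just like a module-level class.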


You should just pickle the dictionary as-is and trust the pickle module to do the right thing. The best way to treat pickle is as an opaque blob that will re-create the exact data structure you started with when you unpickle it.

Don't try to apply any sort of encoding to the output of pickle; it should be treated as a binary blob. If you have unicode elements when you pickle, they will be unicode once you unpickle.
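A small sketch of the point above, again assuming Python 3 and hypothetical sample data: pickle.dumps returns bytes, the keys come back as the same str values after pickle.loads, and trying to treat the blob as UTF-8 text fails outright with a binary protocol.

```python
import pickle

# Assumes Python 3; the n-gram counts are hypothetical sample data.
ngrams = {u"über alles": 3, u"日本語": 5, u"ascii": 1}

# dumps() returns bytes: an opaque binary blob, not text.
blob = pickle.dumps(ngrams, protocol=pickle.HIGHEST_PROTOCOL)

# Treating the blob as UTF-8 text is a mistake: protocols 2 and up start
# with the byte 0x80, which cannot begin a UTF-8 sequence.
try:
    blob.decode("utf-8")
    decoded_as_text = True
except UnicodeDecodeError:
    decoded_as_text = False

# No encoding step is needed: the keys come back as the same str values.
restored = pickle.loads(blob)
print(decoded_as_text, restored == ngrams)  # prints: False True
```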
