Does Python Pickle have an illegal character/sequence I can use as a separator?

Developer https://www.devze.com 2023-01-21 00:55 Source: Internet

I want to make (and decode) a single string composed of several python pickles.

Is there a character or sequence that is safe to use as a separator in this string?

I should be able to make the string like so:

s = pickle.dumps(o1) + PICKLE_SEPARATOR + pickle.dumps(o2) + PICKLE_SEPARATOR + pickle.dumps(o3) ...

I should be able to take this string and reconstruct the objects like so:

[pickle.loads(s) for s in input.split(PICKLE_SEPARATOR)]

What should PICKLE_SEPARATOR be?


For the curious, I want to send pickled objects to redis using APPEND. (though perhaps I'll just use RPUSH)


It's fine to just concatenate the pickles together; Python knows where each one ends:

>>> import cStringIO as stringio
>>> import cPickle as pickle
>>> o1 = {}
>>> o2 = []
>>> o3 = ()
>>> p = pickle.dumps(o1)+pickle.dumps(o2)+pickle.dumps(o3)
>>> s = stringio.StringIO(p)
>>> pickle.load(s)
{}
>>> pickle.load(s)
[]
>>> pickle.load(s)
()


EDIT: First consider gnibbler's answer, which is obviously much simpler. The only reason to prefer the one below is if you want to be able to split a sequence of pickles without parsing them.

A fairly safe bet is to use a brand new UUID that you never reuse anywhere else. Evaluate uuid.uuid4().bytes once and store the result in your code as the separator. E.g.:

>>> import uuid
>>> uuid.uuid4().bytes
'\xae\x9fW\xff\x19cG\x0c\xb1\xe1\x1aV%P\xb7\xa8'

Then copy-paste the resulting string literal into your code as the separator (or even just use the one above, if you want). The chance that the same 16-byte sequence turns up by accident in anything you ever want to store is negligible.
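For Python 3, the separator approach above might look like the following minimal sketch. The helper names join_pickles and split_pickles are made up for illustration, and the separator is simply the example value shown above. Note that it would break if one of the pickled objects happened to contain the separator bytes itself.

```python
import pickle

# The 16 random bytes from the example above, fixed once in the code.
PICKLE_SEPARATOR = b'\xae\x9fW\xff\x19cG\x0c\xb1\xe1\x1aV%P\xb7\xa8'

def join_pickles(objects):
    """Pickle each object and join the results with the separator."""
    return PICKLE_SEPARATOR.join(pickle.dumps(obj) for obj in objects)

def split_pickles(blob):
    """Split on the separator and unpickle each chunk."""
    if not blob:
        return []  # an empty string holds no pickles at all
    return [pickle.loads(chunk) for chunk in blob.split(PICKLE_SEPARATOR)]

blob = join_pickles([{}, [], ()])
restored = split_pickles(blob)
```

The split itself is a plain bytes operation, so the chunks can be separated without touching the pickle machinery until each one is actually loaded.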


I don't use Python much, but is there a reason you couldn't just pickle an array instead? So pickling becomes

s = pickle.dumps([o1,o2,o3])

and reconstruction becomes

objs = pickle.loads(s)

Edit 1: Also, according to this answer, pickled output is self-terminating; thus, you could pickle with

s = ''.join(map(pickle.dumps,[o1,o2,o3]))

and restore with

import StringIO
sio = StringIO.StringIO(s)
objs = []
try:
    # pickle.load raises EOFError once the stream is exhausted
    while True:
        objs.append(pickle.load(sio))
except EOFError:
    pass

I'm not sure there's a benefit to this, though. (Though I didn't see one, there may well be a better way than that nasty loop/exception combo; like I said, I don't use Python much.)


In Python 3 it can be done using BytesIO:

from io import BytesIO
import pickle

o1 = {}
o2 = []
o3 = ()
p = pickle.dumps(o1) + pickle.dumps(o2) + pickle.dumps(o3)
s = BytesIO(p)

while True:
    try:
        print(pickle.load(s))
    except EOFError:
        break

Prints:

{} 
[]
()

To answer the original question: there is no easy way to separate concatenated pickled objects, but fortunately you don't have to. pickle.load returns the objects one after another with each successive call.
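If the raw byte strings of the individual pickles are wanted rather than the objects, one possibility not mentioned in this thread is the standard-library pickletools.genops, which walks the opcodes of one pickle and stops at its STOP opcode without constructing any objects. A rough sketch, with split_raw_pickles as a made-up helper name:

```python
import io
import pickle
import pickletools

def split_raw_pickles(blob):
    """Return the byte string of each pickle in blob without unpickling."""
    stream = io.BytesIO(blob)
    chunks = []
    while stream.tell() < len(blob):
        start = stream.tell()
        # genops reads opcodes from the stream up to and including STOP,
        # leaving the stream positioned at the start of the next pickle.
        for _opcode, _arg, _pos in pickletools.genops(stream):
            pass
        chunks.append(blob[start:stream.tell()])
    return chunks

blob = pickle.dumps({}) + pickle.dumps([1, 2]) + pickle.dumps(("a",))
chunks = split_raw_pickles(blob)
```

This still scans every opcode, so it is not free, but nothing is imported or instantiated along the way.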


One solution would be to prepend your string of pickles with data about how many characters each constituent element contains.
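A small sketch of that idea in Python 3, under one assumption of mine: instead of a single header in front of the whole string, each pickle gets its own 4-byte big-endian length prefix. The names pack_pickles and unpack_pickles are hypothetical.

```python
import pickle
import struct

# Assumed header format: 4-byte big-endian unsigned length per pickle.
_LEN = struct.Struct(">I")

def pack_pickles(objects):
    """Concatenate pickles, each preceded by its length in bytes."""
    parts = []
    for obj in objects:
        payload = pickle.dumps(obj)
        parts.append(_LEN.pack(len(payload)))
        parts.append(payload)
    return b"".join(parts)

def unpack_pickles(blob):
    """Read length-prefixed pickles back out of blob."""
    objects = []
    offset = 0
    while offset < len(blob):
        (size,) = _LEN.unpack_from(blob, offset)
        offset += _LEN.size
        objects.append(pickle.loads(blob[offset:offset + size]))
        offset += size
    return objects

blob = pack_pickles([{}, [], ()])
restored = unpack_pickles(blob)
```

Because every record carries its own length, records packed separately can simply be appended to one another, which seems to fit the APPEND use case from the question.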
