Ever since I read Dave Beazley's post on binary I/O handling (http://dabeaz.blogspot.com/2009/08/python-binary-io-handling.html) I've wanted to create a Python library for a certain wire protocol. However, I can't find the best solution for variable length structures. Here's what I want to do:
import ctypes as c

class Point(c.Structure):
    _fields_ = [
        ('x', c.c_double),
        ('y', c.c_double),
        ('z', c.c_double)
    ]

class Points(c.Structure):
    _fields_ = [
        ('num_points', c.c_uint32),
        ('points', Point*num_points)  # num_points not yet defined!
    ]
The class Points won't work since num_points isn't defined yet. I could redefine the _fields_ variable later once num_points is known, but since it's a class variable it would affect all of the other Points instances.
What is a pythonic solution to this problem?
The most straightforward way, with the example you gave, is to define the structure only once you have the information you need.
A simple way of doing that is to create the class at the point where you will use it, not at module root. You can, for example, put the class body inside a function that acts as a factory. I think that is the most readable way.
import ctypes as c

class Point(c.Structure):
    _fields_ = [
        ('x', c.c_double),
        ('y', c.c_double),
        ('z', c.c_double)
    ]

def points_factory(num_points):
    class Points(c.Structure):
        _fields_ = [
            ('num_points', c.c_uint32),
            ('points', Point*num_points)
        ]
    return Points

# and when you need it in the code:
Points = points_factory(5)
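As a rough illustration of how such a factory might be used when the count arrives inside the message itself, here is a minimal sketch. It assumes the sender uses the platform's native layout (so the padding after num_points matches what ctypes inserts by default); a packed wire format would need _pack_ = 1 on the structure instead. The names parse_points and _points_cache are hypothetical helpers, not part of the original answer.

```python
import ctypes as c
import struct

class Point(c.Structure):
    _fields_ = [('x', c.c_double), ('y', c.c_double), ('z', c.c_double)]

_points_cache = {}  # hypothetical cache: one class per distinct count

def points_factory(num_points):
    """Return (and cache) a Points class sized for num_points entries."""
    if num_points not in _points_cache:
        class Points(c.Structure):
            _fields_ = [
                ('num_points', c.c_uint32),
                ('points', Point * num_points),
            ]
        _points_cache[num_points] = Points
    return _points_cache[num_points]

def parse_points(raw):
    """Peek at the leading count, then copy raw into a matching structure."""
    (count,) = struct.unpack_from('@I', raw, 0)
    return points_factory(count).from_buffer_copy(raw)

# Build a sample message with native alignment: a count, then 2 points.
raw = struct.pack('@I6d', 2, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
msg = parse_points(raw)
print(msg.num_points, msg.points[1].z)  # 2 6.0
```

Caching the generated classes avoids creating a new type for every message with the same count, and keeps type(msg) comparable across messages.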
Sorry, in that scenario it is the C code that will "fill in" the values for you, so that is not the answer then. I will post another way.
And now, for something completely different: if all you need is to deal with the data, possibly the "most Pythonic" way is not to use ctypes to handle raw data in memory at all.
This approach just uses struct.pack and struct.unpack to serialize and deserialize the data as it moves in and out of your app. The "Points" class can accept the raw bytes and create Python objects from them, and it can serialize the data through a "get_data" method. Otherwise, it is just an ordinary Python list.
import struct

class Point(object):
    def __init__(self, x=0.0, y=0.0, z=0.0):
        self.x, self.y, self.z = x, y, z

    def get_data(self):
        return struct.pack("ddd", self.x, self.y, self.z)

class Points(list):
    def __init__(self, data=None):
        if data is None:
            return
        pointsize = struct.calcsize("ddd")
        # Skip the leading "i" count field, then read one point per step.
        for index in range(struct.calcsize("i"), len(data) - struct.calcsize("i"), pointsize):
            point_data = struct.unpack("ddd", data[index: index + pointsize])
            self.append(Point(*point_data))

    def get_data(self):
        # b"".join is needed on Python 3, where struct.pack returns bytes.
        return struct.pack("i", len(self)) + b"".join(p.get_data() for p in self)
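The same idea can be sketched more compactly with precompiled struct.Struct objects. This is only an illustration under stated assumptions: the byte order (little-endian) and the field widths (a 32-bit unsigned count followed by three doubles per point) are guesses about the wire format, and pack_points / unpack_points are hypothetical names.

```python
import struct

HEADER = struct.Struct("<I")   # assumption: little-endian uint32 point count
POINT = struct.Struct("<3d")   # assumption: three little-endian doubles

def pack_points(points):
    """Serialize an iterable of (x, y, z) tuples: the count, then the points."""
    points = list(points)
    body = b"".join(POINT.pack(*p) for p in points)
    return HEADER.pack(len(points)) + body

def unpack_points(data):
    """Read the count, then exactly that many points from data."""
    (count,) = HEADER.unpack_from(data, 0)
    end = HEADER.size + count * POINT.size
    if len(data) < end:
        raise ValueError("truncated message")
    return [POINT.unpack_from(data, HEADER.size + i * POINT.size)
            for i in range(count)]

wire = pack_points([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)])
print(len(wire), unpack_points(wire))  # 52 [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
```

Using an explicit byte-order prefix disables native padding, so the message size is exactly 4 + 24 * count bytes on every platform.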
This question is really, really, old:
I have a simpler answer, which seems strange, but avoids metaclasses and resolves the issue that ctypes doesn't allow me to directly build a struct with the same definition as I can in C.
The example C struct, coming from the kernel:
struct some_struct {
    __u32 static;
    __u64 another_static;
    __u32 len;
    __u8 data[0];
};
With ctypes implementation:
import ctypes
import copy

class StructureVariableSized(ctypes.Structure):
    _variable_sized_ = []

    def __new__(self, variable_sized=(), **kwargs):
        def name_builder(name, variable_sized):
            for variable_sized_field_name, variable_size in variable_sized:
                name += variable_sized_field_name.title() + '[{0}]'.format(variable_size)
            return name

        local_fields = copy.deepcopy(self._fields_)
        for matching_field_name, matching_type in self._variable_sized_:
            match_type = None
            for variable_sized_field_name, variable_size in variable_sized:
                if variable_sized_field_name == matching_field_name:
                    match_type = matching_type
                    break
            if match_type is None:
                raise Exception
            local_fields.append((variable_sized_field_name, match_type*variable_size))
        name = name_builder(self.__name__, variable_sized)

        class BaseCtypesStruct(ctypes.Structure):
            _fields_ = local_fields
            _variable_sized_ = self._variable_sized_

        classdef = BaseCtypesStruct
        classdef.__name__ = name
        return BaseCtypesStruct(**kwargs)

class StructwithVariableArrayLength(StructureVariableSized):
    _fields_ = [
        ('static', ctypes.c_uint32),
        ('another_static', ctypes.c_uint64),
        ('len', ctypes.c_uint32),
    ]
    _variable_sized_ = [
        ('data', ctypes.c_uint8)
    ]

struct_map = {
    1: StructwithVariableArrayLength
}

sval32 = struct_map[1](variable_sized=(('data', 32),),)
print(sval32)
print(sval32.data)
sval128 = struct_map[1](variable_sized=(('data', 128),),)
print(sval128)
print(sval128.data)
With sample output:
machine:~ user$ python svs.py
<__main__.StructwithVariableArrayLengthData[32] object at 0x10dae07a0>
<__main__.c_ubyte_Array_32 object at 0x10dae0830>
<__main__.StructwithVariableArrayLengthData[128] object at 0x10dae0830>
<__main__.c_ubyte_Array_128 object at 0x10dae08c0>
This answer works for me for a couple reasons:
- The argument to the constructor can be pickled, and has no references to types.
- I define all of the structure inside of the StructwithVariableArrayLength definition.
- To the caller, the structure looks the same as if I had just defined the array inside of _fields_
- I have no ability to modify the underlying structure defined in the header file, and accomplish my goals without changing any underlying code.
- I don't have to modify any parse/pack logic, this only does what I'm trying to do which is build a class definition with a variable length array.
- This is a generic, reusable container that can be sent into the factory like my other structures.
I would obviously prefer the header file took a pointer, but that isn't always possible. That answer was frustrating. The others were very tailored to the data structure itself, or required modification of the caller.
So, just as in C, you can't do exactly what you want. The only useful way of working with a structure like this in C is to declare it as
struct Points {
    int num_points;
    Point *points;
};
and have utility code that allocates the memory where you put your data. The alternative is to pick some safe maximum size and skip the memory-allocation part of the code; the network part of the code would then transmit just the needed data from within the structure, not the whole of it.
To work in Python ctypes with a structure member that is a pointer to where your data lives (and so may be of variable length), you will also have to allocate and free memory manually if you are filling it on the Python side, or just read the data if creating and destroying it is done by native code functions.
The structure-creating code can be:
import ctypes as c

class Point(c.Structure):
    _fields_ = [
        ('x', c.c_double),
        ('y', c.c_double),
        ('z', c.c_double)
    ]

class Points(c.Structure):
    _fields_ = [
        ('num_points', c.c_uint32),
        ('points', c.POINTER(Point))
    ]
And the code to manage the creation and deletion of these data structures can be:
__all_buffers = {}

def make_points(num_points):
    data = Points()
    data.num_points = num_points
    buf = c.create_string_buffer(c.sizeof(Point) * num_points)
    # Keep the buffer alive for as long as the structure points into it.
    __all_buffers[c.addressof(buf)] = buf
    p = Point.from_address(c.addressof(buf))
    data.points = c.pointer(p)
    return data

def del_points(points):
    # The first element sits at the start of the buffer, so its address is the key.
    del __all_buffers[c.addressof(points.points[0])]
    points.num_points = 0
The global variable "__all_buffers" keeps a reference to the Python-created buffer object so that Python does not destroy it upon leaving the make_points function. An alternative is to get a reference to either libc (on Unix systems) or the Windows API and call the system's malloc and free functions yourself.
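Another possibility, sketched here only as an illustration, is to let a ctypes array own the memory and keep a reference to it on the structure instance itself, so no global registry or explicit delete function is needed. The attribute name _storage is a hypothetical choice, not a ctypes convention; it relies on ctypes structure instances accepting ordinary Python attributes.

```python
import ctypes as c

class Point(c.Structure):
    _fields_ = [('x', c.c_double), ('y', c.c_double), ('z', c.c_double)]

class Points(c.Structure):
    _fields_ = [('num_points', c.c_uint32), ('points', c.POINTER(Point))]

def make_points(num_points):
    """Allocate num_points zeroed Point entries owned by the returned struct."""
    data = Points()
    storage = (Point * num_points)()      # ctypes-owned, zero-initialized array
    data.points = c.cast(storage, c.POINTER(Point))
    data.num_points = num_points
    data._storage = storage               # hypothetical attribute: keeps the array alive
    return data

pts = make_points(3)
pts.points[2].x = 1.5                     # write through the pointer field
print(pts.num_points, pts.points[2].x, pts._storage[2].x)  # 3 1.5 1.5
```

With this layout the array is released together with the Points object, at the cost of tying the buffer's lifetime to that one Python wrapper.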
OR - you can just go with plain old "struct" Python module, instead of using ctypes - doubly so if you will have no C code at all, and are just using ctypes for the "structs" convenience.
Here's what I've come up with so far (still a little rough):
import ctypes as c

MAX_PACKET_SIZE = 8*1024
MAX_SIZE = 10

# Note: Point and the PrettyPrinter mixin are used below but are not shown in this post.
class Points(c.Structure):
    _fields_ = [
        ('_buffer', c.c_byte*MAX_PACKET_SIZE)
    ]
    _inner_fields = [
        ('num_points', c.c_uint32),
        ('points', 'Point*self.num_points')
    ]

    def __init__(self):
        self.num_points = 0
        self.points = [0,]*MAX_SIZE

    def parse(self):
        fields = []
        for name, ctype in self._inner_fields:
            if type(ctype) == str:
                # String types are evaluated late, once self.num_points is known.
                ctype = eval(ctype)
            fields.append((name, ctype))
            class Inner(c.Structure, PrettyPrinter):
                _fields_ = fields
            inner = Inner.from_address(c.addressof(self._buffer))
            setattr(self, name, getattr(inner, name))
        self = inner
        return self

    def pack(self):
        fields = []
        for name, ctype in self._inner_fields:
            if type(ctype) == str:
                ctype = eval(ctype)
            fields.append((name, ctype))
        class Inner(c.Structure, PrettyPrinter):
            _fields_ = fields
        inner = Inner()
        for name, ctype in self._inner_fields:
            value = getattr(self, name)
            if type(value) == list:
                l = getattr(inner, name)
                for i in range(len(l)):
                    l[i] = getattr(self, name)[i]
            else:
                setattr(inner, name, value)
        return inner
The methods parse and pack are generic, so they could be moved to a metaclass. This would make its use almost as easy as the snippet I first posted.
Comments on this solution? Still looking for something simpler, not sure if it exists.
You could use ctypes pointers to do this.
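C struct
struct some_struct {
    uint length;
    uchar data[1];
};
Python code
from ctypes import *

class SomeStruct(Structure):
    # Declare data as a one-element array, mirroring the C definition.
    _fields_ = [('length', c_uint), ('data', c_ubyte * 1)]

# read data into SomeStruct
s = SomeStruct()
# s.data is an array that shares the structure's memory; cast it to a
# pointer so it can be indexed past the single declared element.
ptr_data = cast(s.data, POINTER(c_ubyte))
for i in range(s.length):
    print(ptr_data[i])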
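To make the "read data into SomeStruct" step concrete, here is a small sketch that overlays the header on a bytearray and then builds a correctly sized array view over the trailing bytes. It assumes the length field is a native unsigned int and that the buffer is at least sizeof(Header) bytes long; Header and payload_view are hypothetical names used only for this illustration.

```python
import ctypes
import struct

class Header(ctypes.Structure):
    # Mirrors the C layout: a length followed by a one-element placeholder array.
    _fields_ = [('length', ctypes.c_uint), ('data', ctypes.c_ubyte * 1)]

def payload_view(buf):
    """Return a ctypes array viewing the `length` bytes that follow the header."""
    hdr = Header.from_buffer(buf)          # shares memory with buf, no copy
    offset = Header.data.offset            # where the variable part starts
    if offset + hdr.length > len(buf):
        raise ValueError("length field exceeds buffer size")
    return (ctypes.c_ubyte * hdr.length).from_buffer(buf, offset)

# Sample message: length = 4, then four payload bytes.
buf = bytearray(struct.pack('I', 4) + b'\x0a\x0b\x0c\x0d')
view = payload_view(buf)
print(list(view))  # [10, 11, 12, 13]
```

Because from_buffer shares memory, writes through the view are visible in buf, and the bounds check guards against a length field that points past the end of the received data.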
If you're willing to consider a third-party package, you might be able to use Construct.
Let's take the structure you've provided:
import ctypes
class CPoint(ctypes.Structure):
    _fields_ = [
        ('x', ctypes.c_double),
        ('y', ctypes.c_double),
        ('z', ctypes.c_double)
    ]
Using Construct's syntax, we would define the equivalent as follows:
import construct
Point = construct.Struct(
    "x" / construct.Float64l,
    "y" / construct.Float64l,
    "z" / construct.Float64l
)
We can check that they are the same:
>>> point_coordinates = {"x": 3.14, "y": 2.71, "z": 1.41}
>>> c_point = CPoint(**point_coordinates)
>>> point = Point.build(point_coordinates)
>>> bytes(c_point) == bytes(point)
True
Now we define the Points structure according to the Construct syntax:
Points = construct.Struct(
    "num_points" / construct.Int32ul,
    "points" / construct.Array(construct.this.num_points, Point)
)
Construct will automatically create an array of points based on num_points.
We can serialize a Points structure:
>>> Points.build({"num_points": 2, "points": [{"x": 3.14, "y": 2.71, "z": 1.41}, {"x": 1.73, "y": 1.20, "z": 1.61}]})
b'\x02\x00\x00\x00\x1f\x85\xebQ\xb8\x1e\t@\xaeG\xe1z\x14\xae\x05@\x8f\xc2\xf5(\\\x8f\xf6?\xaeG\xe1z\x14\xae\xfb?333333\xf3?\xc3\xf5(\\\x8f\xc2\xf9?'
Or de-serialize it:
>>> res = Points.parse(b'\x02\x00\x00\x00\x1f\x85\xebQ\xb8\x1e\t@\xaeG\xe1z\x14\xae\x05@\x8f\xc2\xf5(\\\x8f\xf6?\xaeG\xe1z\x14\xae\xfb?333333\xf3?\xc3\xf5(\\\x8f\xc2\xf9?')
>>> print(res)
Container:
    num_points = 2
    points = ListContainer:
        Container:
            x = 3.14
            y = 2.71
            z = 1.41
        Container:
            x = 1.73
            y = 1.2
            z = 1.61
And of course access the structure fields:
>>> for i in range(res.num_points):
...     print(res.points[i].x)
...
3.14
1.73