I'm trying to figure out some C code so that I can port it to Python. The code reads a proprietary binary data file format. It has been straightforward thus far: it's mainly been structs, and I have been using the struct library to read particular C types from the file. However, I just came across this bit of code and I'm at a loss for how to implement it in Python. In particular, I'm not sure how to deal with the enum or the union.
#define BYTE char
#define UBYTE unsigned char
#define WORD short
#define UWORD unsigned short
typedef enum {
TEEG_EVENT_TAB1=1,
TEEG_EVENT_TAB2=2
} TEEG_TYPE;
typedef struct
{
TEEG_TYPE Teeg;
long Size;
union
{
void *Ptr; // Memory pointer
long Offset;
};
} TEEG;
Secondly, in the struct definition below, I'm not sure what the colons after the variable names mean (e.g., KeyPad:4). Does it mean I'm supposed to read 4 bytes?
typedef struct
{
UWORD StimType;
UBYTE KeyBoard;
UBYTE KeyPad:4;
UBYTE Accept:4;
long Offset;
} EVENT1;
In case it's useful, an abstract example of the way I've been accessing the file in python is as follows:
from struct import unpack, calcsize

def get(ctype, size=1):
    """Reads and unpacks binary data into the desired ctype."""
    if size == 1:
        size = ''
    else:
        size = str(size)
    chunk = file.read(calcsize(size + ctype))
    return unpack(size + ctype, chunk)[0]

file = open("file.bin", "rb")
file.seek(1234)
var1 = get('i')
var2 = get('4l')
var3 = get('10s')
Enums: Python had no built-in enum type until the enum module arrived in Python 3.4. Various idioms have been proposed for older versions, but none is really widespread. The most straightforward (and in this case sufficient) solution is:
TEEG_EVENT_TAB1 = 1
TEEG_EVENT_TAB2 = 2
Unions: ctypes has unions.
The fieldname : n syntax is called a bitfield and, yes, it does mean "this is n bits big". Again, ctypes has them.
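As a rough sketch of what that could look like with ctypes (assuming a 32-bit long and an int-sized enum; the class names Teeg, Event1 and _PtrOrOffset and the sample bytes are made up for illustration, and which 4-bit field lands in which half of the shared byte depends on the compiler):

```python
import ctypes
import struct

class _PtrOrOffset(ctypes.Union):
    # Both members share the same storage, as in the C union.
    _fields_ = [
        ("Ptr", ctypes.c_void_p),
        ("Offset", ctypes.c_int32),  # assumption: C long is 32 bits in the file
    ]

class Teeg(ctypes.Structure):
    _anonymous_ = ("u",)  # lets you write teeg.Offset instead of teeg.u.Offset
    _fields_ = [
        ("Teeg", ctypes.c_int),      # assumption: the enum is stored as an int
        ("Size", ctypes.c_int32),
        ("u", _PtrOrOffset),
    ]

class Event1(ctypes.Structure):
    _fields_ = [
        ("StimType", ctypes.c_ushort),
        ("KeyBoard", ctypes.c_ubyte),
        ("KeyPad", ctypes.c_ubyte, 4),   # third tuple element = width in bits
        ("Accept", ctypes.c_ubyte, 4),
        ("Offset", ctypes.c_int32),
    ]

# Fake 8-byte record standing in for bytes read from the file.
raw = struct.pack("=HBBi", 7, 3, 0xA5, 1234)
event = Event1.from_buffer_copy(raw)
print(event.StimType, event.KeyBoard, event.KeyPad, event.Accept, event.Offset)
```

Compare ctypes.sizeof(Event1) with the record size implied by the file before trusting the layout; if they differ, the field types or packing assumptions need adjusting.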
I don't know the answer to all of your questions, but for enums where you do not need lookup-by-value (that is, you are just using them to avoid magic numbers), I like to use a small class. A regular dict is another option that works fine. If you need lookup-by-value, you may want another structure, though.
class TeegType(object):
    TEEG_EVENT_TAB1 = 1
    TEEG_EVENT_TAB2 = 2

print(TeegType.TEEG_EVENT_TAB1)
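If you do need lookup-by-value and can use Python 3.4 or later, the standard library's enum.IntEnum is one option. A minimal sketch (raw_value is a stand-in for a number unpacked from the file):

```python
from enum import IntEnum

class TeegType(IntEnum):
    TEEG_EVENT_TAB1 = 1
    TEEG_EVENT_TAB2 = 2

raw_value = 2                      # pretend this was unpacked from the file
teeg_type = TeegType(raw_value)    # lookup by value; raises ValueError if unknown
print(teeg_type.name)
print(teeg_type == 2)              # IntEnum members compare equal to plain ints
```

Because the members are also ints, they can be compared directly against values returned by struct.unpack.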
What you really need to know is:
- What is the size of an enum? You will use this answer to generate your unpacking code.
- What is the size of a union? Summary: the size of the largest member.
- How do you deal with that pointer? You should take a look at the ctypes module. For what you are doing, it may be easier to work with than the struct module. In particular, it can work with pointers arriving via C.
- How do you coerce/cast the data read from the struct into the right type to work with in Python? This is why I recommended ctypes in the bullet above; this module has functions for performing the necessary casts.
The C enum declaration is a syntactic wrapper around some integer type. See "Is the sizeof(enum) == sizeof(int), always?". How big an int is will depend on the particular C compiler. I would probably start by trying 16 bits.
The union reserves a block of memory the size of the largest of the contained data types. Again, the exact size will depend on the C implementation, but I would expect 32 bits for a 32-bit architecture, or 64 bits if this is compiled as native 64-bit code. Generally speaking, you will be able to store the contents of the union in a Python integer or long, regardless of whether what has been saved in it is a pointer or an offset.
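To see what these sizes are on the machine running Python (which may differ from the machine that wrote the file), ctypes can report them. A small sketch; PtrOrOffset is a made-up name for the anonymous union, and the little-endian byte order is an assumption:

```python
import ctypes

class PtrOrOffset(ctypes.Union):
    _fields_ = [("Ptr", ctypes.c_void_p), ("Offset", ctypes.c_long)]

ptr_size = ctypes.sizeof(ctypes.c_void_p)
long_size = ctypes.sizeof(ctypes.c_long)
union_size = ctypes.sizeof(PtrOrOffset)
print(ptr_size, long_size, union_size)  # union_size is the larger of the two

# Whatever was stored, the raw bytes can be read back as a plain integer.
raw = (1234).to_bytes(union_size, "little")   # fake union contents
value = int.from_bytes(raw, "little")          # assumption: little-endian file
print(value)
```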
A more interesting question is why a pointer would ever be written to a disk file. You may find that the union field is only treated as a pointer when the TEEG struct is in memory, but when written to disk, it is always an integer offset.
As for the :4 notation, as several people have noted, these are "bit fields," meaning a sequence of bits, several of which can be packed into a single space. If I recall correctly, bitfields in C are packed into ints, so both of these 4-bit fields will be packed into a single integer. They can be unpacked with appropriate use of Python's "&" (bitwise and) and ">>" (right shift) operators. Again, exactly how the fields have been packed into the integer, and the size of the integer field itself, will depend on the particular C implementation.
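For example, assuming a little-endian file, no padding between fields, a 4-byte long, both bit fields sharing a single byte, and KeyPad in the low four bits of that byte (all guesses to verify against real data), EVENT1 could be unpacked roughly like this; read_event1 is a hypothetical helper name:

```python
import struct

EVENT1_FMT = "<HBBl"   # UWORD, UBYTE, one byte holding both bit fields, long
EVENT1_SIZE = struct.calcsize(EVENT1_FMT)

def read_event1(raw):
    """Unpack one EVENT1 record from a bytes object."""
    stim_type, keyboard, packed, offset = struct.unpack(EVENT1_FMT, raw)
    keypad = packed & 0x0F          # low four bits (assumed to be KeyPad)
    accept = (packed >> 4) & 0x0F   # high four bits (assumed to be Accept)
    return stim_type, keyboard, keypad, accept, offset

sample = struct.pack(EVENT1_FMT, 7, 3, 0xA5, 1234)  # fake record for illustration
print(read_event1(sample))
```

If the decoded KeyPad and Accept values look swapped on real data, exchange the two masks.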
Maybe the following code snippet will help you:
import struct

SIZEOF_TEEG_TYPE = 2 # First guess for enum is two bytes
FMT_TEEG_TYPE = "h" # Could be "b", "B", "h", "H", "l", "L", "q" or "Q"
SIZEOF_LONG = 4 # Use 8 in 64-bit Unix architectures
FMT_LONG = "l" # Use "q" in 64-bit Unix architectures
# Life gets more interesting if you are reading 64-bit
# using 32-bit Python
SIZEOF_PTR_LONG_UNION = 4 # Use 8 in any 64-bit architecture
FMT_PTR_LONG_UNION = "l" # Use "q" in any 64-bit architecture
# Life gets more interesting if you are reading 64-bit
# using 32-bit Python
SIZEOF_TEEG_STRUCT = SIZEOF_TEEG_TYPE + SIZEOF_LONG + SIZEOF_PTR_LONG_UNION
# The "=" prefix selects standard sizes with no alignment padding, so the
# format's size matches SIZEOF_TEEG_STRUCT; add pad bytes ("x") if the file has them
FMT_TEEG_STRUCT = "=" + FMT_TEEG_TYPE + FMT_LONG + FMT_PTR_LONG_UNION
# Constants for TEEG_EVENTs
TEEG_EVENT_TAB1 = 1
TEEG_EVENT_TAB2 = 2
.
.
.
# Read a TEEG structure
teeg_raw = file_handle.read( SIZEOF_TEEG_STRUCT )
teeg_type, teeg_size, teeg_offset = struct.unpack( FMT_TEEG_STRUCT, teeg_raw )
.
.
.
# Use TEEG_TYPE information
if teeg_type == TEEG_EVENT_TAB1:
    pass # Do something useful
elif teeg_type == TEEG_EVENT_TAB2:
    pass # Do something else useful
else:
    raise ValueError( "Encountered illegal TEEG_EVENT type %d" % teeg_type )