How do I write a long integer as binary in Python?

Developer https://www.devze.com 2023-02-03 16:31 Source: Internet
In Python, long integers have unlimited precision. I would like to write a 16 byte (128 bit) integer to a file. struct from the standard library supports only up to 8 byte integers. array has the same limitation. Is there a way to do this without masking and shifting each integer?

Some clarification here: I'm writing to a file that's going to be read in from non-Python programs, so pickle is out. All 128 bits are used.


I think for unsigned integers (and ignoring endianness) something like

import binascii

def binify(x):
    h = hex(x)[2:].rstrip('L')
    return binascii.unhexlify('0'*(32-len(h))+h)

>>> for i in 0, 1, 2**128-1:
...     print i, repr(binify(i))
... 
0 '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
1 '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01'
340282366920938463463374607431768211455 '\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'

might technically satisfy the requirements of having non-Python-specific output, not using an explicit mask, and (I assume) not using any non-standard modules. Not particularly elegant, though.
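On Python 3 the same hex-based idea can be written without binascii. This is only a sketch: the name binify3 and the size parameter are mine, not part of the snippet above, and it assumes an unsigned value and big-endian output.

```python
def binify3(x, size=16):
    """Return x as `size` big-endian bytes via its zero-padded hex form."""
    if not 0 <= x < 256 ** size:
        raise ValueError("value does not fit in the requested number of bytes")
    # Format as hex, left-padded with zeros to exactly 2 * size digits.
    hex_digits = format(x, "0{}x".format(2 * size))
    return bytes.fromhex(hex_digits)


for i in (0, 1, 2**128 - 1):
    print(i, binify3(i))
```

The explicit range check matters because an oversized value would otherwise silently produce more than 16 bytes.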


Two possible solutions:

  1. Just pickle your long integer. This will write the integer in a special format which allows it to be read again, if this is all you want.

  2. Use the second code snippet in this answer to convert the long int to a big endian string (which can be easily changed to little endian if you prefer), and write this string to your file.

The problem is that the internal representation of bigints does not directly include the binary data you ask for.


The PyPI bitarray module in combination with the builtin bin() function seems like a good combination for a solution that is simple and flexible.

from bitarray import bitarray
# Zero-pad to 128 bits first; otherwise tobytes() pads on the right and shifts the value.
data = bitarray(bin(my_long)[2:].zfill(128)).tobytes()

The endianness can be controlled with a few more lines of code. You'll have to evaluate the efficiency.


Why not use struct with the unsigned long long type twice?

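import struct
# Use floor division for the high word; ">" fixes the byte order to big-endian (high word first).
some_file.write(struct.pack(">QQ", var // 2**64, var % 2**64))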

That's documented here (scroll down to get the table with Q): http://docs.python.org/library/struct.html
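Here is a rough sketch of the two-halves idea as a round trip, assuming an unsigned value and big-endian layout. The helper names pack_u128 and unpack_u128 are made up for illustration.

```python
import struct


def pack_u128(value):
    """Pack an unsigned 128-bit integer as 16 big-endian bytes."""
    if not 0 <= value < 2**128:
        raise ValueError("value does not fit in 128 bits")
    # Split into the high and low 64-bit words.
    high, low = divmod(value, 2**64)
    return struct.pack(">QQ", high, low)


def unpack_u128(data):
    """Inverse of pack_u128: rebuild the integer from 16 bytes."""
    high, low = struct.unpack(">QQ", data)
    return high * 2**64 + low


packed = pack_u128(2**128 - 1)
print(len(packed), unpack_u128(packed) == 2**128 - 1)
```

A non-Python reader would need to apply the same convention: two 64-bit big-endian words, most significant word first.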


This may not avoid the "mask and shift each integer" requirement. I'm not sure what avoiding mask and shift means in the context of Python long values.

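The bytes are these:

def to_byte_list(long_int, size=16):
    # Collect the base-256 digits, most significant first.
    byte_list = []
    while long_int != 0:
        byte_list.insert(0, long_int % 256)
        long_int //= 256
    # Left-pad with zeros so the list always has `size` entries.
    return [0] * (size - len(byte_list)) + byte_list

You can then pack this list of bytes using struct.pack('16B', *to_byte_list(value)). Note the unsigned 'B' format code (the values go up to 255) and the * that unpacks the list into separate arguments.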


With Python 3.2 and later, you can use int.to_bytes and int.from_bytes: https://docs.python.org/3/library/stdtypes.html#int.to_bytes
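A minimal sketch of how that might look for the 16-byte case, assuming an unsigned value and big-endian order. The file name value.bin and the temporary directory are only there to keep the example self-contained.

```python
import os
import tempfile

value = 0xFEDCBA9876543210FEDCBA9876543210

# Raises OverflowError if the value does not fit in 16 unsigned bytes.
data = value.to_bytes(16, byteorder="big", signed=False)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "value.bin")
    # Write the raw 16 bytes; the file must be opened in binary mode.
    with open(path, "wb") as f:
        f.write(data)
    # Read them back and rebuild the integer.
    with open(path, "rb") as f:
        restored = int.from_bytes(f.read(16), byteorder="big", signed=False)

print(len(data), restored == value)
```

Pass byteorder="little" on both sides if the program reading the file expects little-endian data.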


You could pickle the object to binary, use protocol buffers (I don't know if they allow you to serialize unlimited precision integers though) or BSON if you do not want to write code.

But writing a function that dumps 16-byte integers by shifting should not be hard if it's not time-critical.
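Such a shifting function could look roughly like this. It is a sketch only; the name u128_to_bytes_shift is hypothetical, and it assumes unsigned input and big-endian output.

```python
def u128_to_bytes_shift(value, size=16):
    """Build `size` big-endian bytes by shifting and masking one byte at a time."""
    if not 0 <= value < 1 << (8 * size):
        raise ValueError("value does not fit in the requested number of bytes")
    out = bytearray(size)
    for i in range(size):
        # Byte i counted from the least significant end goes at the back.
        out[size - 1 - i] = (value >> (8 * i)) & 0xFF
    return bytes(out)


print(u128_to_bytes_shift(258))
```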


This may be a little late, but I don't see why you can't use struct:

import struct

bigint = 0xFEDCBA9876543210FEDCBA9876543210
print(bigint, hex(bigint).upper())

# The low 64 bits are packed first, then the high 64 bits.
cbi = struct.pack("!QQ", bigint & 0xFFFFFFFFFFFFFFFF, (bigint >> 64) & 0xFFFFFFFFFFFFFFFF)

print(len(cbi))

The bigint by itself is rejected, but if you mask it with &0xFFFFFFFFFFFFFFFF you can reduce it to an 8 byte int instead of 16. Then the upper part is shifted and masked as well. You may have to play with byte ordering a bit. I used the ! mark to tell it to produce a network endian byte order. Also, the msb and lsb (upper and lower halves) may need to be reversed. I will leave that as an exercise for the user to determine. I would say saving things as network endian would be safer so you always know what the endianness of your data is.

No, don't ask me if network endian is big or little endian...
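For what it's worth, the struct documentation defines "!" (network order) as big-endian. The sketch below compares both word orders against int.to_bytes on Python 3. The test value and variable names are my own, chosen so that the two halves differ.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF
bigint = 0x00112233445566778899AABBCCDDEEFF

low = bigint & MASK64
high = (bigint >> 64) & MASK64

# "!" and ">" produce the same bytes: network order is big-endian.
same_order = struct.pack("!Q", low) == struct.pack(">Q", low)

low_first = struct.pack("!QQ", low, high)   # halves swapped relative to big-endian
high_first = struct.pack("!QQ", high, low)  # fully big-endian 128-bit value
little = struct.pack("<QQ", low, high)      # fully little-endian 128-bit value

print(same_order)
print(low_first.hex())
print(high_first.hex())
```

So for a consistent 128-bit big-endian layout the high half goes first; low half first only matches a plain 128-bit layout when each half is also packed little-endian.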


Based on @DSM's answer, and to support negative integers and varying byte sizes, I've created the following improved snippet:

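import binascii

def to_bytes(num, size):
    # Map negative values to their two's-complement representation.
    x = num if num >= 0 else 256**size + num
    h = hex(x)[2:].rstrip("L")
    # Left-pad the hex string to 2 * size digits before converting.
    return binascii.unhexlify("0"*((2*size)-len(h))+h)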

This handles negative integers (encoded as two's complement) and lets the user set the number of bytes.
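Reading such a value back means undoing the two's-complement step. A rough sketch for Python 3, assuming big-endian data; decode_signed is a hypothetical helper, and the built-in int.from_bytes with signed=True should give the same result.

```python
def decode_signed(data):
    """Interpret big-endian bytes as a two's-complement signed integer."""
    size = len(data)
    x = int.from_bytes(data, byteorder="big", signed=False)
    # If the top bit is set, the value is negative: subtract 256**size.
    if x >= 1 << (8 * size - 1):
        x -= 1 << (8 * size)
    return x


encoded = (-2).to_bytes(16, byteorder="big", signed=True)
print(encoded, decode_signed(encoded))
```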
