Is there a faster way to convert an arbitrary large integer to a big endian sequence of bytes?_问答_开发者

I have this Python code to do this:

from struct import pack as _pack

def packl(lnum, pad = 1):
    if lnum < 0:
        raise RangeError("Cannot use packl to convert a negative integer "
                         "to a string.")
    count = 0
    l = []
    while lnum > 0:
        l.append(lnum & 0xffffffffffffffffL)
        coun开发者_如何转开发t += 1
        lnum >>= 64
    if count <= 0:
        return '\0' * pad
    elif pad >= 8:
        lens = 8 * count % pad
        pad = ((lens != 0) and (pad - lens)) or 0
        l.append('>' + 'x' * pad + 'Q' * count)
        l.reverse()
        return _pack(*l)
    else:
        l.append('>' + 'Q' * count)
        l.reverse()
        s = _pack(*l).lstrip('\0')
        lens = len(s)
        if (lens % pad) != 0:
            return '\0' * (pad - lens % pad) + s
        else:
            return s

This takes approximately 174 usec to convert 2**9700 - 1 to a string of bytes on my machine. If I'm willing to use the Python 2.7 and Python 3.x specific bit_length method, I can shorten that to 159 usecs by pre-allocating the l array to be the exact right size at the very beginning and using l[something] = syntax instead of l.append.

Is there anything I can do that will make this faster? This will be used to convert large prime numbers used in cryptography as well as some (but not many) smaller numbers.

Edit

This is currently the fastest option in Python < 3.2, it takes about half the time either direction as the accepted answer:

def packl(lnum, padmultiple=1):
    """Packs the lnum (which must be convertable to a long) into a
       byte string 0 padded to a multiple of padmultiple bytes in size. 0
       means no padding whatsoever, so that packing 0 result in an empty
       string.  The resulting byte string is the big-endian two's
       complement representation of the passed in long."""

    if lnum == 0:
        return b'\0' * padmultiple
    elif lnum < 0:
        raise ValueError("Can only convert non-negative numbers.")
    s = hex(lnum)[2:]
    s = s.rstrip('L')
    if len(s) & 1:
        s = '0' + s
    s = binascii.unhexlify(s)
    if (padmultiple != 1) and (padmultiple != 0):
        filled_so_far = len(s) % padmultiple
        if filled_so_far != 0:
            s = b'\0' * (padmultiple - filled_so_far) + s
    return s

def unpackl(bytestr):
    """Treats a byte string as a sequence of base 256 digits
    representing an unsigned integer in big-endian format and converts
    that representation into a Python integer."""

    return int(binascii.hexlify(bytestr), 16) if len(bytestr) > 0 else 0

In Python 3.2 the int class has to_bytes and from_bytes functions that can accomplish this much more quickly that the method given above.

Here is a solution calling the Python/C API via ctypes. Currently, it uses NumPy, but if NumPy is not an option, it could be done purely with ctypes.

import numpy
import ctypes
PyLong_AsByteArray = ctypes.pythonapi._PyLong_AsByteArray
PyLong_AsByteArray.argtypes = [ctypes.py_object,
                               numpy.ctypeslib.ndpointer(numpy.uint8),
                               ctypes.c_size_t,
                               ctypes.c_int,
                               ctypes.c_int]

def packl_ctypes_numpy(lnum):
    a = numpy.zeros(lnum.bit_length()//8 + 1, dtype=numpy.uint8)
    PyLong_AsByteArray(lnum, a, a.size, 0, 1)
    return a

On my machine, this is 15 times faster than your approach.

Edit: Here is the same code using ctypes only and returning a string instead of a NumPy array:

import ctypes
PyLong_AsByteArray = ctypes.pythonapi._PyLong_AsByteArray
PyLong_AsByteArray.argtypes = [ctypes.py_object,
                               ctypes.c_char_p,
                               ctypes.c_size_t,
                               ctypes.c_int,
                               ctypes.c_int]

def packl_ctypes(lnum):
    a = ctypes.create_string_buffer(lnum.bit_length()//8 + 1)
    PyLong_AsByteArray(lnum, a, len(a), 0, 1)
    return a.raw

This is another two times faster, totalling to a speed-up factor of 30 on my machine.

For completeness and for future readers of this question:

Starting in Python 3.2, there are functions int.from_bytes() and int.to_bytes() that perform the conversion between bytes and int objects in a choice of byte orders.

I suppose you really should just be using numpy, which I'm sure has something or other built in for this. It might also be faster to hack around with the array module. But I'll take a stab at it anyway.

IMX, creating a generator and using a list comprehension and/or built-in summation is faster than a loop that appends to a list, because the appending can be done internally. Oh, and 'lstrip' on a large string has got to be costly.

Also, some style points: special cases aren't special enough; and you appear not to have gotten the memo about the new x if y else z construct. :) Although we don't need it anyway. ;)

from struct import pack as _pack


Q_size = 64
Q_bitmask = (1L << Q_size) - 1L


def quads_gen(a_long):
    while a_long:
        yield a_long & Q_bitmask
        a_long >>= Q_size


def pack_long_big_endian(a_long, pad = 1):
    if lnum < 0:
        raise RangeError("Cannot use packl to convert a negative integer "
                         "to a string.")
    qs = list(reversed(quads_gen(a_long)))
    # Pack the first one separately so we can lstrip nicely.
    first = _pack('>Q', qs[0]).lstrip('\x00')
    rest = _pack('>%sQ' % len(qs) - 1, *qs[1:])
    count = len(first) + len(rest)
    # A little math trick that depends on Python's behaviour of modulus
    # for negative numbers - but it's well-defined and documented
    return '\x00' * (-count % pad) + first + rest

Just wanted to post a follow-up to Sven's answer (which works great). The opposite operation - going from arbitrarily long bytes object to Python Integer object requires the following (because there is no PyLong_FromByteArray() C API function that I can find):

import binascii

def unpack_bytes(stringbytes):
    #binascii.hexlify will be obsolete in python3 soon
    #They will add a .tohex() method to bytes class
    #Issue 3532 bugs.python.org
    return int(binascii.hexlify(stringbytes), 16)