How to write a file of ASCII bytes to a binary file as actual bytes?

Developer https://www.devze.com 2023-04-10 01:17 Source: Internet
Trying to do an MD5 collision homework problem and I'm not sure how to write raw bytes in Python. I gave it a shot but just ended up with a .bin file with ASCII in it. Here's my code:

fileWriteObject1 = open("md5One.bin", 'wb')
fileWriteObject2 = open("md5Two.bin", 'wb')
fileReadObject1 = open('bytes1.txt', 'r')
fileReadObject2 = open('bytes2.txt', 'r')

bytes1Contents = fileReadObject1.readlines()
bytes2Contents = fileReadObject2.readlines()

fileReadObject1.close()
fileReadObject2.close()


for bytes in bytes1Contents:
    toWrite = r"\x" + bytes
    fileWriteObject1.write(toWrite.strip()) 

for bytes in bytes2Contents:
    toWrite = r"\x" + bytes
    fileWriteObject2.write(toWrite.strip())

fileWriteObject1.close()
fileWriteObject2.close()

sample input: d1 31 dd 02 c5 e6 ee c4 69 3d 9a 06 98 af f9 5c 2f ca b5

I had a link to my input file but it seems a mod removed it. It's a file with a hex byte written in ASCII on each line.

EDIT: SOLVED! Thanks to Circumflex.

I had two different text files, each with 128 bytes written as ASCII hex. I converted them to binary, wrote them out using struct.pack, and got an MD5 collision.


If you want to write them as raw bytes, you can use the pack() function of the struct module.

You could write the MD5 out as two long long ints, but you'd have to write it in two 8-byte sections, since an MD5 digest is 16 bytes long.
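
For illustration, here is a rough sketch of the two-section idea. It assumes Python 3 and a big-endian layout (the '>' in the format string); the digest below is just the well-known MD5 of empty input, and the variable names are made up for the example:

```python
import struct

# MD5 of empty input, used only as sample data (32 hex digits = 16 bytes).
digest_hex = "d41d8cd98f00b204e9800998ecf8427e"
value = int(digest_hex, 16)

# Split the 128-bit value into two 64-bit halves.
high = value >> 64
low = value & 0xFFFFFFFFFFFFFFFF

# '>QQ' means big-endian, two unsigned long long values of 8 bytes each.
packed = struct.pack(">QQ", high, low)
```

With this byte order, packed holds the same 16 bytes as the hex digest read left to right.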

http://docs.python.org/library/struct.html

Edit:

An example:

import struct

bytes = "6F"
byteAsInt = int(bytes, 16)
packedString = struct.pack('B', byteAsInt)

If I've got this right, you're trying to pull in some text with hex strings written, convert them to binary format and output them? If that is the case, that code should do what you want.

It basically converts the raw hex string to an int, then packs it in binary form (as a byte) into a string.

You could loop over something like this for each byte in the input string.
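
Such a loop might look like the sketch below. It assumes Python 3 and that the hex bytes are separated by whitespace, as in the sample input from the question; pack_hex_bytes is a hypothetical helper name, not something from the struct module:

```python
import struct

def pack_hex_bytes(hex_tokens):
    """Pack an iterable of two-character hex strings into raw bytes."""
    # int(token, 16) parses the hex text; 'B' packs it as one unsigned byte.
    return b"".join(struct.pack("B", int(token.strip(), 16)) for token in hex_tokens)

# First few bytes of the sample input from the question.
sample = "d1 31 dd 02 c5".split()
raw = pack_hex_bytes(sample)
```

The result can then be passed to write() on a file opened with mode 'wb'.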


>>> import binascii
>>> binary = binascii.unhexlify("d131dd02c5")
>>> binary
'\xd11\xdd\x02\xc5'

binascii.unhexlify() is defined in binascii.c. Here's a "close to C" implementation in Python:

def binascii_unhexlify(ascii_string_with_hex):
    arglen = len(ascii_string_with_hex) 
    if arglen % 2 != 0:
        raise TypeError("Odd-length string")

    retval = bytearray(arglen//2)
    for j, i in enumerate(xrange(0, arglen, 2)):
        top = to_int(ascii_string_with_hex[i])
        bot = to_int(ascii_string_with_hex[i+1])
        if top == -1 or bot == -1:
            raise TypeError("Non-hexadecimal digit found")
        retval[j] = (top << 4) + bot

    return bytes(retval)

def to_int(c):
    assert len(c) == 1
    return "0123456789abcdef".find(c.lower())

If there were no binascii.unhexlify() or bytearray.fromhex() or str.decode('hex') or similar you could write it as follows:

def unhexlify(s, table={"%02x" % i: chr(i) for i in range(0x100)}):
    if len(s) % 2 != 0: 
        raise TypeError("Odd-length string")
    try:
        return ''.join(table[top+bot] for top, bot in zip(*[iter(s.lower())]*2))
    except KeyError, e:
        raise TypeError("Non-hexadecimal digit found: %s" % e)
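
The snippets above are written for Python 2. On Python 3, a sketch of the whole conversion for the question's input (one hex byte per line) could use bytes.fromhex() instead. The helper names hex_text_to_bytes and convert_file are made up for this example, and the file names are the ones from the question:

```python
def hex_text_to_bytes(text):
    """Convert whitespace-separated hex digits (e.g. one byte per line) to bytes."""
    # Drop all whitespace so "d1\n31\n" becomes "d131".
    compact = "".join(text.split())
    # Raises ValueError on odd-length input or non-hex characters.
    return bytes.fromhex(compact)

def convert_file(src_path, dst_path):
    """Read hex text from src_path and write the decoded bytes to dst_path."""
    with open(src_path, "r") as src:
        raw = hex_text_to_bytes(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(raw)
    return len(raw)

# Example call with the file names from the question (the input file must exist):
# convert_file("bytes1.txt", "md5One.bin")
```

Opening the output with 'wb' and writing a bytes object is what keeps the result binary rather than ASCII text.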