My current task is to dissect tcpdump data that includes P2P messages, and I am having trouble with the piece data I acquire and write to a file on my x86 machine. My suspicion is that I have a simple endianness issue with the bytes I write to the file.
For context, I have a list of bytes holding a piece of P2P video, read and processed using the python-pcapy package:
bytes = [14, 254, 23, 35, 34, 67, etc... ]
I am looking for a way to store these bytes, presently held in a list in my Python application to a file.
Currently I write the pieces as follows:
def writePiece(self, filename, pieceindex, bytes, ipsrc, ipdst, ts):
    file = open(filename, "ab")
    # Iterate through bytes writing them to a file if don't have piece already
    if not self.piecemap[ipdst].has_key(pieceindex):
        for byte in bytes:
            file.write('%c' % byte)
        file.flush()
        self.procLog.info("Wrote (%d) bytes of piece (%d) to %s" % (len(bytes), pieceindex, filename))
        # Remember we have this piece now in case duplicates arrive
        self.piecemap[ipdst][pieceindex] = True
    # TODO: Collect stats
    file.close()
As you can see from the for loop, I write the bytes to the file in the same order as I process them from the wire (i.e. network or big-endian order).
Suffice it to say, the video that is the payload of the pieces does not play back well in VLC :-D
I think I need to convert them to little-endian byte order but am not sure of the best way to approach this in Python.
UPDATE
The solution that worked out for me (writing the P2P pieces while handling the byte-order question appropriately) was:
def writePiece(self, filename, pieceindex, bytes, ipsrc, ipdst, ts):
    file = open(filename, "r+b")
    if not self.piecemap[ipdst].has_key(pieceindex):
        little = struct.pack('<' + 'B' * len(bytes), *bytes)
        # Seek to offset based on piece index
        file.seek(pieceindex * self.piecesize)
        file.write(little)
        file.flush()
        self.procLog.info("Wrote (%d) bytes of piece (%d) to %s" % (len(bytes), pieceindex, filename))
        # Remember we have this piece now in case duplicates arrive
        self.piecemap[ipdst][pieceindex] = True
    file.close()
The key to the solution was the use of Python's struct module, as suspected, and in particular:
little = struct.pack('<'+'B'*len(bytes), *bytes)
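As a quick sanity check (an interactive sketch; the sample values below are placeholders, not the real piece data), packing a list of 0-255 values with the 'B' format yields the raw bytes directly:

>>> import struct
>>> piece = [14, 254, 23, 35]
>>> struct.pack('<' + 'B' * len(piece), *piece)
'\x0e\xfe\x17#'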
Thanks to those who responded with helpful suggestions.
To save yourself some work you might like to use a bytearray (Python 2.6 and later):
b = [14, 254, 23, 35]
f = open("file", 'ab')
f.write(bytearray(b))
This does all the conversion of your 0-255 values into bytes without any need for looping.
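For instance, the loop in the question's writePiece could collapse to a single write. This is only a sketch, assuming the same self.piecemap structure as in the question, with the parameter renamed to piece_bytes for the reason noted below:

def writePiece(self, filename, pieceindex, piece_bytes, ipsrc, ipdst, ts):
    f = open(filename, "ab")
    if pieceindex not in self.piecemap[ipdst]:
        # One call replaces the per-byte loop; bytearray accepts the 0-255 list directly
        f.write(bytearray(piece_bytes))
        f.flush()
        self.piecemap[ipdst][pieceindex] = True
    f.close()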
I can't see what your problem is otherwise without more information. If the data really is byte-wise then endianness isn't an issue, as others have said.
(By the way, using bytes and file as variable names isn't good, as they hide the built-ins of the same name.)
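A quick illustration of that shadowing (hypothetical interpreter session):

>>> bytes = [14, 254, 23, 35]
>>> bytes(7)   # the built-in bytes type is now hidden by the list
Traceback (most recent call last):
  ...
TypeError: 'list' object is not callable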
You can also use an array.array:
from array import array
f.write(array('B', bytes))
instead of
f.write(struct.pack('<'+'B'*len(bytes), *bytes))
which when tidied up a little is
f.write(struct.pack('B' * len(bytes), *bytes))
# the < is redundant; there is NO ENDIANNESS ISSUE
which, if len(bytes) is "large", might be better as
f.write(struct.pack('%dB' % len(bytes), *bytes))
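As a quick check (interactive sketch), all three forms produce identical output for 0-255 values, because 'B' always packs exactly one byte regardless of any byte-order prefix:

>>> import struct
>>> b = [14, 254, 23, 35]
>>> struct.pack('<' + 'B'*len(b), *b) == struct.pack('B'*len(b), *b) == struct.pack('%dB' % len(b), *b)
True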
This may have been answered previously in Python File Slurp w/ endian conversion.