I'm trying to write a Python C extension that processes byte strings, and I have something basically working for Python 2.x and Python 3.x.
For the Python 2.x code, near the start of my function, I currently have the line:

```c
if (!PyArg_ParseTuple(args, "s#:in_bytes", &src_ptr, &src_len))
    ...
```
I notice that the `s#` format specifier accepts both Unicode strings and byte strings. I really just want it to accept byte strings and reject Unicode. For Python 2.x, this might be "good enough" -- the standard `hashlib` module seems to do the same, accepting Unicode as well as byte strings. However, Python 3.x is meant to clean up the Unicode/byte string mess and not let the two be interchangeable.
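For the 2.x side, one way to refuse Unicode outright is the `S` format code, which requires an actual `str` object without attempting any conversion and raises `TypeError` for anything else, including `unicode`. A minimal sketch (the function name `in_bytes` and the variable names are just illustrative, not from any real library):

```c
#include <Python.h>

/* Python 2.x sketch: "S" requires an actual str (byte string) object,
   with no conversion, so unicode arguments raise TypeError. */
static PyObject *
in_bytes(PyObject *self, PyObject *args)
{
    PyObject *src_obj;
    const char *src_ptr;
    Py_ssize_t src_len;

    if (!PyArg_ParseTuple(args, "S:in_bytes", &src_obj))
        return NULL;  /* rejects unicode and any non-str object */

    src_ptr = PyString_AS_STRING(src_obj);
    src_len = PyString_GET_SIZE(src_obj);

    /* ... process src_ptr[0 .. src_len) here ... */
    Py_RETURN_NONE;
}
```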
So, I'm surprised to find that in Python 3.x, the `s` format specifiers for `PyArg_ParseTuple()` still seem to accept Unicode and provide a "default encoded string version" of the Unicode. This seems to go against the principles of Python 3.x, making the `s` format specifiers unusable in practice. Is my analysis correct, or am I missing something?
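(The behaviour I actually want would look like the `y#` format code, which fills a pointer/length pair only for bytes-like objects and raises `TypeError` for `str`. A minimal sketch, with the same illustrative names as above:)

```c
/* "#" length arguments are Py_ssize_t only with this defined first. */
#define PY_SSIZE_T_CLEAN
#include <Python.h>

/* Python 3.x sketch: "y#" accepts bytes-like objects only, so a str
   argument raises TypeError instead of being silently encoded. */
static PyObject *
in_bytes(PyObject *self, PyObject *args)
{
    const char *src_ptr;
    Py_ssize_t src_len;

    if (!PyArg_ParseTuple(args, "y#:in_bytes", &src_ptr, &src_len))
        return NULL;  /* str is rejected here */

    /* ... process src_ptr[0 .. src_len) here ... */
    Py_RETURN_NONE;
}
```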
Looking at the implementation of `hashlib` for Python 3.x (e.g. see `md5module.c`, function `MD5_update()`, and its use of the `GET_BUFFER_VIEW_OR_ERROUT()` macro), I see that it avoids the `s` format specifiers and just takes a generic object (`O` specifier), then does various explicit type checks using the `GET_BUFFER_VIEW_OR_ERROUT()` macro. Is this what we have to do?
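(In terms of the public C API, that approach boils down to something like the following sketch -- `GET_BUFFER_VIEW_OR_ERROUT()` itself is a private helper in the hashlib sources, so this uses `PyObject_GetBuffer()` directly; names are again illustrative:)

```c
#include <Python.h>

/* hashlib-style sketch: take a generic object with "O", then insist
   it exports a simple C-contiguous buffer.  In Python 3.x, str does
   not support the buffer protocol, so it is rejected. */
static PyObject *
in_bytes(PyObject *self, PyObject *args)
{
    PyObject *obj;
    Py_buffer view;

    if (!PyArg_ParseTuple(args, "O:in_bytes", &obj))
        return NULL;

    if (PyObject_GetBuffer(obj, &view, PyBUF_SIMPLE) < 0)
        return NULL;  /* TypeError for str and other non-buffer objects */

    /* ... process ((const char *)view.buf)[0 .. view.len) here ... */

    PyBuffer_Release(&view);
    Py_RETURN_NONE;
}
```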
I agree with you -- it's one of several spots where the C API migration of Python 3 was clearly not designed as carefully and thoroughly as the Python coder-visible parts. I do also agree that probably the best workaround for now is focusing on "buffer views", per that macro -- until and unless something better gets designed into a future Python C API (don't hold your breath waiting for that to happen, though;-).
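If it helps, a compact way to get a buffer view straight out of `PyArg_ParseTuple()` (assuming your target 3.x release supports it) is the `y*` format code, which fills in a `Py_buffer` and rejects `str`; a minimal sketch, with the same illustrative names as above:

```c
#include <Python.h>

/* Sketch: "y*" is the buffer-view analogue of "s#" -- it fills a
   Py_buffer for any bytes-like object and raises TypeError for str. */
static PyObject *
in_bytes(PyObject *self, PyObject *args)
{
    Py_buffer view;

    if (!PyArg_ParseTuple(args, "y*:in_bytes", &view))
        return NULL;

    /* ... process ((const char *)view.buf)[0 .. view.len) here ... */

    PyBuffer_Release(&view);  /* must release what "y*" acquired */
    Py_RETURN_NONE;
}
```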