The following Python script works well with Python 2.3 and Python 2.4 (which don't have a built-in definition of all()):
#! /usr/bin/env python
# vim: set fileencoding=utf-8
# (c) Uwe Kleine-König
# GPLv2

import locale
import sys

f = file(sys.argv[1])
data = f.read()

def len_utf8_char(data):
    if not 'all' in dir(__builtins__):
        def all(seq):
            for i in seq:
                if not i:
                    return False
            return True

    def check_cont(num):
        if all(map(lambda c: ord(c) >= 0x80 and ord(c) <= 0xbf, data[1:num])):
            return num
        else:
            return -1

    if ord(data[0]) < 128:
        # ASCII char
        return 1
    elif ord(data[0]) & 0xe0 == 0xc0:
        return check_cont(2)
    elif ord(data[0]) & 0xf0 == 0xe0:
        return check_cont(3)
    elif ord(data[0]) & 0xf8 == 0xf0:
        return check_cont(4)
    elif ord(data[0]) & 0xfc == 0xf8:
        return check_cont(5)
    elif ord(data[0]) & 0xfe == 0xfc:
        return check_cont(6)

i = 0
maxl = 0
while i < len(data):
    l = len_utf8_char(data[i:])
    if l < 0:
        prefenc = locale.getpreferredencoding()
        if prefenc not in ('UTF-8', 'ANSI_X3.4-1968'):
            print prefenc
        else:
            print 'ISO-8859-1'
        sys.exit(0)

    if maxl < l:
        maxl = l
    i += l

if maxl > 1:
    print 'UTF-8'
else:
    print 'ANSI_X3.4-1968'
Now with Python 2.5 and later this fails as follows:
$ python2.5 guess-charmap guess-charmap
Traceback (most recent call last):
  File "guess-charmap", line 43, in <module>
    l = len_utf8_char(data[i:])
  File "guess-charmap", line 30, in len_utf8_char
    return check_cont(2)
  File "guess-charmap", line 21, in check_cont
    if all(map(lambda c: ord(c) >= 0x80 and ord(c) <= 0xbf, data[1:num])):
NameError: free variable 'all' referenced before assignment in enclosing scope
Removing the compatibility definition of all fixes the problem for Python 2.5+.
I wonder why Python doesn't pick the builtin all() in this case. Can somebody explain?
When Python compiles a function body, it looks for names that are assigned to anywhere in that body. All such names are assumed to be local to the function, unless they are declared with the global statement.
The def all statement assigns a value to the name all. Even though the assignment sits inside an if-block, all is regarded as a local variable in every case (whether or not the if-block is later executed).
When the if-block is not executed, all remains an unbound local variable, so referencing it raises a NameError.
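The behaviour described above can be reproduced in a few lines. This is only a minimal sketch: the names demo, define_locally and check are made up for illustration and do not appear in the original script.

```python
def demo(define_locally):
    if define_locally:
        # This def is an assignment, so the compiler treats 'all' as a
        # local of demo() in every call, even when this branch is skipped.
        def all(seq):
            for item in seq:
                if not item:
                    return False
            return True

    def check():
        # 'all' is looked up in demo()'s scope, never in the builtins.
        return all([True, True])

    return check()


result_when_defined = demo(True)

try:
    demo(False)
    raised = False
except NameError:
    # The local 'all' was never bound, so the lookup fails.
    raised = True

print("%s %s" % (result_when_defined, raised))
```

On Python 2.3 and 2.4 the if-branch always ran, because there was no builtin all(), which is why the original script only started failing with 2.5.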
If you move the if not 'all' ... block outside the def len_utf8_char, then you will avoid this problem.
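It happens for the same reason it happens with ordinary variables: the compiler has marked all as a local of the function, and so expects it to be bound locally. If you want to keep the definition inside the function, add an else clause that does all = __builtins__.all (this relies on __builtins__ being a module, which is the case when the script is run as the main program).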
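A sketch of that else-clause approach follows. As an assumption on my part, it imports the builtins module explicitly (__builtin__ on Python 2, builtins on Python 3) instead of using __builtins__, because __builtins__ is a plain dict inside imported modules. The function all_in_range and its parameters are hypothetical, chosen only to mirror check_cont.

```python
try:
    import __builtin__ as builtins_module  # Python 2
except ImportError:
    import builtins as builtins_module     # Python 3


def all_in_range(values, low, high):
    if not hasattr(builtins_module, 'all'):
        # Fallback for interpreters without a builtin all().
        def all(seq):
            for item in seq:
                if not item:
                    return False
            return True
    else:
        # Bind the local name explicitly, so it is never left unbound.
        all = builtins_module.all

    def check():
        return all([low <= v <= high for v in values])

    return check()


print(all_in_range([0x80, 0xbf], 0x80, 0xbf))
print(all_in_range([0x7f], 0x80, 0xbf))
```

Either branch now assigns the local all, so the nested function always finds a bound name.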
You can put the definition of all at module level like this:
try:
    all
except NameError:
    def all(seq):
        for i in seq:
            if not i:
                return False
        return True
Because when you define all() inside the function, the name belongs to that function's local scope. Why do you have so many function definitions inside a function? Why define all() at all? And why not use a dict for this:
if ord(data[0]) < 128:
    # ASCII char
    return 1
elif ord(data[0]) & 0xe0 == 0xc0:
    return check_cont(2)
elif ord(data[0]) & 0xf0 == 0xe0:
    return check_cont(3)
elif ord(data[0]) & 0xf8 == 0xf0:
    return check_cont(4)
elif ord(data[0]) & 0xfc == 0xf8:
    return check_cont(5)
elif ord(data[0]) & 0xfe == 0xfc:
    return check_cont(6)
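One way to read that suggestion is a lookup table in place of the elif chain. The sketch below uses a tuple of (mask, value, length) rows rather than a dict, since the tests are bit masks and not exact keys. LEAD_BYTE_PATTERNS and sequence_length are hypothetical names, and returning -1 for an unmatched lead byte is my assumption (the original chain falls through without an explicit return).

```python
# Each row: (bit mask, expected value after masking, sequence length in bytes).
# The masks and values are the ones used in the elif chain above.
LEAD_BYTE_PATTERNS = (
    (0x80, 0x00, 1),  # ASCII char, same test as ord(data[0]) < 128
    (0xe0, 0xc0, 2),
    (0xf0, 0xe0, 3),
    (0xf8, 0xf0, 4),
    (0xfc, 0xf8, 5),
    (0xfe, 0xfc, 6),
)


def sequence_length(lead_byte):
    """Return the length announced by an integer lead byte, or -1."""
    for mask, value, length in LEAD_BYTE_PATTERNS:
        if lead_byte & mask == value:
            return length
    return -1  # assumption: treat unmatched lead bytes as invalid


print(sequence_length(ord('A')))
print(sequence_length(0xc3))
```

The continuation-byte check done by check_cont would still be needed; this only replaces the length dispatch.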
In fact, I would call for a rewrite of this code; it's complicated and annoying.