Python prefers unassigned local function over built-in function

Developer https://www.devze.com 2023-03-18 15:49 Source: web
The following Python script works well with Python 2.3 and Python 2.4 (which don't have a built-in definition of all()):

#! /usr/bin/env python
# vim: set fileencoding=utf-8
# (c) Uwe Kleine-König
# GPLv2

import locale
import sys

f = file(sys.argv[1])
data = f.read()

def len_utf8_char(data):
    if not 'all' in dir(__builtins__):
        def all(seq):
            for i in seq:
                if not i:
                    return False
            return True

    def check_cont(num):
        if all(map(lambda c: ord(c) >= 0x80 and ord(c) <= 0xbf, data[1:num])):
            return num
        else:
            return -1

    if ord(data[0]) < 128:
        # ASCII char
        return 1
    elif ord(data[0]) & 0xe0 == 0xc0:
        return check_cont(2)
    elif ord(data[0]) & 0xf0 == 0xe0:
        return check_cont(3)
    elif ord(data[0]) & 0xf8 == 0xf0:
        return check_cont(4)
    elif ord(data[0]) & 0xfc == 0xf8:
        return check_cont(5)
    elif ord(data[0]) & 0xfe == 0xfc:
        return check_cont(6)

i = 0
maxl = 0
while i < len(data):
    l = len_utf8_char(data[i:])
    if l < 0:
        prefenc = locale.getpreferredencoding()
        if prefenc not in ('UTF-8', 'ANSI_X3.4-1968'):
            print prefenc
        else:
            print 'ISO-8859-1'
        sys.exit(0)

    if maxl < l:
        maxl = l
    i += l

if maxl > 1:
    print 'UTF-8'
else:
    print 'ANSI_X3.4-1968'

Now with Python 2.5 and later this fails as follows:

$ python2.5 guess-charmap guess-charmap
Traceback (most recent call last):
  File "guess-charmap", line 43, in <module>
    l = len_utf8_char(data[i:])
  File "guess-charmap", line 30, in len_utf8_char
    return check_cont(2)
  File "guess-charmap", line 21, in check_cont
    if all(map(lambda c: ord(c) >= 0x80 and ord(c) <= 0xbf, data[1:num])):
NameError: free variable 'all' referenced before assignment in enclosing scope

Removing the compatibility definition of all fixes the problem for Python 2.5+. I wonder why Python doesn't pick the builtin all() in this case. Can somebody explain?


When Python compiles a function body, it looks for names that are bound anywhere inside it (by an assignment, a def, an import and so on). Every such name is treated as local to the function, unless it is declared with a global statement.

The def all assigns a value to the variable name all. Despite the assignment being inside an if-block, all is regarded as a local variable in all cases (whether or not the if-block is later executed).

When the if-block is not executed, all becomes an unbound local variable, thus raising a NameError.

If you move the if not 'all' ... block outside the def len_utf8_char, then you will avoid this problem.
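The rule described above can be reproduced in a few lines. The following is a minimal sketch, written for Python 3 so it can be run today; the function names uses_builtin and shadows_builtin are made up for this illustration and do not come from the original script. It uses len instead of all, but the mechanism is the same: a def inside a skipped if-block still makes the name local.

```python
def uses_builtin():
    # `len` is never bound in this function, so the lookup
    # falls through to the built-in.
    return len("abc")


def shadows_builtin(define_local):
    # The `def len` below makes `len` a local name for the whole
    # function body, even when the if-block is skipped at run time.
    if define_local:
        def len(seq):
            return -1

    def inner():
        # `len` here refers to the enclosing function's local,
        # never to the built-in.
        return len("abc")

    return inner()


print(uses_builtin())            # 3
print(shadows_builtin(True))     # -1
try:
    shadows_builtin(False)
except NameError as exc:
    # The local was never bound, so the lookup fails.
    print("NameError:", exc)

# The compiler's decision is visible on the code object:
print("len" in shadows_builtin.__code__.co_cellvars)   # True
```

The exact wording of the NameError message differs between Python versions, so the sketch only relies on the exception type.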


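This happens for the same reason it happens with ordinary variables: the compiler has marked all as local to the function, so it expects a local and never falls back to the built-in. If you want to fix it in place, add an else clause to the if-block that does all = __builtins__.all, so that the local name is bound on both paths.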
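A sketch of that "bind the name on both paths" idea follows. It is an illustration under assumptions, not the original script: the function all_in_range and its parameters are hypothetical, and the built-ins module is imported explicitly (builtins on Python 3, __builtin__ on Python 2) instead of going through __builtins__, which is only guaranteed to be a module in the main script.

```python
try:
    import builtins                  # Python 3 name
except ImportError:
    import __builtin__ as builtins   # Python 2 name


def all_in_range(values, low, high):
    # `all` is local to this function because it is bound below,
    # so it must be bound on every path before `check` runs.
    if hasattr(builtins, "all"):
        all = builtins.all
    else:
        def all(seq):
            for item in seq:
                if not item:
                    return False
            return True

    def check():
        return all([low <= v <= high for v in values])

    return check()


print(all_in_range([0x80, 0xbf], 0x80, 0xbf))   # True
print(all_in_range([0x7f], 0x80, 0xbf))         # False
```

Moving the fallback to module level, as suggested in the other answers, avoids the extra branch entirely; this variant is only useful if the definition has to stay inside the function.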


You can put the definition of all at module level like this:

try:
    all
except NameError:
    def all(seq):
        for i in seq:
            if not i:
                return False
        return True


Because check_cont is defined after your all() inside the same function, you're still inside that function's local scope, so all is looked up there. Why do you have so many function definitions inside a function? Why define all() at all? And why not use a dict for this:

    if ord(data[0]) < 128:
        # ASCII char
        return 1
    elif ord(data[0]) & 0xe0 == 0xc0:
        return check_cont(2)
    elif ord(data[0]) & 0xf0 == 0xe0:
        return check_cont(3)
    elif ord(data[0]) & 0xf8 == 0xf0:
        return check_cont(4)
    elif ord(data[0]) & 0xfc == 0xf8:
        return check_cont(5)
    elif ord(data[0]) & 0xfe == 0xfc:
        return check_cont(6)

In fact, I would call for a rewrite of this code; it's complicated and annoying.
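One possible shape for such a rewrite is a lookup table instead of the if/elif chain. The sketch below uses a list of (mask, pattern, length) tuples rather than a dict, because each branch tests a different mask. It is written for Python 3 and assumes data is a bytes object, so indexing yields integers and ord() is not needed; the name LEAD_BYTE_TABLE is made up here. Unlike the original, it returns -1 explicitly for an unrecognised lead byte and for a truncated sequence.

```python
# (mask, expected bits after masking, total sequence length),
# in the same order as the original if/elif chain.
LEAD_BYTE_TABLE = [
    (0x80, 0x00, 1),   # ASCII
    (0xe0, 0xc0, 2),
    (0xf0, 0xe0, 3),
    (0xf8, 0xf0, 4),
    (0xfc, 0xf8, 5),
    (0xfe, 0xfc, 6),
]


def len_utf8_char(data):
    """Return the length of the sequence starting at data[0], or -1."""
    lead = data[0]
    for mask, pattern, length in LEAD_BYTE_TABLE:
        if lead & mask == pattern:
            tail = data[1:length]
            # Every continuation byte must be present and in 0x80..0xbf.
            if len(tail) == length - 1 and all(0x80 <= b <= 0xbf for b in tail):
                return length
            return -1
    # Lead byte matches no pattern (for example a stray continuation byte).
    return -1


print(len_utf8_char(b"a"))                    # 1
print(len_utf8_char("é".encode("utf-8")))     # 2
print(len_utf8_char(b"\xc3\x28"))             # -1
```

Because nothing named all is bound inside the function, the built-in is used directly and the scoping problem from the question does not arise.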
