开发者

Dissecting a line of (obfuscated?) Python

开发者 https://www.devze.com 2023-01-12 18:24 出处:网络
I was reading another question on Stack Overflow (Zen of Python), and I came across this line in Jaime Soriano\'s answer:

I was reading another question on Stack Overflow (Zen of Python), and I came across this line in Jaime Soriano's answer:

import this
"".join([c in this.d and this.d[c] or c for c in this.s])

Entering the above in a Python shell prints:

"The Zen of Python, by Tim Peters\n\nBeautiful is better than ugly.\nExplicit is
better than implicit.\nSimple is better than complex.\nComplex is better than 
complicated.\nFlat is better than nested.\nSparse is better than dense.
\nReadability counts.\nSpecial cases aren't special enough to break the rules.
\nAlthough practicality beats purity.\nErrors should never pass silently.
\nUnless explicitly silenced.\nIn the face of ambiguity, refuse the temptation to
guess.\nThere should be one-- and preferably only one --obvious way to do it.
\nAlthough that way may not be obvious a开发者_开发技巧t first unless you're Dutch.\nNow is 
better than never.\nAlthough never is often better than *right* now.\nIf the 
implementation is hard to explain, it's a bad idea.\nIf the implementation is
easy to explain, it may be a good idea.\nNamespaces are one honking great idea 
-- let's do more of those!"

And so of course I was compelled to spend my entire morning trying to understand the above list... comprehension... thing. I hesitate to flatly declare it obfuscated, but only because I've been programming for just a month and a half and so am unsure as to whether or not such constructions are commonplace in python.

this.s contains an encoded version of the above printout:

"Gur Mra bs Clguba, ol Gvz Crgref\n\nOrnhgvshy vf orggre guna htyl.\nRkcyvpvg vf orggre guna vzcyvpvg.\nFvzcyr vf orggre guna pbzcyrk.\nPbzcyrk vf orggre guna pbzcyvpngrq.\nSyng vf orggre guna arfgrq.\nFcnefr vf orggre guna qrafr.\nErnqnovyvgl pbhagf.\nFcrpvny pnfrf nera'g fcrpvny rabhtu gb oernx gur ehyrf.\nNygubhtu cenpgvpnyvgl orngf chevgl.\nReebef fubhyq arire cnff fvyragyl.\nHayrff rkcyvpvgyl fvyraprq.\nVa gur snpr bs nzovthvgl, ershfr gur grzcgngvba gb thrff.\nGurer fubhyq or bar-- naq cersrenoyl bayl bar --boivbhf jnl gb qb vg.\nNygubhtu gung jnl znl abg or boivbhf ng svefg hayrff lbh'er Qhgpu.\nAbj vf orggre guna arire.\nNygubhtu arire vf bsgra orggre guna *evtug* abj.\nVs gur vzcyrzragngvba vf uneq gb rkcynva, vg'f n onq vqrn.\nVs gur vzcyrzragngvba vf rnfl gb rkcynva, vg znl or n tbbq vqrn.\nAnzrfcnprf ner bar ubaxvat terng vqrn -- yrg'f qb zber bs gubfr!"

And this.d contains a dictionary with the cypher that decodes this.s:

{'A': 'N', 'C': 'P', 'B': 'O', 'E': 'R', 'D': 'Q', 'G': 'T', 'F': 'S', 'I': 'V', 'H': 'U', 'K': 'X', 'J': 'W', 'M': 'Z', 'L': 'Y', 'O': 'B', 'N': 'A', 'Q': 'D', 'P': 'C', 'S': 'F', 'R': 'E', 'U': 'H', 'T': 'G', 'W': 'J', 'V': 'I', 'Y': 'L', 'X': 'K', 'Z': 'M', 'a': 'n', 'c': 'p', 'b': 'o', 'e': 'r', 'd': 'q', 'g': 't', 'f': 's', 'i': 'v', 'h': 'u', 'k': 'x', 'j': 'w', 'm': 'z', 'l': 'y', 'o': 'b', 'n': 'a', 'q': 'd', 'p': 'c', 's': 'f', 'r': 'e', 'u': 'h', 't': 'g', 'w': 'j', 'v': 'i', 'y': 'l', 'x': 'k', 'z': 'm'}

As far as I can tell, the flow of execution in Jaime's code is like this:

1. the loop c for c in this.s assigns a value to c

2. if the statement c in this.d evaluates to True, the "and" statement executes whatever happens to be to its immediate right, in this case this.d[c].

3. if the statement c in this.d evaluates to False (which never happens in Jaime's code), the "or" statement executes whatever happens to be to its immediate right, in this case the loop c for c in this.s.

Am I correct about that flow?

Even if I am correct about the order of execution, this still leaves me with a ton of questions. Why is <1> the first thing to execute, even though the code for it comes last on the line after several conditional statements? In other words, why does the for loop begin to execute and assign value, but then only actually return a value at a later point in the code execution, if at all?

Also, for bonus points, what's with the weird line in the Zen file about the Dutch?

Edit: Though it shames me to say it now, until three seconds ago I assumed Guido van Rossum was Italian. After reading his Wikipedia article, I at least grasp, if not fully understand, why that line is in there.


The operators in the list comprehension line associate like this:

"".join([(((c in this.d) and this.d[c]) or c) for c in this.s])

Removing the list comprehension:

result = []
for c in this.s:
   result.append(((c in this.d) and this.d[c]) or c)
print "".join(result)

Removing the and/or boolean trickery, which is used to emulate a if-else statement:

result = []
for c in this.s:
   if c in this.d:
      result.append(this.d[c])
   else:
      result.append(c)
print "".join(result)


You are correct about the flow.

The loop is of kind [dosomething(c) for c in this.s] It's a list comprehension and should be read as dosomething for all c in this.s.

The Dutch part is about Guido Van Rossum creator of python is Dutch.


Your analysis is close. It is a list comprehension. (btw, the same output would result if the outer square brackets were eliminated, which would be called a generator comprehension)

There is some documentation here.

The basic form of a list comprehension is

[expression for var in enumerable if condition]

They are evaluated in this order:

  1. enumerable is evaluated
  2. Each value in turn is assigned to var
  3. condition is checked
  4. expression is evaluated

The result is the list of expression values for each element in the enumerable for which the condition was true.

This example doesn't use a condtion, so what is left, after adding some parentheses is:

[(c in this.d and this.d[c] or c) for c in (this.s)]

this.s is the enumerable. c is the iterating variable. c in this.d and this.d[c] or c is the expression.

c in this.d and this.d[c] or c uses the short-circuiting nature of python's logical operators to achieve the same thing as this.d[c] if c in this.d else c.

All in all, I would not call this obfuscated at all. Once you understand the power of list comprehensions, it will look quite natural.


Generally, list comprehensions are of the following form:

[ expression for var in iterator ]

When I write down a list comprehension, I often start by writing

[ for var in iterator ]

because many years of procedural programming has inculcated the for-loop aspect into my mind as the part that comes first.

And, as you've rightly noted, the for-loop is the part that seems to "execute" first.

For each pass through the loop, the expression is evaluated. (A minor point: expressions are evaluated, statements are executed.)

So in this case, we have

[ expression for c in this.s ]

this.s is a string. In Python, strings are iterators! When you write

for c in some_string:

the loop iterates over the characters in the string. So c takes on each of the characters in this.s in order.

Now the expression is

c in this.d and this.d[c] or c

This is what's known as a ternary operation. That link explains the logic, but the basic idea is

if c in this.d:
    the expression evaluates to this.d[c]
else:
    the expression evaluates c

The condition c in this.d is thus simply to check that the dict this.d has a key with value c. If it does, return this.d[c], and if it doesn't, return c itself.

Another way to write it would be

[this.d.get(c,c) for c in this.s]

(the second argument to the get method is the default value returned when the first argument is not in the dict).

PS. The ternary form

condition and value1 or value2

is error prone. (Consider what happens if condition is True, but value1 is None. Since condition is True, you might expect the ternary form to evaluate to value1, that is, None. But since None has boolean value False, the ternary form evaluates to value2 instead. Thus, if you are not careful and aware of this pitfall, the ternary form can introduce errors.)

For modern versions of Python a better way to write this would be

value1 if condition else value2

It is not susceptible to the pitfall mentioned above. If condition is True, the expression always evaluates to value1.

But in the particular case above, I'd prefer this.d.get(c,c).


"".join([c in this.d and this.d[c] or c for c in this.s]) is certainly obfuscated. Here's a Zen version:

this.s.decode('rot13')


My version with modern if else and generator:

import this ## prints zenofpython
print '-'*70
whatiszenofpython = "".join(this.d[c] if c in this.d else c for c in this.s)
zen = ''
for c in this.s:
    zen += this.d[c] if c in this.d else c
print zen

Verbal version: import this, the main program of it descrambles and print the message this.s To descramble the message replace those letters which are found in dict this.d with their decoded counter parts (upper/lowercase different). The other letters do not need to change but print as they are.

0

精彩评论

暂无评论...
验证码 换一张
取 消