Are Haskell List Comprehensions Inefficient?_问答_开发者

I started doing Project Euler and got to problem number 9. Since I was using Project Euler to learn Haskell, I decided to use list comprehensions (as shown in Learn You A Haskell). I do that and GHCI takes awhile to figure out the triplet, which I figured is normal because of the calculations involved. Now, at work yesterday (I don't work as a programmer professionally, yet) I was talking to a friend who knows VBA and he wanted to try to find the answers in VBA. I thought it would be a fun challenge as we开发者_JAVA百科ll, and I churn out some basic for loops and if statements, but what got me was that it was much faster than Haskell was.

My question is: are Haskell's list comprehension incredibly inefficient? At first I thought it was just because I was in GHC's interactive mode, but then I realized VBA is interpreted too.

Please note, I didn't post my code because of it being an answer to project euler. If it will answer my question (as in I'm doing something wrong) then I will gladly post the code.

[edit] Here is my Haskell list comprehension:

[(a,b,c) | c <- [1..1000], b <- [1..c], a <- [1..b], a+b+c=1000, a^2+b^2=c^2]

I guess I could've lowered the range on c but is that what is really slowing it down?

There are two things you could be doing with this problem that could make your code slow. One is how you are trying values for a, b and c. If you loop through all possible values for a, b, c from 1 to 1000, you'll be spending a long time. To give a hint, you can make use of a+b+c=1000 if you rearrange it for c. The other is that if you only use a list comprehension, it will process every possible value for a, b and c. The problem tells you that there is only one unique set of numbers that satisfies the problem, so if you change your answer from this:

[ a * b * c | .... ]

to:

head [ a * b * c | ... ]

then Haskell's lazy evaluation means that it will stop after finding the first answer. This is the Haskell equivalent of breaking out of your VBA loop when you find the first answer. When I used both these tips, I had an answer that completed very quickly (under a second) in ghci.

Addendum: I missed at first the condition a < b < c. You can also make use of this in your list comprehensions; it is valid to say things along the lines of:

[(a, b) | b <- [1..100], a <- [1..b-1]]

Consider this simplified version of your list comprehension:

[(a,b,c) | a <- [1..1000], b <- [1..1000], c <- [1..1000]]

This will give all possible combinations of a, b, and c. It's kind of like saying, "how many ways can three one-thousand-sided dice land?" The answer is 1000*1000*1000 = 1,000,000,000 different combinations. If it took 0.001 seconds to generate each combination, it would therefore take 1,000,000 seconds (~11.5 days) to finish all combinations. (OK, 0.001 seconds is actually pretty slow for a computer, but you get the idea)

When you add predicates to your list comprehension, it still takes the same amount of time to compute; in fact, it takes longer since it needs to check the predicate for each of the 1 billion combinations it computes.

Now consider your comprehension. It looks like it should be much faster, right?

[(a,b,c) | c <- [1..1000], b <- [1..c], a <- [1..b], a+b+c=1000, a^2+b^2=c^2]

There are 1000 choices for c. How many are there for b and a? Well, the average choice for c is 500. For all choices of c, then, there are an average of 500 choices for b (since b can range from 1 to c). Likewise, for all choices of c and b, there are an average of 250 choices for a. That's very hand-wavy, but I'm fairly sure it's accurate. So 1000 choices for c * 1000/2 choices for b * 1000/4 choices for a = 1 billion / 8 ~= 100 million. It's 8x faster, but if you paid attention, you'll notice it's actually the same big-Oh complexity as the simplified version above. If we compared "simplified" vs "improved" versions of the same problem, but from [1..100000] instead of [1..1000], the "improved" would still only be 8x faster than the "simplified".

Don't get me wrong, 8x is a wonderful constant-factor speedup. But unless you want to wait a couple hours to get the solution, you'll need to get a better big-Oh.

As Neil noted, the way to reduce the complexity of this problem is, for a given b and c, choose the a that satisfies a+b+c=1000. That way, you're not trying a bunch of as that will fail. This will drop the big-Oh complexity; you'll only be considering approximately 1000 * 500 * 1 = 500,000 combinations, instead of ~100,000,000.

Once you get the solution to the problem you can check out other peoples versions of Haskell solutions on the Project Euler site to get an idea of how other people have solved the problem. Incidentally, here is a link to the referenced problem: http://projecteuler.net/index.php?section=problems&id=9

In addition to what everyone else has said about generating fewer elements in the generators, you can also switch to using Int instead of Integer as the type of the numbers. The default is Integer, but your numbers are small enough to fit in an Int.

(Also, to nitpick, Haskell list comprehensions have no speed. Haskell is a language definition with very little operational semantics. A particular Haskell implementation might have slow list comprehensions, though.)