I was reading Joel On Software today and ran across this quote:
Without understanding functional programming, you can't invent MapReduce, the algorithm that makes Google so massively scalable. The terms Map and Reduce come from Lisp and functional programming. MapReduce is, in retrospect, obvious to anyone who remembers from their 6.001-equivalent programming class that purely functional programs have no side effects and are thus trivially parallelizable.
What does he mean when he says functional programs have no side effects? And how does this make parallelizing trivial?
What does he mean when he says functional programs have no side effects?
Most people think of programming as creating variables, assigning them values, adding things to lists, etc. Variables "vary", hence the name.
Functional programming is a style of designing programs to eliminate variables -- everything is a constant or readonly.
When Joel says functional programs have no side-effects, there's a lot of hand-waving involved, since it's perfectly easy to write functional programs which do modify variables -- but largely, when people talk about functional programming, they mean programs which don't hold any modifiable state.
"But Juliet! How can write a useful program if it can't modify anything"
Good question!
You "modify" things by creating a new instance of your object with modified state. For example:
class Customer
{
    public string Id { get; private set; }
    public string Name { get; private set; }

    public Customer(string id, string name)
    {
        this.Id = id;
        this.Name = name;
    }

    public Customer SetName(string name)
    {
        // returns a new customer with the given name
        return new Customer(this.Id, name);
    }
}
So all the initialization takes place in the constructor, and we can never modify the object again -- we create new instances with our modifications passed into the constructor.
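A quick usage sketch (the id and names here are just made up for illustration):

    var original = new Customer("1001", "Alice");
    var renamed = original.SetName("Bob");

    Console.WriteLine(original.Name);   // still "Alice" -- the original was never touched
    Console.WriteLine(renamed.Name);    // "Bob" -- a brand-new Customer instance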
You'll be surprised how far you can carry this style of programming.
"But Juliet!? How can this possibly be efficient with all this copying?"
The trick is realizing that you don't have to copy your entire object graph, only the parts which have changed. If parts of your object graph haven't changed, you can reuse them in your new object (copy the pointer; don't new up new instances of the objects in that part of the graph).
In fact, it's extremely easy to write immutable versions of many common data structures -- immutable AVL trees, red-black trees, many kinds of heaps, etc. See here for an implementation of an immutable treap.
In most cases, the immutable version of a data structure has the same computational complexity for insert/lookup/delete as its mutable counterpart. The only difference is that inserting returns a new version of your data structure without modifying the original one.
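To make the sharing idea concrete, here's a minimal sketch (my own, not the treap from the link) of an immutable singly linked list: "inserting" at the front allocates exactly one new node and reuses the entire existing list as the tail.

    class ConsList<T>
    {
        public T Head { get; private set; }
        public ConsList<T> Tail { get; private set; }

        public ConsList(T head, ConsList<T> tail)
        {
            this.Head = head;
            this.Tail = tail;
        }

        public ConsList<T> Prepend(T value)
        {
            // One new node; the whole existing list is reused as the tail,
            // so "inserting" is O(1) and copies nothing.
            return new ConsList<T>(value, this);
        }
    }

So if one = new ConsList<int>(1, null) and two = one.Prepend(2), then two's tail is one itself, not a copy of it.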
And how does this make parallelizing trivial?
Think about it: if you have an immutable tree or any other data structure, then you can have two threads inserting, removing, and looking up items in the tree without needing to take a lock. Since the tree is immutable, it's not possible for one thread to put the object in an invalid state under another thread's nose -- so we eliminate a whole class of multithreading errors related to race conditions. Since we don't have race conditions, we don't need locks, so we also eliminate a whole class of errors related to deadlocking.
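As a sketch of that (assuming the System.Collections.Immutable package, which ships an ImmutableList<T> along these lines):

    using System;
    using System.Collections.Immutable;
    using System.Threading.Tasks;

    class NoLocksNeeded
    {
        static void Main()
        {
            var shared = ImmutableList.Create(1, 2, 3);

            // Both tasks touch `shared` with no lock at all.
            var reader = Task.Run(() =>
            {
                foreach (var item in shared)
                    Console.WriteLine(item);         // can never see a half-updated structure
            });
            var writer = Task.Run(() => shared.Add(4));  // "insert" returns a new list

            Task.WaitAll(reader, writer);
            Console.WriteLine(shared.Count);         // still 3 -- the original was never modified
        }
    }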
Because immutable objects are intrinsically thread-safe, they're said to make concurrency "trivial". But that's only really half the story. There are times when we need changes in one thread to be visible to another - so how do we do that with immutable objects?
The trick is to re-think our concurrency model. Instead of having two threads sharing state with one another, we think of threads as being a kind of mailbox which can send and receive messages.
So if thread A has a pointer to thread B, it can pass a message -- the updated data structure -- to thread B, where thread B merges its own copy of the data structure with the copy in the message it received. It's also possible for a thread to pass itself as a message, so that thread A sends itself to thread B, and thread B can then send a message back to thread A via the pointer it received.
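Here's a rough sketch of that mailbox idea in C#, using a BlockingCollection<T> as the mailbox and the immutable Customer class above as the message (the setup and names are mine; real actor libraries are more involved):

    using System;
    using System.Collections.Concurrent;
    using System.Threading.Tasks;

    class MailboxSketch
    {
        static void Main()
        {
            // Thread B's mailbox: thread A drops immutable messages into it.
            var mailbox = new BlockingCollection<Customer>();

            var threadB = Task.Run(() =>
            {
                // Blocks until a message arrives; finishes when the sender calls CompleteAdding.
                foreach (var msg in mailbox.GetConsumingEnumerable())
                    Console.WriteLine("B received: " + msg.Name);
            });

            // Thread A never shares mutable state -- an "update" is just another message.
            mailbox.Add(new Customer("1001", "Alice"));
            mailbox.Add(new Customer("1001", "Alicia"));
            mailbox.CompleteAdding();

            threadB.Wait();
        }
    }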
Believe me, the strategy above makes concurrent programming 1000x easier than locks on mutable state. So the important part of Joel's comment: "Without understanding functional programming, you can't invent MapReduce, the algorithm that makes Google so massively scalable."
Traditional locking doesn't scale well because, in order to lock an object, you need a reference to it -- the locked object needs to be in the same memory space as the code doing the locking. You can't obtain a lock on an object in another process.
But think about the message passing model above: threads are passing messages to and from one another. Is there really a difference between passing a message to a thread in the same process vs passing a message to a thread listening on some IP address? Not really. And it's exactly because threads can send and receive messages across the process boundary that message passing scales as well as it does: it's not bound to a single machine, so you can have your app running across as many machines as needed.
(For what it's worth, you can implement message passing using mutable messages; it's just that no one ever wants to, because a thread can't do anything with the message without locking it -- which we already know is full of problems. So immutable is the default way to go when you're using message-passing concurrency.)
Although this is very high level and glosses over a lot of actual implementation detail, the principles above are exactly how Google's MapReduce can scale pretty much indefinitely.
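Here's the map/reduce idea in miniature -- not Google's implementation, just plain LINQ, with the SelectMany step playing the "map" role and the grouping/counting step playing "reduce". Because the map step is pure, each document could just as well be processed on a different core or machine:

    using System;
    using System.Linq;

    class WordCount
    {
        static void Main()
        {
            string[] docs = { "the cat", "the dog", "the cat sat" };

            // Map: each document is independently turned into words.
            // No document's result depends on any other's, so the work
            // can be split across cores (or, in principle, machines).
            var words = docs.AsParallel().SelectMany(doc => doc.Split(' '));

            // Reduce: fold the mapped results down into counts per word.
            var counts = words.GroupBy(w => w)
                              .ToDictionary(g => g.Key, g => g.Count());

            Console.WriteLine(counts["the"]);   // 3
        }
    }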
See also: http://www.defmacro.org/ramblings/fp.html
Let me wikipedia it for you
In brief, a pure function is one that calculates things based only on its given arguments and returns a result.
Writing something to the screen or changing a global variable (or a data member) is a side effect. Relying on data other than that given in an argument also makes your function non-pure although it is not a side effect.
Writing a "pure function" makes it easier to invoke many instances of it in parallel. That's mainly because being pure, you can be sure it doesn't effect the outside world and doesn't rely on outside information.
Functional programming aims to create functions that are dependent only on their inputs, and do not change state elsewhere in the system (i.e., their execution has no side effects).
This means, among other things, that they are idempotent: the same function can be run many times over the same input, and since it has no side-effects you don't care how many times it's run. This is good for parallelization, because it means that you don't have to create a lot of overhead to keep track of whether a particular node crashes.
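Sketched in C# (my example, not the answer's): if Square below were one step of a distributed job and a worker died halfway through, the coordinator could simply re-run it -- the second run can't corrupt anything, because Square touches nothing outside its input.

    using System;
    using System.Linq;

    class RetrySketch
    {
        // Pure: same input, same output, no side effects.
        static int Square(int x) { return x * x; }

        static void Main()
        {
            int[] inputs = { 1, 2, 3, 4 };

            var firstTry = inputs.Select(Square).ToArray();
            var retried  = inputs.Select(Square).ToArray();   // re-running is harmless

            Console.WriteLine(firstTry.SequenceEqual(retried));   // True, always
        }
    }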
Of course, in the real world, it's hard to keep side-effects out of your programs (e.g., writing to a file). So real-world programs tend to be a combination of functional and non-functional portions.
Units of functional programs have only their input and their output, no internal state. This lack of internal state means that you can put the functional modules on any number of cores/nodes, without having to worry about having the previous calculation in the module affecting the next.
I believe what he means is that purely functional code makes explicit the flow of data through the program. Side-effects allow portions of the code to "communicate" in ways that are difficult to analyze.
Without side-effects in play, the runtime environment can determine how to best decompose the code into parallelism according to the structure of the functional code.
This would be a simplification of the reality, because there is also an issue of decomposing the code into "chunks" which amount to approximately equal "effort." This requires a human to write the functional code in such a way that it will decompose reasonably when parallelized.