How to replace for-loops with a functional statement in C#?

A colleague once said that God is killing a kitten every time I write a for-loop.

When asked how to avoid for-loops, his answer was to use a functional language. However, if you are stuck with a non-functional language, say C#, what techniques are there to avoid for-loops or to get rid of them by refactoring? With lambda expressions and LINQ perhaps? If so, how?

Questions

So the question boils down to:

Why are for-loops bad? Or, in what contexts should for-loops be avoided, and why?
Can you provide C# code examples of how it looks before, i.e. with a loop, and afterwards without a loop?

Functional constructs often express your intent more clearly than for-loops in cases where you operate on some data set and want to transform, filter or aggregate the elements.

Loops are very appropriate when you want to repeatedly execute some action.

For example

int x = array.Sum();

much more clearly expresses your intent than

int x = 0;
for (int i = 0; i < array.Length; i++)
{
    x += array[i];
}
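The same contrast holds when filtering, transforming and aggregating are combined. Here is a minimal sketch; the class and method names are made up for illustration, and the task (sum the squares of the even numbers) is just an assumed example:

```csharp
using System;
using System.Linq;

public static class LoopVersusLinq
{
    // Loop version: filter, transform and aggregate are interleaved in one body.
    public static int SumOfSquaresOfEvensLoop(int[] values)
    {
        int total = 0;
        for (int i = 0; i < values.Length; i++)
        {
            if (values[i] % 2 == 0)
            {
                total += values[i] * values[i];
            }
        }
        return total;
    }

    // LINQ version: each step is named by the operator that performs it.
    public static int SumOfSquaresOfEvensLinq(int[] values)
    {
        return values
            .Where(v => v % 2 == 0)   // filter
            .Select(v => v * v)       // transform
            .Sum();                   // aggregate
    }
}
```

Both methods should return the same result for any input; the second one just states the three steps separately.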

Why are for-loops bad? Or, in what contexts should for-loops be avoided, and why?

If your colleague has a functional programming background, then he's probably already familiar with the basic reasons for avoiding for loops:

Fold / Map / Filter cover most use cases of list traversal, and lend themselves well to function composition. For-loops aren't a good pattern because they aren't composable.

Most of the time, you traverse through a list to fold (aggregate), map, or filter values in a list. These higher order functions already exist in every mainstream functional language, so you rarely see the for-loop idiom used in functional code.

Higher order functions are the bread and butter of function composition, meaning you can easily combine simple functions into something more complex.

To give a non-trivial example, consider the following in an imperative language:

let x = someList;
y = []
for x' in x
    y.Add(f x')

z = []
for y' in y
    z.Add(g y')

In a functional language, we'd write map g (map f x), or we can eliminate the intermediate list using map (g . f) x. Now we can, in principle, eliminate the intermediate list from the imperative version, and that would help a little -- but not much.
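For C# readers, the two forms of the double map can be sketched roughly like this, with Select playing the role of map. Compose is a hypothetical helper written for this example, not something LINQ provides:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MapComposition
{
    // Hypothetical helper: Compose(g, f) returns x => g(f(x)).
    public static Func<A, C> Compose<A, B, C>(Func<B, C> g, Func<A, B> f)
    {
        return x => g(f(x));
    }

    // Counterpart of "map g (map f x)".
    public static List<string> TwoMaps(IEnumerable<int> xs, Func<int, int> f, Func<int, string> g)
    {
        return xs.Select(f).Select(g).ToList();
    }

    // Counterpart of mapping once with the composed function.
    public static List<string> OneMap(IEnumerable<int> xs, Func<int, int> f, Func<int, string> g)
    {
        return xs.Select(Compose(g, f)).ToList();
    }
}
```

Note that Select is evaluated lazily, so even TwoMaps streams each element through both functions without building an intermediate list.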

The main problem with the imperative version is simply that the for-loops are implementation details. If you want to change the function, you change its implementation -- and you end up modifying a lot of code.

Case in point, how would you write map g (filter f x) imperatively? Well, since you can't reuse your original code which maps and maps, you need to write a new function which filters and maps instead. And if you have 50 ways to map and 50 ways to filter, now you need 50 × 50 = 2,500 functions, or you need to simulate the ability to pass functions as first-class parameters using the command pattern (if you've ever tried functional programming in Java, you understand what a nightmare this can be).
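In C#, where functions can be passed around as delegates, a single filter-then-map function can cover all of those combinations. A rough sketch, with MapFilter as an illustrative name only:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FilterThenMap
{
    // Counterpart of "map g (filter f x)": the caller supplies both the
    // predicate and the transformation, so no per-combination code is needed.
    public static List<TOut> MapFilter<TIn, TOut>(
        IEnumerable<TIn> source,
        Func<TIn, bool> keep,
        Func<TIn, TOut> transform)
    {
        return source.Where(keep).Select(transform).ToList();
    }
}
```

A call might look like FilterThenMap.MapFilter(words, w => w.Length > 1, w => w.ToUpper()), where words is any sequence of strings.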

Back in the functional universe, you can generalize map g (map f x) in a way that lets you swap out the map with filter or fold as needed:

let apply2 a g b f x = a g (b f x)

And call it using apply2 map g filter f or apply2 map g map f or apply2 filter g filter f or whatever you need. Now, you'd probably never write code like that in the real world; you'd probably simplify it using:

let mapmap g f = apply2 map g map f
let mapfilter g f = apply2 map g filter f

Higher-order functions and function composition give you a level of abstraction that you cannot get with the imperative code.

Abstracting out the implementation details of loops lets you seamlessly swap one loop for another.

Remember, for-loops are an implementation detail. If you need to change the implementation, you need to change every for-loop.

Map / fold / filter abstract away the loop. So if you want to change the implementation of your loops, you change it in those functions.

Now you might wonder why you'd want to abstract away a loop. Consider the task of mapping items from one type to another: usually, items are mapped one at a time, sequentially, and independently from all other items. Most of the time, maps like this are prime candidates for parallelization.

Unfortunately, the implementation details for sequential maps and parallel maps aren't interchangeable. If you have a ton of sequential maps all over your code, and you want to swap them out for parallel maps, you have two choices: copy/paste the same parallel mapping code all over your code base, or abstract away mapping logic into two functions map and pmap. Once you go the second route, you're already knee-deep in functional programming territory.
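In C#, the map / pmap pair could be sketched on top of LINQ and PLINQ as below. The names Map and PMap are taken from the text above; the class itself is hypothetical, and this assumes the mapping function is safe to run on several threads at once:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Mapping
{
    // Sequential map.
    public static List<TOut> Map<TIn, TOut>(IEnumerable<TIn> source, Func<TIn, TOut> f)
    {
        return source.Select(f).ToList();
    }

    // Parallel map using PLINQ. AsOrdered keeps the results in source order.
    // Assumption: f has no shared mutable state and items are independent.
    public static List<TOut> PMap<TIn, TOut>(IEnumerable<TIn> source, Func<TIn, TOut> f)
    {
        return source.AsParallel().AsOrdered().Select(f).ToList();
    }
}
```

Callers that go through Map can switch to PMap by changing one name, which is the point of abstracting the loop away.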

If you understand the purpose of function composition and abstracting away implementation details (even details as trivial as looping), you can start to appreciate just how and why functional programming is so powerful in the first place.

For loops are not bad. There are many very valid reasons to keep a for loop.
You can often "avoid" a for loop by reworking it using LINQ in C#, which provides a more declarative syntax. This can be good or bad depending on the situation:

Compare the following:

var collection = GetMyCollection();
for(int i=0;i<collection.Count;++i)
{
     if(collection[i].MyValue == someValue)
          return collection[i];
}

vs foreach:

var collection = GetMyCollection();
foreach(var item in collection)
{
     if(item.MyValue == someValue)
          return item;
}

vs. LINQ:

var collection = GetMyCollection();
return collection.FirstOrDefault(item => item.MyValue == someValue);

Personally, I think all three options have their place, and I use them all. It's a matter of using the most appropriate option for your scenario.

There's nothing wrong with for loops, but here are some of the reasons people might prefer functional/declarative approaches like LINQ, where you declare what you want rather than how you get it:

Functional approaches are potentially easier to parallelize either manually using PLINQ or by the compiler. As CPUs move to even more cores this may become more important.
Functional approaches make it easier to achieve lazy evaluation in multi-step processes: you can pass the intermediate result to the next step as a simple variable that has not been fully evaluated yet, rather than evaluating the first step entirely and then passing a collection to the next step (or writing a separate method with a yield statement to achieve the same thing procedurally).
Functional approaches are often shorter and easier to read.
Functional approaches often eliminate complex conditional bodies within for loops (e.g. if statements and 'continue' statements) because you can break the for loop down into logical steps - selecting all the elements that match, doing an operation on them, ...
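To illustrate the lazy evaluation point, here is a small sketch. The names are hypothetical, and the counter exists only to make the deferred execution visible:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyPipeline
{
    // Returns the first `take` odd squares together with the number of times
    // the squaring step actually ran.
    public static (List<int> Results, int Evaluations) FirstOddSquares(int sourceSize, int take)
    {
        int evaluations = 0;

        // Nothing runs when the query is built; work happens only when
        // ToList() pulls items through the pipeline, one at a time.
        List<int> results = Enumerable.Range(1, sourceSize)
            .Select(n => { evaluations++; return n * n; })
            .Where(square => square % 2 == 1)
            .Take(take)
            .ToList();

        return (results, evaluations);
    }
}
```

With a large sourceSize and a small take, the squaring step runs only for the few items needed to satisfy Take, not for the whole range.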

For loops don't kill people (or kittens, or puppies, or tribbles). People kill people. For loops, in and of themselves, are not bad. However, like anything else, it's how you use them that can be bad.

Sometimes you don't kill just one kitten.

for (int i = 0; i < kittens.Length; i++) { kittens[i].Kill(); }

Sometimes you kill them all.

You can refactor your code well enough so that you won't see them often. A good function name is definitely more readable than a for loop.

Taking the example from AndyC:

Loop

// mystrings is a string array
List<string> myList = new List<string>();
foreach(string s in mystrings)
{
    if(s.Length > 5)
    {
        myList.Add(s);
    }
}

Linq

// mystrings is a string array
List<string> myList = mystrings.Where<string>(t => t.Length > 5)
                               .ToList<string>();

Whether you use the first or the second version inside your function, it's easier to read:

var filteredList = myList.GetStringLongerThan(5);

Now that's an overly simple example, but you get my point.
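For completeness, a function like that could be written as an extension method. This is only a sketch: GetStringLongerThan is the name used above, while the StringFilters class is made up here:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StringFilters
{
    // The body could just as well be the foreach version; callers only
    // see the descriptive name.
    public static List<string> GetStringLongerThan(this IEnumerable<string> source, int length)
    {
        return source.Where(s => s.Length > length).ToList();
    }
}
```

The caller's code stays the same no matter which of the two implementations sits behind the name.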

Your colleague is not right. For loops are not bad per se. They are clean, readable and not particularly error prone.

Your colleague is wrong about for loops being bad in all cases, but correct that they can be rewritten functionally.

Say you have an extension method that looks like this:

// Extension methods must be declared in a static, non-generic class.
public static void ForEach<T>(this IEnumerable<T> collection, Action<T> action)
{
    foreach (T item in collection)
    {
        action(item);
    }
}

Then you can write a loop like this:

mycollection.ForEach(x => x.DoStuff());

This may not be very useful now. But if you then replace your implementation of the ForEach extension method to use a multithreaded approach, then you gain the advantages of parallelism.

This obviously isn't always going to work: this implementation only works if the loop iterations are completely independent of each other, but it can be useful.
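One possible multithreaded variant, sketched with Parallel.ForEach from the Task Parallel Library. The method is named ParallelForEach here purely for illustration, so that it can sit next to the sequential version:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelExtensions
{
    // Assumption: `action` is thread-safe and iterations are independent;
    // the order in which items are processed is not guaranteed.
    public static void ParallelForEach<T>(this IEnumerable<T> collection, Action<T> action)
    {
        Parallel.ForEach(collection, action);
    }
}
```

Any state the action shares between iterations has to be protected, for example with Interlocked or a lock.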

Also: always be wary of people who say some programming construct is always wrong.

A simple (and pointless really) example:

Loop

// mystrings is a string array
List<string> myList = new List<string>();
foreach(string s in mystrings)
{
    if(s.Length > 5)
    {
        myList.Add(s);
    }
}

Linq

// mystrings is a string array
List<string> myList = mystrings.Where<string>(t => t.Length > 5).ToList<string>();

In my book, the second one looks a lot tidier and simpler, though there's nothing wrong with the first one.

Sometimes a for-loop is bad if there exists a more efficient alternative. One example is searching, where it might be more efficient to sort the list (for example with quicksort) and then use binary search. Another is iterating over items in a database: it is usually much more efficient to use set-based operations in the database instead of iterating over the items.
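The searching case might look like the following sketch. The class and method names are hypothetical; Array.Sort and Array.BinarySearch are the standard library calls:

```csharp
using System;

public static class SortedLookup
{
    // Linear scan: O(n) for every lookup.
    public static bool ContainsLinear(int[] values, int target)
    {
        for (int i = 0; i < values.Length; i++)
        {
            if (values[i] == target) return true;
        }
        return false;
    }

    // Sort a copy once, so the caller's array is left untouched.
    public static int[] PrepareSorted(int[] values)
    {
        int[] copy = (int[])values.Clone();
        Array.Sort(copy);
        return copy;
    }

    // Binary search on sorted data: O(log n) per lookup.
    // Array.BinarySearch returns a negative number when the value is absent.
    public static bool ContainsSorted(int[] sortedValues, int target)
    {
        return Array.BinarySearch(sortedValues, target) >= 0;
    }
}
```

Sorting has its own cost, so this tends to pay off only when the same data is searched many times.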

Otherwise, if the for-loop (especially a for-each) makes the most sense and is readable, then I would go with that rather than refactor it into something that isn't as intuitive. I personally don't believe in these religious-sounding rules of "always do it this way, because that is the only way". Rather, it is better to have guidelines, and to understand in what scenarios it is appropriate to apply those guidelines. It is good that you ask the whys!

A for loop is, let's say, "bad" in that it relies on branch prediction in the CPU, and performance can drop when a branch is mispredicted.

But with CPUs achieving a branch prediction accuracy of around 97% and compilers applying techniques like loop unrolling, the performance cost of a loop is negligible.

If you abstract the for loop directly you get:

void For<T>(T initial, Func<T,bool> whilePredicate, Func<T,T> step, Action<T> action)
{
    for (T t = initial; whilePredicate(t); t = step(t))
    {
        action(t);
    }
}

The problem I have with this from a functional programming perspective is the void return type. It essentially means that for loops do not compose nicely with anything. So the goal is not to have a 1-1 conversion from for loop to some function; it is to think functionally and avoid doing things that do not compose. Instead of thinking of looping and acting, think of the whole problem and what you are mapping from and to.
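One way to make the same loop composable is to have it return the sequence of values instead of acting on them. A minimal sketch, with Generate as a hypothetical name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Sequences
{
    // Same shape as a for loop header, but it yields each value to the
    // caller instead of running an action, so the result composes with LINQ.
    public static IEnumerable<T> Generate<T>(T initial, Func<T, bool> whilePredicate, Func<T, T> step)
    {
        for (T t = initial; whilePredicate(t); t = step(t))
        {
            yield return t;
        }
    }
}
```

For example, Sequences.Generate(1, n => n < 100, n => n * 2).Where(n => n > 4).Sum() chains the generated powers of two into a filter and an aggregate, which the void version cannot do.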

A for loop can always be replaced by a recursive function that doesn't involve the use of a loop. A recursive function is a more functional style of programming.

But if you blindly replace for loops with recursive functions, then kittens and puppies will both die by the millions, and you will be done in by a velociraptor.

OK, here's an example. But please keep in mind that I do not advocate making this change!

The for loop

for (int index = 0; index < args.Length; ++index)
    Console.WriteLine(args[index]);

Can be changed to this recursive function call

WriteValuesToTheConsole(args, 0);


static void WriteValuesToTheConsole<T>(T[] values, int startingIndex)
{
    if (startingIndex < values.Length)
    {
        Console.WriteLine(values[startingIndex]);
        WriteValuesToTheConsole<T>(values, startingIndex + 1);
    }
}

This should work just the same for most values, but it is far less clear, less efficient, and could exhaust the stack if the array is too large.

Your colleague may be suggesting that, in certain circumstances where database data is involved, it is better to use an aggregate SQL function such as AVG() or SUM() at query time as opposed to processing the data on the C# side within an ADO.NET application.

Otherwise, for loops are highly effective when used properly, but realize that if you find yourself nesting them three or more levels deep, you might need a better algorithm, such as one that involves recursion, subroutines or both. For example, bubble sort has an O(n^2) runtime in its worst-case (reverse order) scenario, whereas a recursive sort such as merge sort is only O(n log n), which is much better.

Hopefully this helps.

Any construct in any language is there for a reason. It's a tool to be used to accomplish a task, a means to an end. In every case, there are ways to use it appropriately, that is, in a clear and concise way and within the spirit of the language, and ways to abuse it. This applies to the much-maligned goto statement as well as to your for loop conundrum, as well as while, do-while, switch/case, if-then-else, etc. If the for loop is the right tool for what you're doing, USE IT and your colleague will need to come to terms with your design decision.

It depends upon what is in the loop but he/she may be referring to a recursive function

    //this is the recursive function
    public static void getDirsFiles(DirectoryInfo d)
    {
        //create an array of files using FileInfo object
        FileInfo [] files;
        //get all files for the current directory
        files = d.GetFiles("*.*");

        //iterate through the directory and print the files
        foreach (FileInfo file in files)
        {
            //get details of each file using file object
            String fileName = file.FullName;
            String fileSize = file.Length.ToString();
            String fileExtension =file.Extension;
            String fileCreated = file.LastWriteTime.ToString();

            Console.WriteLine(fileName + " " + fileSize +
               " " + fileExtension + " " + fileCreated);
        }

        //get sub-folders for the current directory
        DirectoryInfo [] dirs = d.GetDirectories("*.*");

        //Recurse into each sub-folder by calling getDirsFiles again.
        //The recursion ends when a folder has no sub-folders,
        //because this loop then has nothing to iterate over.
        foreach (DirectoryInfo dir in dirs)
        {
            Console.WriteLine("--------->> {0} ", dir.Name);
            getDirsFiles(dir);
        }

    }

The question is whether the loop will be mutating state or causing side effects. If so, use a foreach loop. If not, consider using LINQ or other functional constructs.

See "foreach" vs "ForEach" on Eric Lippert's Blog.