开发者

Other ways to deal with "loop initialization" in C#

开发者 https://www.devze.com 2023-01-13 08:02 出处:网络
To start with I\'ll say that I agree that goto statements are largely made irrelevant by higher level constructs in modern programming languages and shouldn\'t be used when a suitable substitute is av

To start with I'll say that I agree that goto statements are largely made irrelevant by higher level constructs in modern programming languages and shouldn't be used when a suitable substitute is available.

I was re-reading an original edition of Steve McConnell's Code Complete recently and had forgotten about his suggestion for a common coding problem. I had read it years ago when I was first getting started and don't think I realized how useful the recipe would be. The coding problem is the following: when executing a loop you often need to execute part of the loop to initialize state and then execute the loop with some other logic and ending each loop with the same initialization logic. A concrete example is implementing String.Join(delimiter, array) method.

I think everybody's first take on the problem is this. Assume the append method is defined to add the argument to your return value.

bool isFirst = true;
foreach (var element in array)
{
  if (!isFirst)
  {
     append(delimiter);
  }
  else
  {
    isFirst = false;
  }

  append(element);
}

Note: A slight optimization to this is to remove the else and put it at the end of the loop. An assignment usually being a single instruction and equivalent to an else and decreases the number of basic blocks by 1 and increases the basic block size of the main part. The result being that execute a condition in each loop to determine if you should add the delimiter or not.

I've also seen and used other takes on dealing with this common loop problem. You can execute the initial element code first outside the loop, then perform your loop from the second element to the end. You can also change the logic to always append the element then the delimiter and once the loop is completed you can simply remove the last delimiter you added.

The latter solution tends to be the one that I prefer only because it doesn't duplicate any code. If the logic of the initialization sequence ever changes, you don't have to remember to fix it in two places. It does however require extra "work" to do something and then undo it, causing at least extra cpu cycles and in many cases such as our String.Join example requires extra memory as well.

I was excited then to read this construct

var enumerator = array.GetEnumerator();
if (enumerator.MoveNext())
{
  goto start;
  do {
    append(delimiter);

  start:
    append(enumerator.Current);
  } while (enumerator.MoveNext());
}

The benefit here being that you get no duplicated code and 开发者_开发技巧you get no additional work. You start your loop half way into the execution of your first loop and that is your initialization. You are limited to simulating other loops with the do while construct but the translation is easy and reading it is not difficult.

So, now the question. I happily went to try adding this to some code I was working on and found it didn't work. Works great in C, C++, Basic but it turns out in C# you can't jump to a label inside a different lexical scope that is not a parent scope. I was very disappointed. So I was left wondering, what is the best way to deal with this very common coding problem (I see it mostly in string generation) in C#?

To perhaps be more specific with requirements:

  • Don't duplicate code
  • Don't do unnecessary work
  • Don't be more than 2 or 3 times slower than other code
  • Be readable

I think readability is the only thing that might arguably suffer with the recipe I stated. However it doesn't work in C# so what's the next best thing?

* Edit * I changed my performance criteria because of some of the discussion. Performance is generally not a limiting factor here, so the goal more correctly should be to not be unreasonable, not to be the fastest ever.

The reason I dislike the alternate implementations I suggest is because they either duplicate code which leaves room for changing one part and not the other or for the one I generally choose it requires "undoing" the operation which requires extra thought and time to undo the thing that you just did. With string manipulation in particular this usually leaves you open for off by one errors or failing to account for an empty array and trying to undo something that didn't happen.


Personally I like Mark Byer's option, but you could always write your own generic method for this:

public static void IterateWithSpecialFirst<T>(this IEnumerable<T> source,
    Action<T> firstAction,
    Action<T> subsequentActions)
{
    using (IEnumerator<T> iterator = source.GetEnumerator())
    {
        if (iterator.MoveNext())
        {
            firstAction(iterator.Current);
        }
        while (iterator.MoveNext())
        {
            subsequentActions(iterator.Current);
        }
    }
}

That's relatively straightforward... giving a special last action is slightly harder:

public static void IterateWithSpecialLast<T>(this IEnumerable<T> source,
    Action<T> allButLastAction,
    Action<T> lastAction)
{
    using (IEnumerator<T> iterator = source.GetEnumerator())
    {
        if (!iterator.MoveNext())
        {
            return;
        }            
        T previous = iterator.Current;
        while (iterator.MoveNext())
        {
            allButLastAction(previous);
            previous = iterator.Current;
        }
        lastAction(previous);
    }
}

EDIT: As your comment was concerned with the performance of this, I'll reiterate my comment in this answer: while this general problem is reasonably common, it's not common for it to be such a performance bottleneck that it's worth micro-optimizing around. Indeed, I can't remember ever coming across a situation where the looping machinery became a bottleneck. I'm sure it happens, but that isn't "common". If I ever run into it, I'll special-case that particular code, and the best solution will depend on exactly what the code needs to do.

In general, however, I value readability and reusability much more than micro-optimization.


For your specific example there is a standard solution: string.Join. This handles adding the delimiter correctly so that you don't have to write the loop yourself.

If you really want to write this yourself an approach you can use is the following:

string delimiter = "";
foreach (var element in array)
{
    append(delimiter);
    append(element);
    delimiter = ",";
}

This should be reasonably efficient and I think it is reasonable to read. The constant string "," is interned so this won't result in a new string being created on each iteration. Of course if performance is critical for your application you should benchmark rather than guess.


You are already willing to give up on foreach. So this ought to be suitable:

        using (var enumerator = array.GetEnumerator()) {
            if (enumerator.MoveNext()) {
                for (;;) {
                    append(enumerator.Current);
                    if (!enumerator.MoveNext()) break;
                    append(delimiter);
                }
            }
        }


You can certainly create a goto solution in C# (note: I didn't add null checks):

string Join(string[] array, string delimiter) {
  var sb = new StringBuilder();
  var enumerator = array.GetEnumerator();
  if (enumerator.MoveNext()) {
    goto start;
    loop:
      sb.Append(delimiter);
      start: sb.Append(enumerator.Current);
      if (enumerator.MoveNext()) goto loop;
  }
  return sb.ToString();
}

For your specific example, this looks pretty straighforward to me (and it's one of the solutions you described):

string Join(string[] array, string delimiter) {
  var sb = new StringBuilder();
  foreach (string element in array) {
    sb.Append(element);
    sb.Append(delimiter);
  }
  if (sb.Length >= delimiter.Length) sb.Length -= delimiter.Length;
  return sb.ToString();
}

If you want to get functional, you can try using this folding approach:

string Join(string[] array, string delimiter) {
  return array.Aggregate((left, right) => left + delimiter + right);
}

Although it reads really nice, it's not using a StringBuilder, so you might want to abuse Aggregate a little to use it:

string Join(string[] array, string delimiter) {
  var sb = new StringBuilder();
  array.Aggregate((left, right) => {
    sb.Append(left).Append(delimiter).Append(right);
    return "";
  });
  return sb.ToString();
}

Or you can use this (borrowing the idea from other answers here):

string Join(string[] array, string delimiter) {
  return array.
    Skip(1).
    Aggregate(new StringBuilder(array.FirstOrDefault()),
      (acc, s) => acc.Append(delimiter).Append(s)).
    ToString();
}


Sometimes I use LINQ .First() and .Skip(1) to handle this ... This can give a relatively clean (and very readable) solution.

Using you example,

append(array.First());
foreach(var x in array.Skip(1))
{
  append(delimiter);
  append (x);
}

[This assumes there is at least one element in the array, an easy test to add if that's to be avoided.]

Use F# would be another suggestion :-)


There are ways you "can" get around the doubled code, but in most cases the duplicated code is much less ugly/dangerous than the possible solutions. The "goto" solution you quote doesn't seem like an improvement to me - I don't really think you really gain anything significant (compactness, readability or efficiency) by using it, while you increase the risk of a programmer getting something wrong at some point in the code's lifetime.

In general I tend to go for the approach:

  • A special case for the first (or last) action
  • loop for the other actions.

This removes the inefficiencies introduced by checking if the loop is in the first iteration on every time around, and is really easy to understand. For non-trivial cases, using a delegate or helper method to apply the action can minimise code duplication.

Or another approach I use sometimes where efficiency isn't important:

  • loop, and test if the string is empty to determine if a delimiter is required.

This can be written to be more compact and readable than the goto approach, and doesn't require any extra variables/storage/tests to detect the "special case" iteraiton.

But I think Mark Byers' approach is a good clean solution for your particular example.


I prefer first variable method. It is probably not cleanest but most efficient way. Alternatively you could use Length of the thing you appending to and compare it to zero. Works well with StringBuilder.


Why don't move dealing with first element outside a loop ?

StringBuilder sb = new StrindBuilder()
sb.append(array.first)
foreach (var elem in array.skip(1)) {
  sb.append(",")
  sb.append(elem)
}


If you want to go the functional route, you could define String.Join like LINQ construct that is reusable across types.

Personally, I would almost always go for code clarity over saving a few opcode executions.

EG:

namespace Play
{
    public static class LinqExtensions {
        public static U JoinElements<T, U>(this IEnumerable<T> list, Func<T, U> initializer, Func<U, T, U> joiner)
        {
            U joined = default(U);
            bool first = true;
            foreach (var item in list)
            {
                if (first)
                {
                    joined = initializer(item);
                    first = false;
                }
                else
                {
                    joined = joiner(joined, item);
                }
            }
            return joined;
        }
    }

    class Program
    {

        static void Main(string[] args)
        {
            List<int> nums = new List<int>() { 1, 2, 3 };
            var sum = nums.JoinElements(a => a, (a, b) => a + b);
            Console.WriteLine(sum); // outputs 6

            List<string> words = new List<string>() { "a", "b", "c" };
            var buffer = words.JoinElements(
                a => new StringBuilder(a), 
                (a, b) => a.Append(",").Append(b)
                );

            Console.WriteLine(buffer); // outputs "a,b,c"

            Console.ReadKey();
        }

    }
}
0

精彩评论

暂无评论...
验证码 换一张
取 消