Recommended behaviour of GetEnumerator() when implementing IEnumerable<T> and IEnumerator<T>_问答_开发者

I am implementing my own enumerable type. Something ressembling this:

public class LineReaderEnumerable : IEnumerable<string>开发者_运维问答;, IDisposable
{
    private readonly LineEnumerator enumerator;

    public LineReaderEnumerable(FileStream fileStream)
    {
        enumerator = new LineEnumerator(new StreamReader(fileStream, Encoding.Default));
    }

    public IEnumerator<string> GetEnumerator()
    {
        return enumerator;
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    public void Dispose()
    {
       enumerator.Dispose();
    }
}

The enumerator class:

public class LineEnumerator : IEnumerator<string>
{
    private readonly StreamReader reader;
    private string current;

    public LineEnumerator(StreamReader reader)
    {
        this.reader = reader;
    }

    public void Dispose()
    {
        reader.Dispose();
    }

    public bool MoveNext()
    {
        if (reader.EndOfStream)
        {
            return false;
        }
        current = reader.ReadLine();
        return true;
    }

    public void Reset()
    {
        reader.DiscardBufferedData();
        reader.BaseStream.Seek(0, SeekOrigin.Begin);
        reader.BaseStream.Position = 0;
    }

    public string Current
    {
        get { return current; }
    }

    object IEnumerator.Current
    {
        get { return Current; }
    }
}

My question is this: should I call Reset() on the enumerator when GetEnumerator() is called or is it the responsability of the calling method (like foreach) to do it?

Should GetEnumerator() create a new one, or is it supposed to always return the same instance?

Your model is fundamentally broken - you should create a new IEnumerator<T> each time GetEnumerator() is called. Iterators are meant to be independent of each other. For example, I ought to be able to write:

var lines = new LinesEnumerable(...);
foreach (var line1 in lines)
{
    foreach (var line2 in lines)
    {
        ...
    }
}

and basically get the cross-product of each line in the file against each of the other lines.

This means LineEnumerable class should not be given a FileStream - it should be given something which can be used to obtain a FileStream each time you need one, e.g. a filename.

For example, you can do all of this in a single method call using iterator blocks:

// Like File.ReadLines in .NET 4 - except that's broken (see comments)
public IEnumerable<string> ReadLines(string filename)
{
    using (TextReader reader = File.OpenText(filename))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}

Then:

var lines = ReadLines(filename);
// foreach loops as before

... that will work fine.

EDIT: Note that certain sequences are naturally iterable only once - e.g. a network stream, or a sequence of random numbers from an unknown seed.

Such sequences are really better expressed as IEnumerator<T> rather than IEnumerable<T>, but that makes filtering etc with LINQ harder. IMO such sequences should at least throw an exception on the second call to GetEnumerator() - returning the same iterator twice is a really bad idea.

The expectation of a user of your type would be that GetEnumerator() returns a new enumerator object.

As you have defined it every call to GetEnumerator returns the same enumerator, so code like:

var e1 = instance.GetEnumerator();
e1.MoveNext();
var first = e1.Value();

var e2 = instance.GetEnumerator();
e2.MoveNext();
var firstAgain = e2.Value();

Debug.Assert(first == firstAgain);

will not work as expected.

(An internal call to Reset would be an unusual design, but that's secondary here.)

Additional: PS If you want an enumerator over the lines of a file then use File.ReadLines, but it appears (see comments on Jon Skeet's answer) this suffers from the same problem as your code.

Should GetEnumerator() create a new one, or is it supposed to always return the same instance?

If you return the same instance then the second iteration will be returning results from the point where the first iteration is and both of them will interfere with each other if the code is executing alternatively or in parallel. so No you shouldn't return same instance.

For Reset

An enumerator remains valid as long as the collection remains unchanged. If changes are made to the collection, such as adding, modifying, or deleting elements, the enumerator is irrecoverably invalidated and the next call to the MoveNext or Reset method throws an InvalidOperationException.

The Reset method is provided for COM interoperability. It does not necessarily need to be implemented; instead, the implementer can simply throw a NotSupportedException.

http://msdn.microsoft.com/en-us/library/system.collections.ienumerator.reset.aspx

My question is this: should I call Reset() on the enumerator when GetEnumerator() is called or is it the responsability of the calling method (like foreach) to do it?

That is the responsability of the calling method; However if your enumerator is invalid before a first call to Reset() you should of course call it before returning it (that would be an implementation detail).

In normal operation, an enumerator is never actually reset. You can verify that by throwing NotSupportedException from within reset.

Should GetEnumerator() create a new one, or is it supposed to always return the same instance?

Yes it should always return a new instance. Think of it this way: an Enumerable is something that you can enumerate. Enumerator is the thing that you use to enumerate with. If GetEnumerator() always returned the same instance, the containing class would not be 'enumerable' but just know how to 'enumerate' (IOW: it would just be IHasEnumerator instead of IEnumerable)

As far as I'm concerned, it should be the responsibility of the caller. This follows from POLA (the principle of least astonishment, if you like. And indeed, you don't want your reader to control too much. Consider, what if the consumer wants only to enumerate lines from a certain point in the stream onwards?

And regarding the Reset method itself, you should really check to see if the stream is actually seekable before trying to seek -- many streams aren't (e.g. network streams).