开发者

Group lines in an input file with LINQ or some other means in C#

开发者 https://www.devze.com 2023-03-27 11:04 出处:网络
I need some advice on how I can change the i开发者_C百科mplementation of some code. I have input files that hold data that\'s read into a class:

I need some advice on how I can change the i开发者_C百科mplementation of some code. I have input files that hold data that's read into a class:

This is line 1
This is line 2 
This is line 3

I have code that writes out this data to HTML

@foreach (var item in Model.NoteDetails)
{
    <li>@item</li>
} 

Now I am being asked to accept a different kind of input file:

This is line 1
Data Line - two
Data Line - two two
Another line

What I need to do is to have my code check for lines that start with the same text just when followed by a hyphen. Then the code should group by this. So the output resulting from this would need to be:

This is line 1

Data Line
- two
- two two

Another line

I am not sure if this is possible with LINQ. Maybe it is better if I use an array or something. The problem is I need to be able to look ahead and then when I find that a line's not in a group then I print out the lines and do or do not start another group. I think it's more than is possible with LINQ. I know there are a lot of LINQ experts out there and would appreciate any advice.


Using the ChunkBy extension method on MSDN (any my own simple ToEnumerable extension method) you can do this with LINQ. End product looks nice, but there's a lot of extension method magic helping things along:

void Main()
{
    var data=
@"This is line 1
Data Line - two
Data Line - two two
Another line";
    var lines = data.Split(new[] {"\r\n", "\n"}, StringSplitOptions.None);
    var sep = " - ";
    var linesAndKeys=
        lines
            .Select(line => new {
                line, 
                parts = line.Split(new[] {sep}, StringSplitOptions.None)})
            .Select(x=>new {
                line = x.parts.Length>1
                    ? string.Join(sep, x.parts.Skip(1))
                    : x.line,
                key = x.parts.Length>1
                    ? x.parts[0]
                    : String.Empty
            });
    var transformedLines=
        linesAndKeys
            .ChunkBy(i => i.key)
            .Select(c =>
                c.Key == String.Empty
                    ? c.Select(s => s.line)
                    : c.Key.ToEnumerable().Concat(c.Select(s=>" - "+s.line)))
            .Interleave(() => Environment.NewLine.ToEnumerable())
            .SelectMany(x => x);

    var newString = string.Join(Environment.NewLine, transformedLines);

    Console.WriteLine(newString);

}

public static class MyExtensions
{
public static IEnumerable<T> 
    Interleave<T>(this IEnumerable<T> src, Func<T> separatorFactory)
{
    var srcArr = src.ToArray();
    for (int i = 0; i < srcArr.Length; i++)
    {
        yield return srcArr[i];
        if(i<srcArr.Length-1)
        {
            yield return separatorFactory();
        }
    }
}
    public static IEnumerable<T> ToEnumerable<T>(this T item)
    {
        yield return item;
    }
    public static IEnumerable<IGrouping<TKey, TSource>> ChunkBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        return source.ChunkBy(keySelector, EqualityComparer<TKey>.Default);
    }

    public static IEnumerable<IGrouping<TKey, TSource>> ChunkBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IEqualityComparer<TKey> comparer)
    {
        // Flag to signal end of source sequence.
        const bool noMoreSourceElements = true;

        // Auto-generated iterator for the source array.       
        var enumerator = source.GetEnumerator();

        // Move to the first element in the source sequence.
        if (!enumerator.MoveNext()) yield break;

        // Iterate through source sequence and create a copy of each Chunk.
        // On each pass, the iterator advances to the first element of the next "Chunk"
        // in the source sequence. This loop corresponds to the outer foreach loop that
        // executes the query.
        Chunk<TKey, TSource> current = null;
        while (true)
        {
            // Get the key for the current Chunk. The source iterator will churn through
            // the source sequence until it finds an element with a key that doesn't match.
            var key = keySelector(enumerator.Current);

            // Make a new Chunk (group) object that initially has one GroupItem, which is a copy of the current source element.
            current = new Chunk<TKey, TSource>(key, enumerator, value => comparer.Equals(key, keySelector(value)));

            // Return the Chunk. A Chunk is an IGrouping<TKey,TSource>, which is the return value of the ChunkBy method.
            // At this point the Chunk only has the first element in its source sequence. The remaining elements will be
            // returned only when the client code foreach's over this chunk. See Chunk.GetEnumerator for more info.
            yield return current;

            // Check to see whether (a) the chunk has made a copy of all its source elements or 
            // (b) the iterator has reached the end of the source sequence. If the caller uses an inner
            // foreach loop to iterate the chunk items, and that loop ran to completion,
            // then the Chunk.GetEnumerator method will already have made
            // copies of all chunk items before we get here. If the Chunk.GetEnumerator loop did not
            // enumerate all elements in the chunk, we need to do it here to avoid corrupting the iterator
            // for clients that may be calling us on a separate thread.
            if (current.CopyAllChunkElements() == noMoreSourceElements)
            {
                yield break;
            }
        }
    }

    // A Chunk is a contiguous group of one or more source elements that have the same key. A Chunk 
    // has a key and a list of ChunkItem objects, which are copies of the elements in the source sequence.
    class Chunk<TKey, TSource> : IGrouping<TKey, TSource>
    {
        // INVARIANT: DoneCopyingChunk == true || 
        //   (predicate != null && predicate(enumerator.Current) && current.Value == enumerator.Current)

        // A Chunk has a linked list of ChunkItems, which represent the elements in the current chunk. Each ChunkItem
        // has a reference to the next ChunkItem in the list.
        class ChunkItem
        {
            public ChunkItem(TSource value)
            {
                Value = value;
            }
            public readonly TSource Value;
            public ChunkItem Next = null;
        }
        // The value that is used to determine matching elements
        private readonly TKey key;

        // Stores a reference to the enumerator for the source sequence
        private IEnumerator<TSource> enumerator;

        // A reference to the predicate that is used to compare keys.
        private Func<TSource, bool> predicate;

        // Stores the contents of the first source element that
        // belongs with this chunk.
        private readonly ChunkItem head;

        // End of the list. It is repositioned each time a new
        // ChunkItem is added.
        private ChunkItem tail;

        // Flag to indicate the source iterator has reached the end of the source sequence.
        internal bool isLastSourceElement = false;

        // Private object for thread syncronization
        private object m_Lock;

        // REQUIRES: enumerator != null && predicate != null
        public Chunk(TKey key, IEnumerator<TSource> enumerator, Func<TSource, bool> predicate)
        {
            this.key = key;
            this.enumerator = enumerator;
            this.predicate = predicate;

            // A Chunk always contains at least one element.
            head = new ChunkItem(enumerator.Current);

            // The end and beginning are the same until the list contains > 1 elements.
            tail = head;

            m_Lock = new object();
        }

        // Indicates that all chunk elements have been copied to the list of ChunkItems, 
        // and the source enumerator is either at the end, or else on an element with a new key.
        // the tail of the linked list is set to null in the CopyNextChunkElement method if the
        // key of the next element does not match the current chunk's key, or there are no more elements in the source.
        private bool DoneCopyingChunk { get { return tail == null; } }

        // Adds one ChunkItem to the current group
        // REQUIRES: !DoneCopyingChunk && lock(this)
        private void CopyNextChunkElement()
        {
            // Try to advance the iterator on the source sequence.
            // If MoveNext returns false we are at the end, and isLastSourceElement is set to true
            isLastSourceElement = !enumerator.MoveNext();

            // If we are (a) at the end of the source, or (b) at the end of the current chunk
            // then null out the enumerator and predicate for reuse with the next chunk.
            if (isLastSourceElement || !predicate(enumerator.Current))
            {
                enumerator = null;
                predicate = null;
            }
            else
            {
                tail.Next = new ChunkItem(enumerator.Current);
            }

            // tail will be null if we are at the end of the chunk elements
            // This check is made in DoneCopyingChunk.
            tail = tail.Next;
        }

        // Called after the end of the last chunk was reached. It first checks whether
        // there are more elements in the source sequence. If there are, it 
        // Returns true if enumerator for this chunk was exhausted.
        internal bool CopyAllChunkElements()
        {
            while (true)
            {
                lock (m_Lock)
                {
                    if (DoneCopyingChunk)
                    {
                        // If isLastSourceElement is false,
                        // it signals to the outer iterator
                        // to continue iterating.
                        return isLastSourceElement;
                    }
                    else
                    {
                        CopyNextChunkElement();
                    }
                }
            }
        }

        public TKey Key { get { return key; } }

        // Invoked by the inner foreach loop. This method stays just one step ahead
        // of the client requests. It adds the next element of the chunk only after
        // the clients requests the last element in the list so far.
        public IEnumerator<TSource> GetEnumerator()
        {
            //Specify the initial element to enumerate.
            ChunkItem current = head;

            // There should always be at least one ChunkItem in a Chunk.
            while (current != null)
            {
                // Yield the current item in the list.
                yield return current.Value;

                // Copy the next item from the source sequence, 
                // if we are at the end of our local list.
                lock (m_Lock)
                {
                    if (current == tail)
                    {
                        CopyNextChunkElement();
                    }
                }

                // Move to the next ChunkItem in the list.
                current = current.Next;
            }
        }

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }
}

EDIT

I added another extension method (Interleave). This is used to interleave new lines where necessary. Output now fully matches your requirements.

How it works:

First we need keys. If there's a dash, use what's before the first dash, otherwise use string.Empty.

This gives us:

line           | key 
--------------------------
This is line 1 |
two            | Data Line   
two two        | Data Line   
Another line   |

Then when we ChunkBy we've got 3 groups, (line 1), (line 2,line 3) and (line 4). Each group also has a Key.

We can now use this info to reassemble the data in the required format.


One option to obtaining this result in LINQ is to use an aggregate expression. This is possible in this case since you can identify the next line of the output from the prior lines without knowledge of subsequent lines in the input. This sample outputs as a string, but can be adapted to return other data types if needed.

// let IEnumerable<string> lines be
// a list of lines in the input.
string s = lines
    .Aggregate(
    // seed accumulator with anonymous class 
    // including StringBuilder and the last
    // header group
    new {
        builder = new StringBuilder(),
        lastheader = new string[1] 
    },
    // accumulator function for each item
    (acc, line) =>
    {
        var parts = line.Split(new char[] {'-'}, 2);
        string headerpart = parts[0];
        string itempart = parts.Length == 1 ? null : line.Substring(line.IndexOf('-'));
        bool firstline = acc.builder.Length == 0;
        // Case 1: The line contains no hyphen
        if (parts.Length == 1)
        {
            if (!firstline) acc.builder.AppendLine();
            acc.builder.AppendLine(line);
            acc.lastheader[0] = null;
        }
        // Case 2: The line contains a hyphen, and
        // the header is the same as the header on the previous
        // line.  Only output the item.
        else if (acc.lastheader[0] == parts[0])
        {
             acc.builder.AppendLine(itempart);
        }
        // Case 3: The line contains a hyphen, and
        // the header is not the same as the header on the previous
        // line.  Output the header on one line, and then the item
        // on the subsequent line.
        else
        {
            acc.lastheader[0] = headerpart;
            if (!firstline) acc.builder.AppendLine();
            acc.builder.AppendLine(headerpart);
            acc.builder.AppendLine(itempart);
        }
        // Finally, return the mutated accumulator.
        return acc;
    },
    // When finished, convert the builder to a string
    // and return the complete string.
    acc => acc.builder.ToString());


Does it have to be LINQ? A simple iterative function could do the job. (Sorry the Ex. is in Java, I don't have VS in front of me)

    //im using string here, but easy enough to convert to file reader
    String[] l = str.split("\r\n");
    String curr = l[0]; //current line 
    for (int i = 1; i < l.length; i++) {
        //next line
        String peek = l[i];     

        String[] lcurr = curr.split("-");
        String[] lpeek = peek.split("-");
        int c = 0; //count of grouped lines
        if (lcurr.length == 2 && lpeek.length == 2 && lcurr[0].equals(lpeek[0])) {
            //print group header prepended with blank line
            System.out.println("\r\n" + lcurr[0]);
            //print first grouped item
            System.out.println("- " + lcurr[1]);
            c++;

            //advance position
            lcurr = lpeek;
            //as long as there is one ahead to peek
            if (i < l.length - 1) {
                lpeek = l[++i].split("-"); 
                //continue printing remaining groups
                while (i < l.length && lcurr.length == 2 && lpeek.length == 2 && lcurr[0].equals(lpeek[0])) {
                    System.out.println("- " + lcurr[1]);
                    lcurr = lpeek;
                    lpeek = l[++i].split("-");
                    c++;
                }   
            }
        }

        //print last grouped item
        if (c > 0) {
            System.out.println("- " + lcurr[1] + "\r\n");
        }
        else
            System.out.println(curr);

        curr = l[i];
    }
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号