开发者

Tricky string transformation (hopefully) in LINQ

开发者 https://www.devze.com 2022-12-24 16:15 出处:网络
I\'m hoping for a concise way to perform the following transformation.I want to transform song lyrics.The input will look something like this:

I'm hoping for a concise way to perform the following transformation. I want to transform song lyrics. The input will look something like this:

Verse 1 lyrics line 1
Verse 1 lyrics line 2
Verse 1 lyrics line 3
Verse 1 lyrics line 4

Verse 2 lyrics line 1
Verse 2 lyrics line 2
Verse 2 lyrics line 3
Verse 2 lyrics line 4

And I want to transform them so the first line of each verse is grouped together as in:

Verse 1 lyrics line 1
Verse 2 lyrics line 1

Verse 1 开发者_C百科lyrics line 2
Verse 2 lyrics line 2

Verse 1 lyrics line 3
Verse 2 lyrics line 3

Verse 1 lyrics line 4
Verse 2 lyrics line 4

Lyrics will obviously be unknown, but the blank line marks a division between verses in the input.


I have a few extension methods I always keep around that make this type of processing very simple. The solution in its entirety is going to be longer than others, but these are useful methods to have around, and once you have the extension methods in place then the answer is very short and easy-to-read.

First, there's a Zip method that takes an arbitrary number of sequences:

public static class EnumerableExtensions
{
    public static IEnumerable<T> Zip<T>(
        this IEnumerable<IEnumerable<T>> sequences,
        Func<IEnumerable<T>, T> aggregate)
    {
        var enumerators = sequences.Select(s => s.GetEnumerator()).ToArray();
        try
        {
            while (enumerators.All(e => e.MoveNext()))
            {

                var items = enumerators.Select(e => e.Current);
                yield return aggregate(items);
            }
        }
        finally
        {
            foreach (var enumerator in enumerators)
            {
                enumerator.Dispose();
            }
        }
    }
}

Then there's a Split method which does roughly the same thing to an IEnumerable<T> that string.Split does to a string:

public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> items,
    Predicate<T> splitCondition)
{
    using (IEnumerator<T> enumerator = items.GetEnumerator())
    {
        while (enumerator.MoveNext())
        {
            yield return GetNextItems(enumerator, splitCondition).ToArray();
        }
    }
}

private static IEnumerable<T> GetNextItems<T>(IEnumerator<T> enumerator,
    Predicate<T> stopCondition)
{
    do
    {
        T item = enumerator.Current;
        if (stopCondition(item))
        {
            yield break;
        }
        yield return item;
    } while (enumerator.MoveNext());
}

Once you have these extensions in place, solving the song-lyric problem is a piece of cake:

string lyrics = ...
var verseGroups = lyrics
    .Split(new[] { Environment.NewLine }, StringSplitOptions.None)
    .Select(s => s.Trim())  // Optional, if there might be whitespace
    .Split(s => string.IsNullOrEmpty(s))
    .Zip(seq => string.Join(Environment.NewLine, seq.ToArray()))
    .Select(s => s + Environment.NewLine);  // Optional, add space between groups


LINQ is so sweet... I just love it.

static void Main(string[] args)
{
    var lyrics = @"Verse 1 lyrics line 1 
                   Verse 1 lyrics line 2 
                   Verse 1 lyrics line 3 
                   Verse 1 lyrics line 4 

                   Verse 2 lyrics line 1 
                   Verse 2 lyrics line 2 
                   Verse 2 lyrics line 3 
                   Verse 2 lyrics line 4";
    var x = 0;
    var indexed = from lyric in lyrics.Split(new[] { Environment.NewLine },
                                             StringSplitOptions.None)
                  let line = lyric.Trim()
                  let indx = line == string.Empty ? x = 0: ++x
                  where line != string.Empty
                  group line by indx;

    foreach (var trans in indexed)
    {
        foreach (var item in trans)
            Console.WriteLine(item);
        Console.WriteLine();
    }
    /*
        Verse 1 lyrics line 1
        Verse 2 lyrics line 1

        Verse 1 lyrics line 2
        Verse 2 lyrics line 2

        Verse 1 lyrics line 3
        Verse 2 lyrics line 3

        Verse 1 lyrics line 4
        Verse 2 lyrics line 4
     */
}


There is probably a more concise way to do this, but here's one solution that works given valid input:

        var output = String.Join("\r\n\r\n", // join it all in the end
        Regex.Split(input, "\r\n\r\n") // split on blank lines
            .Select(v => Regex.Split(v, "\r\n")) // now split lines in each verse
            .SelectMany(vl => vl.Select((lyrics, i) => new { Line = i, Lyrics = lyrics })) // flatten things out, but attach line number
            .GroupBy(b => b.Line).Select(c => new { Key = c.Key, Value = c }) // group by line number
            .Select(e => String.Join("\r\n", e.Value.Select(f => f.Lyrics).ToArray())).ToArray());

Obviously this is pretty ugly. Not at all a suggestion for production code.


Take your input as one large string. Then determine the number of lines in a verse.

Use .Split to get an array of strings, each item is now a line. Then loop through the number of lines you have and use stringbuilder to append SplitStrArray(i) and SplitStrArray(i+lines in a verse).

I think that will be the best approach. I'm not saying LINQ isn't awesome, but it seems silly to say, 'I have a problem and I want to use this tool to solve it'.

"I have to get a screw into the wall - but I want to use a hammer". If you are determined, you'll probably find a way to use the hammer; but IMHO, that's not the best course of action. Maybe someone else will have a really awesome LINQ example that makes it super easy and I'll feel silly for posting this....


Give this a try. Regex.Split is used to prevent the extra blank entries String.Split can be used to determine where the first blank line occurs with the help of the Array.FindIndex method. This indicates the number of verses available between each blank line (given the format is consistent of course). Next, we filter the blank lines out and determine each line's index and group them by the modulus of the aforementioned index.

string input = @"Verse 1 lyrics line 1
Verse 1 lyrics line 2
Verse 1 lyrics line 3
Verse 1 lyrics line 4
Verse 1 lyrics line 5

Verse 2 lyrics line 1
Verse 2 lyrics line 2
Verse 2 lyrics line 3
Verse 2 lyrics line 4
Verse 2 lyrics line 5

Verse 3 lyrics line 1
Verse 3 lyrics line 2
Verse 3 lyrics line 3
Verse 3 lyrics line 4
Verse 3 lyrics line 5
";

// commented original Regex.Split approach
//var split = Regex.Split(input, Environment.NewLine);
var split = input.Split(new[] { Environment.NewLine }, StringSplitOptions.None);
// find first blank line to determine # of verses
int index = Array.FindIndex(split, s => s == "");
var result = split.Where(s => s != "")
                  .Select((s, i) => new { Value = s, Index = i })
                  .GroupBy(item => item.Index % index);

foreach (var group in result)
{
    foreach (var item in group)
    {
        Console.WriteLine(item.Value);
    }        
    Console.WriteLine();
}
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号