
Is this the best way to create a frequency table using LINQ?

Developer https://www.devze.com 2023-01-06 22:04 Source: web

I want to write a function that reads a file and counts the number of times each word occurs. Assuming the file-reading is handled and produces a list of strings representing each line in the file, I need a function to count the occurrence of each word. Firstly, is using a Dictionary<string,int> the best approach? The key is the word, and the value is the number of occurrences of that word.

I wrote this function which iterates through each line and each word in a line and builds up a dictionary:

static IDictionary<string, int> CountWords(IEnumerable<string> lines)
{
    var dict = new Dictionary<string, int>();
    foreach (string line in lines)
    {
        string[] words = line.Split(' ');
        foreach (string word in words)
        {
            if (dict.ContainsKey(word))
                dict[word]++;
            else
                dict.Add(word, 1);
        }
    }
    return dict;
}

However, I would like to write this function functionally, using LINQ (because LINQ is fun and I'm trying to improve my functional programming skills :D). I managed to come up with this expression, but I'm not sure whether it's the best way to do it functionally:

static IDictionary<string, int> CountWords2(IEnumerable<string> lines)
{
    return lines
        .SelectMany(line => line.Split(' '))
        .Aggregate(new Dictionary<string, int>(),
            (dict, word) =>
            {
                if (dict.ContainsKey(word))
                    dict[word]++;
                else
                    dict.Add(word, 1);
                return dict;
            });
}

So while I have two working solutions, I am also interested in learning what the best approach is to this problem. Anyone with insight on LINQ and FP?


As Tim Robinson wrote, you could use GroupBy with ToDictionary like this:

    public static Dictionary<string, int> CountWords3(IEnumerable<string> strings)
    {
        return strings.SelectMany(s => s.Split(' ')).GroupBy(w=>w).ToDictionary(g => g.Key, g => g.Count());
    }


Take a look at GroupBy instead of Aggregate -- it will give you a set of IGrouping<string, string> objects. You'll be able to retrieve the count of each word by calling .Count() on each grouping.
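
To make the groupings concrete, here is a minimal sketch of what GroupBy hands back before anything is turned into a dictionary. The class name GroupingDemo and the helper DescribeGroups are made up for illustration; only SelectMany, GroupBy, Key and Count() are actual LINQ members.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupingDemo
{
    // Hypothetical helper: returns one "word=count" string per grouping.
    public static List<string> DescribeGroups(IEnumerable<string> lines)
    {
        // Each grouping has a Key (the word) and enumerates its occurrences.
        IEnumerable<IGrouping<string, string>> groups = lines
            .SelectMany(line => line.Split(' '))
            .GroupBy(word => word);

        // Count() on a grouping gives the number of times that word occurred.
        return groups
            .Select(g => g.Key + "=" + g.Count())
            .ToList();
    }
}
```

For the single input line "a b a" this yields "a=2" followed by "b=1", since LINQ to Objects returns groups in the order their keys first appear.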


The following should do the job.

static IDictionary<String, Int32> CountWords(IEnumerable<String> lines)
{
    return lines
        .SelectMany(line => line.Split(' '))
        .GroupBy(word => word)
        .ToDictionary(group => group.Key, group => group.Count());
}


If you want to use LINQ query syntax (rather than calling the LINQ extension methods directly), you can write:

var groups = from line in lines
             from s in line.Split(new []{"\t", " "},StringSplitOptions.RemoveEmptyEntries) 
             group s by s into g
             select g;
var dic = groups.ToDictionary(g => g.Key,g=>g.Count());

Your current implementation won't split on tabs and might include the "word" string.Empty, so I've changed the split to match what I think your intentions are.
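
The empty-entry problem is easy to reproduce. The sketch below is only an illustration: the class WordCountDemo and the helpers NaiveSplit and CountWordsTrimmed are hypothetical names, and it splits on the characters ' ' and '\t' rather than on strings, which should behave the same for this input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCountDemo
{
    // Plain Split(' '): two consecutive spaces produce an empty string entry.
    public static string[] NaiveSplit(string line)
    {
        return line.Split(' ');
    }

    // Hypothetical variant: splits on spaces and tabs and drops empty entries
    // before grouping, so "" is never counted as a word.
    public static Dictionary<string, int> CountWordsTrimmed(IEnumerable<string> lines)
    {
        return lines
            .SelectMany(line => line.Split(new[] { ' ', '\t' },
                                           StringSplitOptions.RemoveEmptyEntries))
            .GroupBy(word => word)
            .ToDictionary(group => group.Key, group => group.Count());
    }
}
```

Calling NaiveSplit("a  b") (two spaces) returns three entries, the middle one empty, whereas CountWordsTrimmed on the lines "a  b\ta" and "b a" counts "a" three times and "b" twice, with no empty key.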
