I want to write a function that reads a file and counts the number of times each word occurs. Assuming the file-reading is handled and produces a list of strings representing each line in the file, I need a function to count the occurrence of each word. Firstly, is using a Dictionary<string,int> the best approach? The key is the word, and the value is the number of occurrences of that word.
I wrote this function which iterates through each line and each word in a line and builds up a dictionary:
static IDictionary<string, int> CountWords(IEnumerable<string> lines)
{
    var dict = new Dictionary<string, int>();
    foreach (string line in lines)
    {
        string[] words = line.Split(' ');
        foreach (string word in words)
        {
            // Increment the count for a known word, otherwise start it at 1.
            if (dict.ContainsKey(word))
                dict[word]++;
            else
                dict.Add(word, 1);
        }
    }
    return dict;
}
However, I would like to write this function functionally, using LINQ (because LINQ is fun and I'm trying to improve my functional programming skills :D). I managed to come up with this expression, but I'm not sure whether it's the best way to do it functionally:
static IDictionary<string, int> CountWords2(IEnumerable<string> lines)
{
    return lines
        .SelectMany(line => line.Split(' '))
        .Aggregate(new Dictionary<string, int>(),
            (dict, word) =>
            {
                if (dict.ContainsKey(word))
                    dict[word]++;
                else
                    dict.Add(word, 1);
                return dict;
            });
}
So while I have two working solutions, I am also interested in learning what the best approach is to this problem. Anyone with insight on LINQ and FP?
As Tim Robinson wrote, you could use GroupBy with ToDictionary like this:
public static Dictionary<string, int> CountWords3(IEnumerable<string> strings)
{
    return strings
        .SelectMany(s => s.Split(' '))
        .GroupBy(w => w)
        .ToDictionary(g => g.Key, g => g.Count());
}
Take a look at GroupBy instead of Aggregate; it will give you a set of IGrouping<string, string> objects. You'll be able to retrieve the count of each word by calling .Count() on each grouping.
The following should do the job.
static IDictionary<String, Int32> CountWords(IEnumerable<String> lines)
{
    return lines
        .SelectMany(line => line.Split(' '))
        .GroupBy(word => word)
        .ToDictionary(group => group.Key, group => group.Count());
}
If you want to use LINQ query syntax (rather than calling the extension methods used by LINQ directly), you can write:
var groups = from line in lines
             from s in line.Split(new[] { "\t", " " }, StringSplitOptions.RemoveEmptyEntries)
             group s by s into g
             select g;
var dic = groups.ToDictionary(g => g.Key, g => g.Count());
Your current implementation won't split on tabs and might include string.Empty as a "word", so I've changed the split to match what I think your intentions are.
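To illustrate the difference between the two splits, here is a minimal sketch. The class name SplitDemo, the helper names NaiveSplit and CleanSplit, and the sample line are all made up for this example and are not part of the question's code; it only assumes the standard String.Split overloads.

```csharp
using System;
using System.Linq;

static class SplitDemo
{
    // Hypothetical helper: splits on a single space only, as in the question.
    public static string[] NaiveSplit(string line)
    {
        return line.Split(' ');
    }

    // Hypothetical helper: splits on spaces and tabs and drops empty entries.
    public static string[] CleanSplit(string line)
    {
        return line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        // Two spaces between "a" and "b", and a tab between "b" and "c".
        string line = "a  b\tc";

        string[] naive = NaiveSplit(line);
        Console.WriteLine(naive.Length);          // 3: "a", "", "b\tc"
        Console.WriteLine(naive.Contains(""));    // True

        string[] cleaned = CleanSplit(line);
        Console.WriteLine(string.Join(",", cleaned)); // a,b,c
    }
}
```

With the single-space split, the empty string would end up as a key in the resulting dictionary, and "b\tc" would be counted as one word.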