I have several Lists that I need to iterate through in order to perform a calculation. In summary, List1 is List of roadway start and endpoints (ids) and List2 is a List of individual speed samples for those endpoints (there are multiple speed samples for each set of endpoints). List1 is defined like this:
class RoadwaySegment
{
public int StartId {get; set;}
public int EndId {get; set;}
}
List2 is defined like this:
class IndividualSpeeds
{
public int StartHour {get; set;}
public int StartMin {get; set;} //either 0,15,30,or 45
public int Speed {get; set;}
public int StartId {get; set;}
public int EndId {get; set;}
}
List3 is the result of my calculation and will contain the average speeds for the roadway segments in List1 for each 15 minute period of the day. List3 looks like this:
class SummaryData
{
public string SummaryHour {get; set;}
public string SummaryMin {get; set;}
public int StartId {get; set;}
public int EndId {get; set;}
public int AvgSpeed {get; set;}
}
Currently, to generate List3, I iterate over List1, then over each 24 hour period of the day, then over each 15 minute interval of an hour. For each of these iterations, I check to see if the individual speed sample in List2 should be included in the average speed calculation for my roadway segment. So, it looks someth开发者_开发知识库ing like this:
var summaryList = new List<SummaryData>();
foreach (var segment in RoadwaySegments)
{
for(int startHour = 0; startHour < 24; startHour++)
{
for(int startMin = 0; startMin < 60; startMin+= 15)
{
int totalSpeeds = 0;
int numSamples = 0;
int avgSpeed = 0;
foreach(var speedSample in IndividualSpeeds)
{
if((segment.StartId == speedSample.StartId)&&(segment.EndId == speedSample.EndId)&&(speedSample.StartHour == startHour)&&(speedSample.StartMin == startMin))
{
if(speedSample.Speed > 0)
{
totalSpeeds += speedSample.Speed;
numSamples += 1;
}
}
}
SummaryData summaryItem = new SummaryData {SummaryHour = startHour, SummaryMin = startMin, StartId = segment.StartId, EndId = segment.EndId, AvgSpeed = totalSpeeds/numSamples;
summaryList.Add(summaryItem);
}
}
}
The issue with this code is that List1 might have a hundred roadway segments but List2 can contain a million or more speed sample records so sub-iterations of the list are very time consuming. Is there a way to use GroupBy/LINQ to improve the performance and readability of this code? Note the condition for including a speed in the average--it has to be greater than 0.
This is untested, but I think it will work:
from segment in RoadwaySegments
join sample in IndividualSpeeds on new { segment.StartId, segment.EndId } equals new { sample.StartId, sample.EndId } into segmentSamples
from startHour in Enumerable.Range(0, 24)
from startMin in new[] { 0, 15, 30, 45 }
let startMinSamples = segmentSamples
.Where(sample => sample.Speed > 0)
.Where(sample => sample.StartHour == startHour)
.Where(sample => sample.StartMin == startMin)
.Select(sample => sample.Speed)
.ToList()
select new SummaryItem
{
StartId = segment.StartId,
EndId = segment.EndId,
SummaryHour = startHour,
SummaryMin = minute,
AvgSpeed = startMinSamples.Count <= 2 ? 0 : startMinSamples.Average()
};
The main idea is to iterate the segment and sample lists once, producing a group of samples for each segment. Then, for each of those groups, you generate the hours and minutes and a summary item for each combination. Finally, you calculate the average speed of all non-zero samples in the hour/minute combination.
This isn't quite ideal because you still iterate the segment's samples 24 * 4 times, but it is much better than iterating the entire sample list. This should get you on the right path, and hopefully you can optimize that last bit further.
If you are using .net 4, I would suggest parallelizing this with Parallel Linq. This is embarrassingly parallel over RoadwaySegments.
Secondly, instead of your nested iteration over the child lists, I would recommend iterating that list once and creating a Dictionary of List<IndividualSpeeds>
with a Composite Key of StartId, EndId, StartHour, and EndHour. Doing a lookup in this Dictionary will be much faster vs re-iterating over that for each RoadwaySegment.
The following is supplemantery to Chris and Bryan answers:
Since you're saying there could be millions of speed sample records, you just don't want to iterate them more than the minimum possible - i.e. you should group (using GroupBy
or Join
operators) them, according to their start hour and minute. Than you could just iterate each group, and add each sample record into some dictionary of RoadwaySegments (something like Dictionary<RoadwaySegment, IEnumerable<IndividalSpeeds>>
) and create your Summary items from this dictionary.
精彩评论