I have a for loop with 24 iterations, one for each hour of the day, and a nested for loop that checks each 15-minute interval within that hour. An innermost loop then scans a List for items matching the hour and minute and aggregates the ones that meet my time requirement. The issue is that my List can contain up to 1 million records, which means I traverse 1 million records 24*4 times.
How can I optimize my code for faster performance in this case? I know this could probably be simplified with LINQ statements, but I'm not sure that would make it faster. Here's an example of what I'm doing.
List<SummaryData> Aggregates = new List<SummaryData>();
for (int startHour = 0; startHour < 24; startHour++)
{
    for (int startMin = 0; startMin < 60; startMin += 15)
    {
        int aggregateData = 0;
        int summaryId = 0;
        // My ItemList can have up to 1 million records.
        foreach (ListItem item in ItemList)
        {
            if ((item.time.Hour == startHour) && (item.time.Minute == startMin))
            {
                aggregateData += item.number;
                summaryId = item.id; // id of the last matching item
            }
        }
        SummaryData aggregate = new SummaryData { SummaryId = summaryId, TotalNumber = aggregateData };
        Aggregates.Add(aggregate);
    }
}

class SummaryData
{
    public int SummaryId { get; set; }
    public int TotalNumber { get; set; }
}
Instead of looking for each Hour and Minute in every single item, iterate over ItemList just once and act based on each item.time.Hour and item.time.Minute.
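For example, a minimal single-pass sketch along those lines, assuming the ListItem fields from the question and that only timestamps falling exactly on a quarter hour should count:

// One pass over the list: 96 buckets, one per 15-minute slot of the day.
int[] totals = new int[24 * 4];
foreach (ListItem item in ItemList)
{
    if (item.time.Minute % 15 == 0) // skip items not on a quarter-hour boundary
    {
        totals[item.time.Hour * 4 + item.time.Minute / 15] += item.number;
    }
}
// totals[19 * 4 + 1] now holds the sum for the 19:15 slot, and so on.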
Given your logic above, you should only have to iterate the list one time. You can nest your for loops within the foreach and likely achieve better performance. I would also use a Dictionary to hold your aggregate data, and base its key on the total minute (meaning hour * 60 + minute).
Dictionary<int, AggregateData> aggregate = new Dictionary<int, AggregateData>();
foreach (ListItem item in ItemList)
{
    int key = item.time.Hour * 60 + item.time.Minute;
    AggregateData data;
    if (!aggregate.TryGetValue(key, out data))
    {
        aggregate.Add(key, data = new AggregateData());
    }
    data.Number += item.number;
}
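AggregateData isn't defined in the snippet above; here's a minimal sketch of what it might look like, with the Number property name inferred from data.Number, plus a hypothetical way to read the totals back out in time order (requires using System.Linq):

class AggregateData
{
    // Running total for one minute-of-day bucket.
    public int Number { get; set; }
}

// Hypothetical read-out of the totals in time order:
foreach (var pair in aggregate.OrderBy(p => p.Key))
{
    int hour = pair.Key / 60;
    int minute = pair.Key % 60;
    Console.WriteLine("{0:00}:{1:00} -> {2}", hour, minute, pair.Value.Number);
}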
I'd be organizing the data roughly like this:
(see also: http://ideone.com/dyfoD)
using System;
using System.Linq;
using System.Collections.Generic;

public class P
{
    struct DataItem
    {
        public System.DateTime time;
        public int number;
    }

    public static void Main(string[] args)
    {
        var ItemList = new DataItem[] { };

        // Key = minute of day, rounded down to the start of its 15-minute slot.
        var groups = ItemList
            .GroupBy(item => item.time.Hour * 60 + (item.time.Minute / 15) * 15);

        var sums = groups
            .ToDictionary(g => g.Key, g => g.Sum(item => item.number));

        // Lookups now become trivially easy (keys are minutes of day):
        int slot1900 = sums[19 * 60 +  0]; // 19:00
        int slot1915 = sums[19 * 60 + 15]; // 19:15
        int slot1930 = sums[19 * 60 + 30]; // 19:30
    }
}
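One caveat: slots with no matching items won't be present in sums, so the indexer throws a KeyNotFoundException for them. A safer lookup, as a small sketch:

// Treat a missing slot as an empty one (hypothetical 19:45 lookup):
int total;
if (!sums.TryGetValue(19 * 60 + 45, out total))
{
    total = 0;
}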
What is the result of this algorithm? Apologies if I'm being daft for not getting it.
It seems to identify all items in ItemList whose minute value is evenly divisible by 15, add each one's number value into a running counter, and then add that running counter into the Aggregates list.
Because I'm not clear on the types of some of these objects, I'm a little fuzzy on what's actually happening here. You seem to aggregate once with "aggregateData += item.number" and then aggregate AGAIN with "Aggregates.Add(aggregate)". Are you sure you're not double-summing these things? I'm not even clear whether you're trying to sum the values of qualifying items or build a list of them.
That aside, it's definitely neither required nor optimal to go over the entire list of 1 million items 24*4 times, but I can't be sure what is correct without a clearer understanding of the goal.
As suggested in the other answers, the correct approach is likely to iterate over ItemList exactly once and operate on every single item, rather than iterating ~100 times and discarding each item in the list ~99 times (since you know it can only qualify for one of the ~100 iterations).
Your problem statement is a wee bit fuzzy. It looks like you want a summary, by item id, giving you the sum of all item numbers where the timestamp falls on an integral quarter-hour boundary.
The following should do the trick, I think.
- one pass through the list
- the data store is a SortedDictionary (a height-balanced binary tree), so lookup, insertion, and removal are O(log N).
Here's the code:
public class SummaryData
{
    public SummaryData(int id)
    {
        this.SummaryId = id;
        this.TotalNumber = 0;
    }
    public int SummaryId { get; set; }
    public int TotalNumber { get; set; }
}

public class ListItem
{
    public int Id;
    public int Number;
    public DateTime Time;
}

public IEnumerable<SummaryData> Summarize(IEnumerable<ListItem> ItemList)
{
    const long TICKS_PER_QUARTER_HOUR = TimeSpan.TicksPerMinute * 15;
    SortedDictionary<int, SummaryData> summary = new SortedDictionary<int, SummaryData>();

    foreach (ListItem item in ItemList)
    {
        long timeOfDayTicks = item.Time.TimeOfDay.Ticks;
        bool on15MinuteBoundary = (timeOfDayTicks % TICKS_PER_QUARTER_HOUR) == 0;

        if (on15MinuteBoundary)
        {
            // Key is the quarter-hour slot number (0..95) within the day.
            int key = (int)(timeOfDayTicks / TICKS_PER_QUARTER_HOUR);

            SummaryData value;
            if (!summary.TryGetValue(key, out value))
            {
                value = new SummaryData(item.Id);
                summary.Add(key, value); // key on the slot, not the item id
            }
            value.TotalNumber += item.Number;
        }
    }
    return summary.Values;
}
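A quick usage sketch with hypothetical sample data, to show what falls in and out of the summary:

// Hypothetical usage of Summarize:
var items = new List<ListItem>
{
    new ListItem { Id = 1, Number = 10, Time = new DateTime(2011, 1, 1, 19, 0, 0) }, // 19:00 slot
    new ListItem { Id = 2, Number =  5, Time = new DateTime(2011, 1, 1, 19, 0, 0) }, // same slot
    new ListItem { Id = 3, Number =  7, Time = new DateTime(2011, 1, 1, 19, 7, 0) }, // skipped: not on a boundary
};

foreach (SummaryData s in Summarize(items))
{
    Console.WriteLine("{0}: {1}", s.SummaryId, s.TotalNumber); // prints "1: 15"
}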