I'm calling a worker method that calls to the database that then iterates and yield returns values for parallel processing. To prevent it from hammering the database, I have a Thread.Sleep in there to pause the execution to the DB. However, this appears to be blocking executions that are still开发者_如何学Python occurring in the Parallel.ForEach. What is the best way to achieve this to prevent blocking?
private void ProcessWorkItems()
{
_cancellation = new CancellationTokenSource();
_cancellation.Token.Register(() => WorkItemRepository.ResetAbandonedWorkItems());
Task.Factory.StartNew(() =>
Parallel.ForEach(GetWorkItems().AsParallel().WithDegreeOfParallelism(10), workItem =>
{
var x = ItemFactory(workItem);
x.doWork();
}), _cancellation.Token);
}
private IEnumerable<IAnalysisServiceWorkItem> GetWorkItems()
{
while (!_cancellation.IsCancellationRequested)
{
var workItems = WorkItemRepository.GetItemList(); //database call
workItems.ForEach(item =>
{
item.QueueWorkItem(WorkItemRepository);
});
foreach (var item in workItems)
{
yield return item;
}
if (workItems.Count == 0)
{
Thread.Sleep(30000); //sleep this thread for 30 seconds if no work items.
}
}
yield break;
}
Edit: I changed it to include the answer and it's still not working as I'm expecting. I added the .AsParallel().WithDegreeOfParallelism(10) to the GetWorkItems() call. Are my expectations incorrect when I think that Parallel should continue to execute even though the base thread is sleeping?
Example: I have 15 items, it iterates and grabs 10 items and starts them. As each one finishes, it asks for another one from GetWorkItems until it tries to ask for a 16th item. At that point it should stop trying to grab more items but should continue processing items 11-15 until those are complete. Is that how parallel should be working? Because it's not currently doing that. What it's currently doing is when it completes 6, it locks the subsequent 10 still being run in the Parallel.ForEach.
I would suggest that you create a BlockingCollection (a queue) of work items, and a timer that calls the database every 30 seconds to populate it. Something like:
BlockingCollection<WorkItem> WorkItems = new BlockingCollection<WorkItem>();
And on initialization:
System.Threading.Timer WorkItemTimer = new Timer((s) =>
{
var items = WorkItemRepository.GetItemList(); //database call
foreach (var item in items)
{
WorkItems.Add(item);
}
}, null, 30000, 30000);
That will query the database for items every 30 seconds.
For scheduling the work items to be processed, you have a number of different solutions. The closest to what you have would be this:
WorkItem item;
while (WorkItems.TryTake(out item, Timeout.Infinite, _cancellation))
{
Task.Factory.StartNew((s) =>
{
var myItem = (WorkItem)s;
// process here
}, item);
}
That eliminates blocking in any of the threads, and lets the TPL decide how best to allocate the parallel tasks.
EDIT:
Actually, closer to what you have is:
foreach (var item in WorkItems.GetConsumingEnumerable(_cancellation))
{
// start task to process item
}
You might be able to use:
Parallel.Foreach(WorkItems.GetConsumingEnumerable(_cancellation).AsParallel ...
I don't know if that will work or how well. Might be worth a try . . .
END OF EDIT
In general, what I'm suggesting is that you treat this as a producer/consumer application, with the producer being the thread that queries the database periodically for new items. My example queries the database once every N (30 in this case) seconds, which will work well if, on average, you can empty your work queue every 30 seconds. That will give an average latency of less than a minute from the time an item is posted to the database until you have the results.
You can reduce the polling frequency (and thus the latency), but that will cause more database traffic.
You can get fancier with it, too. For example, if you poll the database after 30 seconds and you get a huge number of items, then it's likely that you'll be getting more soon, and you'll want to poll again in 15 seconds (or less). Conversely, if you poll the database after 30 seconds and get nothing, then you can probably wait longer before you poll again.
You can set up that kind of adaptive polling using a one-shot timer. That is, you specify -1 for the last parameter when you create the timer, which causes it to fire only once. Your timer callback figures out how long to wait before the next poll and calls Timer.Change
to initialize the timer with the new value.
You can use the .WithDegreeOfParallelism() extension method to force PLinq to run the tasks simultaneously. There's a good example in the Call Blocking or I/O Intensive section in th C# Threading Handbook
You may be falling foul of the Partitioner.
Because you are passing an IEnumerable, Parallel.ForEach will use a Chunk Partitioner which can try to grab a few elements at a time from the enumeration in a chunk. But your IEnumerable.MoveNext can sleep, which will upset things.
You could write your own Partitioner that returns one element at a time, but in any case, I think a producer/consumer approach such as Jim Mischel's suggestion will work better.
What are you trying to accomplish with the sleeping? From what I can tell, you're trying to avoid pounding database calls. I don't know of a better way to do that, but it seems like ideally your GetItemList
call would be blocking until data was available to process.
精彩评论