
Parallel Foreach Memory Issue

https://www.devze.com 2023-03-05 05:33 Source: web
I have a file collection (3000 files) in a FileInfoCollection. I want to process all the files by applying some logic which is independent (can be executed in parallel).

 FileInfo[] fileInfoCollection = directory.GetFiles();
 Parallel.ForEach(fileInfoCollection, ProcessWorkerItem);

But after processing about 700 files I am getting an out-of-memory error. I used a thread pool before, but it gave the same error. If I execute without threading (parallel processing) it works fine.

In "ProcessWorkerItem" I am running an algorithm based on the string data of the file. Additionally I use log4net for logging, and there is a lot of communication with the SQL server in this method.

Here is some more info. File size: 1-2 KB XML files. I read those files, and the processing depends on the content of the file. It identifies some keywords in the string and generates another XML format. The keywords are in the SQL Server database (nearly 2000 words).


Well, what does ProcessWorkerItem do? You may be able to change it to use less memory (e.g. stream the data instead of loading it all in at once), or you may want to explicitly limit the degree of parallelism using the Parallel.ForEach overload that takes a ParallelOptions, setting ParallelOptions.MaxDegreeOfParallelism. Basically you want to avoid trying to process all 3000 files at once :) IIRC, Parallel Extensions will "notice" if your tasks appear to be IO-bound, and allow more than the normal number to execute at once - which isn't really what you want here, as you're memory-bound as well.
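A minimal sketch of capping the parallelism, assuming a ProcessWorkerItem(FileInfo) method as in the question (the cap of 4 is an arbitrary example; tune it for your machine):

```csharp
using System.IO;
using System.Threading.Tasks;

FileInfo[] fileInfoCollection = directory.GetFiles();

var options = new ParallelOptions
{
    // Cap how many files are processed concurrently so memory
    // use stays bounded instead of scaling with the queue.
    MaxDegreeOfParallelism = 4
};

Parallel.ForEach(fileInfoCollection, options, ProcessWorkerItem);
```

With the cap in place, at most four files' worth of data (plus their EF contexts and SQL connections) are live at any moment, which directly addresses the memory-bound behaviour described above.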


If you're attempting operations on large files in parallel then it's feasible that you would run out of available memory.

Maybe consider trying out the Rx extensions and using its Throttle method to control/compose your processing?
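A hedged Rx.NET sketch: note that Throttle is time-based debouncing and would drop intermediate files, so for bounding concurrency the Merge overload that takes a maxConcurrent argument is closer to what's needed here. ProcessWorkerItem is assumed from the question:

```csharp
using System;
using System.IO;
using System.Reactive.Linq;

// Turn each file into a deferred unit of work, then merge
// with at most 4 subscriptions (files) in flight at once.
var processed = fileInfoCollection
    .ToObservable()
    .Select(file => Observable.Start(() => ProcessWorkerItem(file)))
    .Merge(4);

processed.Wait(); // block until the whole stream completes
```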


I found the bug which caused the memory leak. I was using the Unit of Work pattern with Entity Framework. In the unit of work I keep the context in a hash table with the thread name as the hash key. When I use threading, the hash table keeps growing, and that caused the memory leak. So I added an additional method to the unit of work to remove the element from the hash table after completing a thread's task.

public static void DisposeUnitOfWork()
{
    IUnitOfWork unitOfWork = GetUnitOfWork();

    if (unitOfWork != null)
    {
        unitOfWork.Dispose();
        hashTable.Remove(Thread.CurrentThread.Name);
    }
}
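With this in place, each worker disposes its context when it finishes a file. A sketch of where the call would sit (the body of ProcessWorkerItem is assumed, not from the original):

```csharp
private static void ProcessWorkerItem(FileInfo file)
{
    try
    {
        // ... read the XML, match keywords against the SQL Server
        // list, and generate the output XML ...
    }
    finally
    {
        // Release this thread's EF context so the hash table
        // does not grow without bound across 3000 files.
        DisposeUnitOfWork();
    }
}
```

Note that this relies on Thread.CurrentThread.Name being set and unique per worker; with pool threads it can be null, so a keying scheme based on ManagedThreadId (or ThreadLocal&lt;T&gt;) would be more robust.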
