Tips For Dealing With Huge RAM Working Sets

I am working on a .Net 3.5 application designed specifically for a high-powered PC that does lots of data manipulation and computation. I recently encountered a need for a 4000 x 5000 two-dimensional array of objects, which is very large for a 32-bit PC and will give me an OutOfMemoryException. The only way to avoid using an array like this is by going down a very complex, time-consuming road filled with pain and suffering.

Are there any tips or tricks that the pros use to deal with large working sets of RAM? Do you know of any libraries that would be helpful (specifically for .Net)? Is there a way to force Windows to allocate more RAM for my process?

EDIT: The array I am using will contain mostly null references, and I am using the array to keep track of adjacent objects. Seeing how most of them are null references, I would also assume there is a more efficient approach to keeping track of adjacent objects, finding a neighbor for any given object, etc.


Judging by your comments, I think I can now answer your question. If most of the references are null, then you can hash the keys into a table that in turn points to your elements. Lookup in a hash map is constant time, O(1), and you won't have to worry about key collisions because each [x,y] pair is unique. You also won't waste memory on the empty cells, since most of the references are null and only occupied cells need an entry.
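A minimal sketch of that idea, assuming the grid stays at most 4000 x 5000 so each [x,y] pair folds into a single unique int key (the SparseGrid name and its members are illustrative, not from the original answer):

using System.Collections.Generic;

public class SparseGrid<T> where T : class
{
   private const int Columns = 5000;
   private readonly Dictionary<int, T> cells = new Dictionary<int, T>();

   // Fold (x, y) into a unique int key; unique because y < Columns.
   private static int Key(int x, int y)
   {
      return x * Columns + y;
   }

   public T Get(int x, int y)
   {
      T value;
      return cells.TryGetValue(Key(x, y), out value) ? value : null;
   }

   public void Set(int x, int y, T value)
   {
      cells[Key(x, y)] = value;
   }
}

Only occupied cells ever get an entry, so memory use scales with the number of non-null elements rather than with 4000 x 5000.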


If the majority of your elements are null, then perhaps you don't really need to create an array at all.

Jon suggests one approach that'll work - implementing a sparse array using linked lists. Here's another:

public struct CellLocation
{
   public int Row;
   public int Column;
}

public class Element
{
   public Element(int row, int column)
   {
      Location = new CellLocation { Row = row, Column = column };
   }

   public CellLocation Location { get; private set; }

   // your class's other properties and methods go here
}

Now you can store Element objects in a Dictionary<CellLocation, Element>. In fact, I'd put that dictionary into a class of its own, so that it can implement methods like:

// Cells is the class's Dictionary<CellLocation, Element> backing store
public IEnumerable<Element> AdjacentElements(Element elm)
{
   for (int row = -1; row <= 1; row++)
   {
      for (int column = -1; column <= 1; column++)
      {
         // elm isn't adjacent to itself
         if (row == 0 && column == 0)
         {
            continue;
         }
         CellLocation key = new CellLocation {
            Row = elm.Location.Row + row,
            Column = elm.Location.Column + column
         };
         Element neighbor;
         if (Cells.TryGetValue(key, out neighbor))
         {
            yield return neighbor;
         }
      }
   }
}

There are operations for which this can be faster than a sparse array. To find the element at a single row and column, a sparse array still has to do a linear search to find the row, and then another linear search to find the column in that row, whereas this method can find an element with one lookup into a hash table.

There are also circumstances in which it will be substantially slower. To find all the elements in a row requires as many hash-table lookups as there are cells in the row, while doing it with a sparse array just entails traversing a linked list.


Well, one thought is to scrap the two-dimensional array for a database instead. Something like SQLite has a small footprint and can be easily deployed with the application. There's even a C# wrapper for it.

SQLite reads this data from a single file, so reads and writes go through the disk and could take a performance hit. How much of a hit depends on the nature of the application: lookups via an index should be fast, for example, but massive calculations across the entire database would definitely be slower. So... I don't know, but maybe it's worth considering.
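A minimal sketch of that approach using the System.Data.SQLite wrapper mentioned above (the file, table, and column names are illustrative):

using System.Data.SQLite;

using (var conn = new SQLiteConnection("Data Source=grid.db"))
{
   conn.Open();

   // One row per non-null cell; the composite primary key doubles as the lookup index.
   using (var cmd = conn.CreateCommand())
   {
      cmd.CommandText =
         "CREATE TABLE IF NOT EXISTS cells (row INTEGER, col INTEGER, data BLOB, " +
         "PRIMARY KEY (row, col))";
      cmd.ExecuteNonQuery();
   }

   // Look up a single cell by its coordinates.
   using (var cmd = conn.CreateCommand())
   {
      cmd.CommandText = "SELECT data FROM cells WHERE row = @r AND col = @c";
      cmd.Parameters.AddWithValue("@r", 42);
      cmd.Parameters.AddWithValue("@c", 17);
      object data = cmd.ExecuteScalar();
   }
}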


You can efficiently store a grid-like structure where the majority of elements are null in a sparse array. They can be implemented in different ways, but commonly use modified linked lists for the rows and columns. There is a good introduction to the subject here.
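As a concrete illustration, here is a minimal sketch of one simplified variant, where each occupied row keeps a linked list of its occupied columns (all names are illustrative; a full implementation would also link the columns, as that introduction describes):

using System.Collections.Generic;

public class SparseArray<T> where T : class
{
   private class Node
   {
      public int Column;
      public T Value;
      public Node Next;
   }

   // One linked list of occupied columns per occupied row.
   private readonly Dictionary<int, Node> rows = new Dictionary<int, Node>();

   public T Get(int row, int column)
   {
      Node node;
      if (!rows.TryGetValue(row, out node))
         return null;
      // Linear scan along the row's list, as described above.
      for (; node != null; node = node.Next)
         if (node.Column == column)
            return node.Value;
      return null;
   }

   public void Set(int row, int column, T value)
   {
      Node head;
      rows.TryGetValue(row, out head);
      // Update in place if the cell is already occupied.
      for (Node n = head; n != null; n = n.Next)
      {
         if (n.Column == column)
         {
            n.Value = value;
            return;
         }
      }
      // Otherwise prepend a new node to the row's list.
      rows[row] = new Node { Column = column, Value = value, Next = head };
   }
}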


Is the array fixed? That is, if the values in the array do not change, it might be worth dumping the array contents to disk and using a memory-mapping technique instead; you can then load just a portion of the dumped array into memory for reading. It won't do if the data and elements in the array change, though.
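Since .NET 3.5 has no built-in memory-mapped file API (System.IO.MemoryMappedFiles only arrived in .NET 4), here is a minimal hand-rolled sketch of the same idea using a plain FileStream, assuming fixed-size serialized records and a read-only dump (all names and sizes are illustrative):

using System.IO;

public class PagedGrid
{
   private const int RecordSize = 64;   // assumed fixed size of one serialized element
   private const int Columns = 5000;
   private readonly FileStream file;

   public PagedGrid(string path)
   {
      file = new FileStream(path, FileMode.Open, FileAccess.Read);
   }

   // Seek to one cell's record and read only those bytes into memory.
   public byte[] ReadCell(int row, int column)
   {
      long offset = ((long)row * Columns + column) * RecordSize;
      file.Seek(offset, SeekOrigin.Begin);
      byte[] buffer = new byte[RecordSize];
      file.Read(buffer, 0, RecordSize);
      return buffer;
   }
}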

just my 2cents...

Hope this helps, Best regards, Tom.


There are two 'easy' directions, at the OS or process level.

  1. Add the /3GB switch to your boot.ini, and mark your executable as large-address-aware (see the example after this list). You immediately gain an additional 1 GB of virtual address space, but not without a tradeoff. There's a good chance it's the right choice for you.
  2. Often the problem is not lack of memory, but rather its fragmentation - that seems relevant to your context too (huge contiguous arrays). A while back I put online some techniques that helped me combat fragmentation for native code - they should be at least partially applicable to managed code.
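For the first option, the large-address-aware flag is typically set on the built executable with the editbin tool that ships with Visual Studio (the executable name below is illustrative):

rem Mark the binary so a /3GB 32-bit OS (or 64-bit Windows) gives it more than 2 GB of address space
editbin /LARGEADDRESSAWARE MyApp.exe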


It looks like what you are actually doing is an adjacency matrix. If this is the case, and the underlying graph is sparse, then it would probably be better to switch to an adjacency list. http://en.wikipedia.org/wiki/Adjacency_list
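A minimal sketch of an adjacency list in C#, assuming your objects can act directly as graph nodes (the class and method names are illustrative):

using System.Collections.Generic;

public class AdjacencyList<TNode>
{
   // Each node maps to the list of nodes it is adjacent to.
   private readonly Dictionary<TNode, List<TNode>> neighbors =
      new Dictionary<TNode, List<TNode>>();

   public void Connect(TNode a, TNode b)
   {
      GetList(a).Add(b);
      GetList(b).Add(a);   // undirected: store the edge in both directions
   }

   public IEnumerable<TNode> Neighbors(TNode node)
   {
      List<TNode> list;
      return neighbors.TryGetValue(node, out list) ? list : (IEnumerable<TNode>)new TNode[0];
   }

   private List<TNode> GetList(TNode node)
   {
      List<TNode> list;
      if (!neighbors.TryGetValue(node, out list))
      {
         list = new List<TNode>();
         neighbors[node] = list;
      }
      return list;
   }
}

Memory use is proportional to the number of edges rather than to the full 4000 x 5000 matrix, which is exactly the win for a sparse graph.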
