C# huge size 2-dim arrays

https://www.devze.com 2022-12-25 18:37 Source: web
I need to declare square matrices in C# WinForms with more than 20000 items in a row. I have read about the 2GB .NET object size limit on 32-bit, and that the same limit applies on 64-bit OSes. So as I understand it, the only answer is to use unsafe code or a separate library built with a C++ compiler.

The problem is worse for me because ushort[20000,20000] is smaller than 2GB, yet in practice I cannot allocate even 700MB of memory. My limit is about 650MB and I don't understand why - I have 32-bit WinXP with 3GB of memory. I tried Marshal.AllocHGlobal(700<<20) but it throws OutOfMemoryException; GC.GetTotalMemory returns 4.5MB before I try to allocate.

I have only found many people saying "use unsafe code", but I cannot find an example of how to declare a 2-dim array on the heap (no stack can hold such a huge amount of data) or how to work with it using pointers. Is it pure C++ code inside unsafe{} brackets?

PS. Please don't ask WHY I need such huge arrays... but if you want to know - I need to analyze texts (for example books) and build a lot of indexes. So the answer is: matrices of relations between words.

Edit: Could somebody please provide a small example of working with matrices using pointers in unsafe code? I know that under 32-bit it is impossible to allocate more space, but I have spent a lot of time googling for such an example and found NOTHING.


Why demand a huge 2-D array? You can simulate this with, for example, a jagged array - ushort[][] - almost as fast, and you won't hit the same single-object limit. You'll still need buckets-o-RAM of course, so x64 is implied...

        // Each row is a separate object on the heap, so no single
        // allocation ever exceeds size * sizeof(ushort) bytes.
        ushort[][] arr = new ushort[size][];
        for (int i = 0; i < size; i++) {
            arr[i] = new ushort[size];
        }

Besides which - you might want to look at sparse-arrays, eta-vectors, and all that jazz.
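The sparse-array suggestion can be sketched in a few lines: a dictionary keyed by the flattened (row, col) index stores only the non-zero cells, which for a word-relation matrix is usually a tiny fraction of the 400 million slots. `SparseMatrix` is a hypothetical name for illustration, not a framework type:

```csharp
using System;
using System.Collections.Generic;

// Minimal sparse matrix sketch: only non-zero cells are stored, so a
// mostly-empty 20000 x 20000 ushort matrix fits in modest memory.
class SparseMatrix
{
    private readonly Dictionary<long, ushort> cells = new Dictionary<long, ushort>();
    private readonly int size;

    public SparseMatrix(int size)
    {
        this.size = size;
    }

    public ushort this[int row, int col]
    {
        get
        {
            ushort v;
            // A missing cell reads as 0, just as a dense array would.
            return cells.TryGetValue((long)row * size + col, out v) ? v : (ushort)0;
        }
        set
        {
            cells[(long)row * size + col] = value;
        }
    }
}
```

Reads stay O(1) on average; the trade-off is per-entry dictionary overhead, so this only wins when most cells are zero.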


The reason why you can't get near even the 2Gb allocation in 32 bit Windows is that arrays in the CLR are laid out in contiguous memory. In 32 bit Windows you have such a restricted address space that you'll find nothing like a 2Gb hole in the virtual address space of the process. Your experiments suggest that the largest region of available address space is 650Mb. Moving to 64 bit Windows should at least allow you to use a full 2Gb allocation.

Note that the virtual address space limitation on 32 bit Windows has nothing to do with the amount of physical memory you have in your computer, in your case 3Gb. Instead the limitation is caused by the number of bits the CPU uses to address memory. 32 bit Windows uses, unsurprisingly, 32 bits for each memory address, which gives a total addressable memory space of 4Gbytes. By default Windows keeps 2Gb for itself and gives 2Gb to the currently running process, so you can see why the CLR will find nothing like a 2Gb hole. With some trickery (the /3GB boot option plus the LARGEADDRESSAWARE flag) you can change the split so that Windows keeps only 1Gb for itself and gives the running process 3Gb, which might help. However on 64 bit Windows the addressable memory assigned to each process jumps up to 8 terabytes, so there the CLR will almost certainly be able to use full 2Gb allocations for arrays.
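The fragmentation effect described above is easy to measure: the probe below binary-searches for the largest single contiguous native allocation the process address space currently allows. On 32-bit XP this typically lands well below 2GB; the 4GB search ceiling and 1MB resolution are arbitrary choices for the sketch:

```csharp
using System;
using System.Runtime.InteropServices;

// Probe the largest single contiguous allocation available to this process.
class Program
{
    static bool CanAllocate(long bytes)
    {
        try
        {
            // Native allocation outside the GC heap; freed immediately.
            IntPtr p = Marshal.AllocHGlobal((IntPtr)bytes);
            Marshal.FreeHGlobal(p);
            return true;
        }
        catch (OutOfMemoryException)
        {
            return false;
        }
    }

    static void Main()
    {
        long lo = 0, hi = 4L << 30;   // search between 0 and 4GB
        while (hi - lo > (1 << 20))   // stop at 1MB resolution
        {
            long mid = (lo + hi) / 2;
            if (CanAllocate(mid)) lo = mid; else hi = mid;
        }
        Console.WriteLine("Largest single allocation ~ {0} MB", lo >> 20);
    }
}
```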


I'm so happy! :) Recently I played around with this problem - I tried to solve it using a database, only to find that this approach is far from perfect. The [20000,20000] matrix was implemented as a single table. Even with properly set up indexes, just creating the more than 400 million records takes about 1 hour on my PC. That was not critical for me. But then I ran the algorithm that works with the matrix (it requires joining the same table with itself twice!), and after more than half an hour it had not completed even a single step. After that I understood that the only way is to work with such a matrix in memory, and went back to C#.

I created a pilot application to test the memory allocation process and to determine where exactly allocation stops with different structures.

As I said in my first post, with a single 2-dim array it is possible to allocate only about 650MB under 32-bit WinXP. The results under Win7 with a 64-bit compilation were also sad - less than 700MB.

I used JAGGED ARRAYS [][] instead of a single 2-dim array [,], and the results are below:

  • Compiled in Release mode as a 32-bit app (WinXP 32-bit, 3GB phys. mem.): 1.45GB
  • Compiled in Release mode as a 64-bit app (Win7 64-bit, 2GB under VM): 7.5GB

The sources of the application I used for testing were meant to be attached to this post, but I cannot find how to attach source files here, so I will just describe the design and paste the code. Create a WinForms application. Put the following controls on the form with their default names: 1 button (button1), 1 numericUpDown (numericUpDown1) and 1 listBox (listBox1). Add the code below to the .cs file and run.

private void button1_Click(object sender, EventArgs e)
        {
            //Log(string.Format("Memory used before collection: {0}", GC.GetTotalMemory(false)));
            GC.Collect();
            //Log(string.Format("Memory used after collection: {0}", GC.GetTotalMemory(true)));
            listBox1.Items.Clear();
            if (string.IsNullOrEmpty(numericUpDown1.Text)) {
                Log("Enter integer value");
            } else {
                int val = (int) numericUpDown1.Value;
                Log(TryAllocate(val));
            }
        }

        /// <summary>
        /// Memory test: allocates a rowLen x rowLen jagged int array row by row.
        /// </summary>
        /// <param name="rowLen">row length (number of int elements per row)</param>
        private IEnumerable<string> TryAllocate(int rowLen) {
            var r = new List<string>();
            r.Add(string.Format("Allocating using jagged array with overall size (MB) = {0}",
                ((long)rowLen * rowLen * Marshal.SizeOf(typeof(int))) >> 20));
            try {
                bool failed = false;
                var ar = new int[rowLen][];
                for (int i = 0; i < ar.Length; i++) {
                    try {
                        ar[i] = new int[rowLen];
                    }
                    catch (OutOfMemoryException) {
                        r.Add(string.Format("Unable to allocate memory on step {0}. Allocated {1} MB", i,
                            ((long)rowLen * i * Marshal.SizeOf(typeof(int))) >> 20));
                        failed = true;
                        break;
                    }
                }
                if (!failed) {
                    r.Add("Memory was successfully allocated");
                }
            }
            catch (Exception ex) {
                r.Add(ex.Message + ex.StackTrace);
            }
            return r;
        }

        #region Logging

        private void Log(string s) {
            listBox1.Items.Add(s);
        }

        private void Log(IEnumerable<string> s)
        {
            if (s != null) {
                foreach (var ss in s) {
                    listBox1.Items.Add ( ss );
                }
            }
        }

        #endregion

The problem is solved for me. Guys, thank you!


If a sparse array does not apply, it's probably better to just do it in C/C++ with the platform APIs for memory-mapped files: http://en.wikipedia.org/wiki/Memory-mapped_file
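Since .NET 4.0 the same platform APIs are wrapped by System.IO.MemoryMappedFiles, so this route is also available from managed code without C/C++. A sketch backing the 20000 x 20000 ushort matrix with a file (the file path and test cell are illustrative):

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// The matrix lives in a file that the OS pages in on demand,
// so no single CLR object ever holds the full ~763 MB.
class Program
{
    const int Size = 20000;

    // Row-major byte offset of cell (row, col).
    static long Offset(int row, int col)
    {
        return ((long)row * Size + col) * sizeof(ushort);
    }

    static void Main()
    {
        long bytes = (long)Size * Size * sizeof(ushort); // 800,000,000 bytes
        string path = Path.Combine(Path.GetTempPath(), "matrix.bin");

        using (var mmf = MemoryMappedFile.CreateFromFile(
                   path, FileMode.Create, null, bytes))
        using (var view = mmf.CreateViewAccessor())
        {
            view.Write(Offset(12345, 6789), (ushort)42);
            ushort back = view.ReadUInt16(Offset(12345, 6789));
            Console.WriteLine("Read back: {0}", back); // 42
        }
        File.Delete(path);
    }
}
```

The trade-off is that cold cells cost a page fault and disk I/O, but access stays simple indexed reads and writes.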


For the OutOfMemoryException, read this thread (especially nobugz's and Brian Rasmussen's answers):
Microsoft Visual C# 2008 Reducing number of loaded dlls


If you explained what you are trying to do, it would be easier to help. Maybe there are better ways than allocating such a huge amount of memory at once.

Re-design is also choice number one in this great blog post:

BigArray, getting around the 2GB array size limit

The options suggested in this article are:

  • Re-design
  • Native memory for array containing simple types, sample code available here:

    • Unsafe Code Tutorial
    • Unsafe Code and Pointers (C# Programming Guide)
    • How to: Use Pointers to Copy an Array of Bytes (C# Programming Guide)
  • Writing a BigArray class which segments the large data structure into smaller segments of manageable size, sample code in the above blog post
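The "native memory for simple types" option above can be sketched directly in C#: allocate the whole matrix outside the managed heap with Marshal.AllocHGlobal and index it with pointer arithmetic in an unsafe context (compile with /unsafe). `NativeMatrix` is a hypothetical illustration, not code from the linked tutorials:

```csharp
using System;
using System.Runtime.InteropServices;

// A 2-dim ushort matrix in native memory, bypassing the CLR object size
// limit. Still one contiguous block, so on 32-bit it remains subject to
// address-space fragmentation; realistic sizes need x64.
class NativeMatrix : IDisposable
{
    private readonly IntPtr buffer;
    private readonly int size;

    public NativeMatrix(int size)
    {
        this.size = size;
        buffer = Marshal.AllocHGlobal((IntPtr)((long)size * size * sizeof(ushort)));
    }

    public unsafe ushort this[int row, int col]
    {
        // Row-major pointer arithmetic: base + row * size + col.
        get { return *((ushort*)buffer + (long)row * size + col); }
        set { *((ushort*)buffer + (long)row * size + col) = value; }
    }

    public void Dispose()
    {
        // Native memory is not garbage-collected; it must be freed by hand.
        Marshal.FreeHGlobal(buffer);
    }
}
```

Note that AllocHGlobal does not zero the memory, so cells must be initialized before they are read, and Dispose must be called (or a using block employed) to avoid leaking the block.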
