
Slowdown on creating objects with many threads

Developer https://www.devze.com 2023-02-09 19:12 Source: Internet

I'm working on a project that spawns a few hundred threads. All of these threads are in a "sleeping" state (they are blocked on a Monitor object). I have noticed that if I increase the number of "sleeping" threads, the program slows down considerably. The "funny" thing is that, according to Task Manager, the more threads there are, the more idle the processor is. I have narrowed the problem down to object creation.

Can someone explain it to me?

I have written a small sample to test it. It's a console program. It creates one thread per processor and measures each thread's speed with a simple test (a "new Object()"). No, the "new Object()" isn't jitted away (try it if you don't trust me). The main thread shows the speed of each thread. Each time you press CTRL-C, the program spawns 50 "sleeping" threads. The slowdown begins with just 50 threads. With around 250, Task Manager clearly shows that the CPU isn't 100% used (on mine it's 82%).

I have tried three ways of blocking the "sleeping" threads: Thread.CurrentThread.Suspend() (bad, bad, I know :-) ), a lock on an already locked object, and Thread.Sleep(Timeout.Infinite). The result is the same. If I comment out the line with the new Object() and replace it with a Math.Sqrt (or with nothing), the problem goes away: the speed doesn't change with the number of threads. Can someone else check it? Does anyone know where the bottleneck is?

Ah... you should test it in Release mode WITHOUT launching it from Visual Studio. I'm using XP SP3 on a dual-processor machine (no HT). I have tested it with .NET 3.5 and 4.0 (to test the different framework runtimes).

namespace TestSpeed
{
    using System;
    using System.Collections.Generic;
    using System.Threading;

    class Program
    {
        private const long ticksInSec = 10000000;
        private const long ticksInMs = ticksInSec / 1000;
        private const int threadsTime = 50;
        private const int stackSizeBytes = 256 * 1024;
        private const int waitTimeMs = 1000;

        private static List<int> collects = new List<int>();
        private static int[] objsCreated;

        static void Main(string[] args)
        {
            objsCreated = new int[Environment.ProcessorCount];
            Monitor.Enter(objsCreated);

            for (int i = 0; i < objsCreated.Length; i++)
            {
                new Thread(Worker).Start(i);
            }

            int[] oldCount = new int[objsCreated.Length];

            DateTime last = DateTime.UtcNow;

            Console.Clear();

            int numThreads = 0;
            Console.WriteLine("Press Ctrl-C to generate {0} sleeping threads, Ctrl-Break to end.", threadsTime);

            Console.CancelKeyPress += (sender, e) =>
            {
                if (e.SpecialKey != ConsoleSpecialKey.ControlC)
                {
                    return;
                }

                for (int i = 0; i < threadsTime; i++)
                {
                    new Thread(() =>
                    {
                        /* The same for all the three "ways" to lock forever a thread */
                        //Thread.CurrentThread.Suspend();
                        //Thread.Sleep(Timeout.Infinite);
                        lock (objsCreated) { }
                    }, stackSizeBytes).Start();

                    Interlocked.Increment(ref numThreads);
                }

                e.Cancel = true;
            };

            while (true)
            {
                Thread.Sleep(waitTimeMs);

                Console.SetCursorPosition(0, 1);

                DateTime now = DateTime.UtcNow;

                long ticks = (now - last).Ticks;

                Console.WriteLine("Slept for {0}ms", ticks / ticksInMs);

                Thread.MemoryBarrier();

                for (int i = 0; i < objsCreated.Length; i++)
                {
                    int count = objsCreated[i];
                    Console.WriteLine("{0} [{1} Threads]: {2}/sec    ", i, numThreads, ((long)(count - oldCount[i])) * ticksInSec / ticks);
                    oldCount[i] = count;
                }

                Console.WriteLine();

                CheckCollects();

                last = now;
            }
        }

        private static void Worker(object obj)
        {
            int ix = (int)obj;

            while (true)
            {
                /* First and second are slowed by threads, third, fourth, fifth and "nothing" aren't*/

                new Object();
                //if (new Object().Equals(null)) return;
                //Math.Sqrt(objsCreated[ix]);
                //if (Math.Sqrt(objsCreated[ix]) < 0) return;
                //Interlocked.Add(ref objsCreated[ix], 0);

                Interlocked.Increment(ref objsCreated[ix]);
            }
        }

        private static void CheckCollects()
        {
            int newMax = GC.MaxGeneration;

            // Generations are numbered 0..GC.MaxGeneration, so one counter per generation is needed.
            while (newMax >= collects.Count)
            {
                collects.Add(0);
            }

            for (int i = 0; i < collects.Count; i++)
            {
                int newCol = GC.CollectionCount(i);

                if (newCol != collects[i])
                {
                    collects[i] = newCol;
                    Console.WriteLine("Collect gen {0}: {1}", i, newCol);
                }
            }
        }
    }
}


Start Taskmgr.exe and go to the Processes tab. Choose View + Select Columns and tick "Page Fault Delta". You'll see the impact of allocating hundreds of megabytes just to store the stacks of all the threads you created. Every time that number blips for your process, your program blocks while the operating system pages data in from disk to RAM.

TANSTAAFL: there ain't no such thing as a free lunch.


My guess is that the problem is that garbage collection requires a certain amount of cooperation between threads - something either needs to check that they're all suspended, or ask them to suspend themselves and wait for it to happen, etc. (And even if they are suspended, it has to tell them not to wake up!)

This describes a "stop the world" garbage collector, of course. I believe there are at least two or three different GC implementations which differ in the details around parallelism... but I suspect that all of them are going to have some work to do in terms of getting threads to cooperate.
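
To check which GC flavor a given process is actually running with, a minimal sketch along these lines may help. It only uses `GCSettings.IsServerGC`, `GCSettings.LatencyMode` and `GC.MaxGeneration`; the class name `GcFlavor` and the method `Describe` are made up for illustration.

```csharp
using System;
using System.Runtime;

static class GcFlavor
{
    // Hypothetical helper: summarizes the GC configuration of the current process.
    public static string Describe()
    {
        return string.Format(
            "Server GC: {0}, latency mode: {1}, max generation: {2}",
            GCSettings.IsServerGC,   // false means the workstation GC is in use
            GCSettings.LatencyMode,  // e.g. Batch or Interactive
            GC.MaxGeneration);
    }

    static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

Running the same test under different GC settings (for example with `<gcServer enabled="true"/>` in the application configuration file) would show whether the effect depends on the collector in use.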


What you are seeing here is the GC in action. When you attach a debugger to your process you will see that many exceptions of the form

Unknown exception - code e0434f4e (first chance)

are thrown. These exceptions are caused by the GC resuming a suspended thread. As you know, calling Suspend/ResumeThread inside your own process is strongly discouraged, and even more so in the managed world. The only authority that can do this safely is the GC. When you set a breakpoint at SuspendThread you will see

0118f010 5f3674da 00000000 00000000 83e36f53 KERNEL32!SuspendThread
0118f064 5f28c51d 00000000 83e36e63 00000000 mscorwks!Thread::SysSuspendForGC+0x2b0 (FPO: [Non-Fpo])
0118f154 5f28a83d 00000001 00000000 00000000 mscorwks!WKS::GCHeap::SuspendEE+0x194 (FPO: [Non-Fpo])
0118f17c 5f28c78c 00000000 00000000 0000000c mscorwks!WKS::GCHeap::GarbageCollectGeneration+0x136 (FPO: [Non-Fpo])
0118f208 5f28a0d3 002a43b0 0000000c 00000000 mscorwks!WKS::gc_heap::try_allocate_more_space+0x15a (FPO: [Non-Fpo])
0118f21c 5f28a16e 002a43b0 0000000c 00000000 mscorwks!WKS::gc_heap::allocate_more_space+0x11 (FPO: [Non-Fpo])
0118f23c 5f202341 002a43b0 0000000c 00000000 mscorwks!WKS::GCHeap::Alloc+0x3b (FPO: [Non-Fpo])
0118f258 5f209721 0000000c 00000000 00000000 mscorwks!Alloc+0x60 (FPO: [Non-Fpo])
0118f298 5f2097e6 5e2d078c 83e36c0b 00000000 mscorwks!FastAllocateObject+0x38 (FPO: [Non-Fpo])

that the GC tries to suspend all of your threads before it can do a full collection. On my machine (32-bit, Windows 7, .NET 3.5 SP1) the slowdown is not so dramatic. I do see a linear dependency between the thread count and the CPU (non-)usage. It seems you are seeing increased costs for each GC because the GC has to suspend more threads before it can do a full collection. Interestingly, the time is spent mainly in user mode, so the kernel is not the limiting factor.
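
The relationship between the number of parked threads and the cost of allocating can be probed with a much smaller program than the one in the question. The following is a rough sketch, not a rigorous benchmark: the class `ParkedThreadProbe`, its methods, the step of 100 threads and the allocation count are all assumptions chosen for illustration, and the numbers will vary by machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class ParkedThreadProbe
{
    // Never signaled: every parked thread blocks here until the process exits.
    private static readonly ManualResetEvent Gate = new ManualResetEvent(false);

    // Starts `count` background threads that immediately block on Gate.
    public static void ParkThreads(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Thread t = new Thread(() => Gate.WaitOne(), 256 * 1024);
            t.IsBackground = true; // do not keep the process alive
            t.Start();
        }
    }

    // Allocates `count` small objects; returns elapsed milliseconds and
    // reports how many gen 0 collections happened in the meantime.
    public static long TimeAllocations(int count, out int gen0Collections)
    {
        int before = GC.CollectionCount(0);
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            GC.KeepAlive(new object());
        }
        sw.Stop();
        gen0Collections = GC.CollectionCount(0) - before;
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        const int allocations = 20000000;
        for (int parked = 0; parked <= 300; parked += 100)
        {
            if (parked > 0)
            {
                ParkThreads(100); // add another batch of sleeping threads
            }
            int collections;
            long ms = TimeAllocations(allocations, out collections);
            Console.WriteLine("{0} parked threads: {1} ms, {2} gen 0 collections",
                parked, ms, collections);
        }
    }
}
```

If the explanation above holds, the number of collections per step should stay roughly constant while the elapsed time grows with the thread count, meaning that each collection gets more expensive.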

I do not see a way around that except using fewer threads or using unmanaged code. It could be that if you host the CLR yourself and use fibers instead of physical threads, the GC will scale much better. Unfortunately, this feature was cut during the release cycle of .NET 2.0. Since it is now 6 years later, there is little hope that it will ever be added again.
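
One possible way to use fewer threads, assuming each "sleeping" thread is really just waiting for a signal, is to register the waits with the thread pool instead of dedicating a thread to each one. This is a sketch of that idea only, not what the original program does: `PooledWaiters`, `Register` and the count of 200 are hypothetical, while `ThreadPool.RegisterWaitForSingleObject` is the standard API that lets pool-owned wait threads monitor the registered handles.

```csharp
using System;
using System.Threading;

static class PooledWaiters
{
    // Registers `count` waits with the thread pool. When an event is set,
    // `onSignaled` runs once on a pool thread with the waiter's index.
    public static AutoResetEvent[] Register(int count, Action<int> onSignaled)
    {
        AutoResetEvent[] events = new AutoResetEvent[count];
        for (int i = 0; i < count; i++)
        {
            events[i] = new AutoResetEvent(false);
            ThreadPool.RegisterWaitForSingleObject(
                events[i],
                (state, timedOut) => onSignaled((int)state),
                i,                 // state passed to the callback
                Timeout.Infinite,  // wait without a timeout
                true);             // run the callback only once
        }
        return events;
    }

    static void Main()
    {
        int remaining = 200;
        ManualResetEvent allDone = new ManualResetEvent(false);

        AutoResetEvent[] events = Register(200, id =>
        {
            if (Interlocked.Decrement(ref remaining) == 0)
            {
                allDone.Set();
            }
        });

        // Wake every logical waiter, then wait until all callbacks have run.
        foreach (AutoResetEvent e in events)
        {
            e.Set();
        }
        allDone.WaitOne();
        Console.WriteLine("All waiters ran.");
    }
}
```

Whether this helps depends on the workload: it keeps the number of threads the GC has to suspend low, but it only fits code that can be restructured around callbacks.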

Besides your thread count, the GC is also limited by the complexity of your object graph. Have a look at the article "Do You Know The Costs Of Garbage?".
