I have a somewhat large server process written in .NET 3.5, running in a VMware vCenter Server VM, that keeps crashing without any errors being reported. The process is created by a Windows Service on 32-bit Windows Server 2003, and is intended to be a long-running process (multiple days). It is a collaboration process that accepts connections via TCP sockets from multiple clients running on other Windows XP machines, and allows them to share data. In addition, the process self-hosts about 8 WCF services that expose a mixture of TCP and HTTP endpoints. The process generally consumes about 500 MB of memory and between 30% and 50% CPU at all times. There is also an instance of SQL Server 2005 on the same VM that hosts 6 databases and consumes about 1-1.2 GB of memory. The entire system has been allocated 8 GB of RAM, and consumes as much as 7 GB during normal operation. I assume PAE is enabled to allow the system to address 8 GB of RAM, but I have not confirmed this.
The problem is that, at seemingly random times, the process will suddenly crash with no errors being reported, including in the event log. I've tried attaching debuggers to the process, and they have not caught the crash either. I first tried WinDbg on the release build with symbols loaded, then I replaced all of the release dlls/exes with debug builds and loaded their symbols. The crashes still occurred, and the debugger did not catch them. I next installed Visual Studio on the system with the .Net Reflector add-in, and attached that. It also did not catch the crash.
Before you lecture me on why we're running so many things on a single VM, know that I did not design the system, nor did I implement it this way. Our customer dictated it for specific reasons, and I've been asked to come in and make it work. I'm only interested in criticisms of the environment if you can cite specific evidence that would help explain the sudden crashes. Our customer may be willing to alter the environment if we can show such evidence. Any additional debugging techniques that will allow me to capture more information about the crash would be greatly appreciated as well.
http://blogs.msdn.com/b/tess/archive/2009/03/20/debugging-a-net-crash-with-rules-in-debug-diag.aspx
A "crash" without output suggests a call to _exit() (or even exit()). I've seen a few corners of the Visual Studio runtime library do that, though they usually get a cryptic message out to stderr. Is stderr captured?
The suspicion of running out of memory also seems likely. If .net has a heapspace()-like function to describe how much memory is being used by the heap, log that periodically, perhaps along with total memory used (code + stack + data). I'm not familiar with .net, but there must be functions to get those values.
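For what it's worth, here is a minimal sketch of that kind of periodic memory logging in C#. The class name MemoryLogger, the log path, and the interval are made up for illustration and are not from the original system. GC.GetTotalMemory reports the managed heap, and the Process properties report process-wide figures. On a 32-bit process the virtual size is worth watching too, since the user address space is limited to 2 GB by default no matter how much RAM the machine has.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading;

// Hypothetical helper: the class name, log path, and interval are illustrative only.
public static class MemoryLogger
{
    // Held in a static field so the timer is not garbage collected.
    private static Timer _timer;

    // Builds one log line with managed-heap and process-wide memory figures (in bytes).
    public static string FormatSample()
    {
        using (Process p = Process.GetCurrentProcess())
        {
            return string.Format(
                "{0:o} managedHeap={1} privateBytes={2} workingSet={3} virtualBytes={4}",
                DateTime.UtcNow,
                GC.GetTotalMemory(false),   // managed heap, without forcing a collection
                p.PrivateMemorySize64,      // memory committed privately to this process
                p.WorkingSet64,             // physical memory currently in use
                p.VirtualMemorySize64);     // virtual address space in use
        }
    }

    // Appends a sample to logPath immediately and then once per interval.
    public static void Start(string logPath, TimeSpan interval)
    {
        _timer = new Timer(delegate
        {
            try
            {
                // The file is opened and closed on every write, so the last
                // sample taken before a crash is already on disk.
                File.AppendAllText(logPath, FormatSample() + Environment.NewLine);
            }
            catch (Exception)
            {
                // Swallow logging failures; an unhandled exception on a
                // timer thread would itself terminate the process.
            }
        }, null, TimeSpan.Zero, interval);
    }
}
```

Calling something like MemoryLogger.Start(@"C:\logs\memory.log", TimeSpan.FromSeconds(30)) once at service startup would be enough. If the last few samples before a crash show the virtual size climbing toward 2 GB, that points at address-space exhaustion rather than an external kill.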
It turns out that one of the service plugins was seeking out and referencing a Java library. When the user logged out, the plugin crashed the service due to the JVM being terminated. We were able to get everything working again by following the suggestions in this post (starting JVM with the '-Xrs' parameter): http://www.velocityreviews.com/forums/t128371-java-app-dies-on-logoff.html