Sporadic windows service failure under .NET 4, followed by port blocked on restart attempt

Developer (https://www.devze.com), 2023-01-10 04:20. Source: web.
About once a day, I am receiving the following error in our mission-critical trading service.

Source: .NET Runtime, Type: Error, Application: Application.exe, Framework Version: v4.0.30319, Description: The process was terminated due to an internal error in the .NET Runtime at IP 000006447F281DBD (000006447F100000) with exit code 80131506.

After receiving this error and trying to restart the application, it appears that the sockets we were bound to have not been cleaned up from the previous (failed) execution because we receive a System.ServiceModel.AddressAlreadyInUseException when trying to Bind the socket during startup.

I have two questions around this.

  1. We need to understand why the first error is occurring. Can you tell anything from the error codes, etc.?
  2. We need a way to Bind successfully after the error has occurred. Any suggestions on how to clean up the ports during the next startup?

The environment the application is running under is

  • Microsoft Windows Server 2003 R2
  • Standard x64 Edition
  • Service Pack 2
  • 2x 4Core Intel CPU X5365 @ 3.00GHz
  • 16.0 GB of RAM.


This is the ExecutionEngineException from the earlier .NET days. You cannot catch it in .NET 4.0, and the AppDomain.UnhandledException event won't fire for it.

The generic diagnosis for this exception is that the integrity of the garbage-collected heap was compromised. A typical trigger is unmanaged code writing past the end of a buffer. It can also be environmental: virus scanners have a knack for causing this problem, especially Symantec security products. That is somewhat likely in your case, given that the ports aren't being closed automatically when your service terminates. It is also technically possible for a bug in the CLR to cause this.

I'd thus recommend:

  • Inspect your source code base and thoroughly review any unmanaged code that's used.
  • Contact vendors of 3rd party components and ask about known heap corruption problems.
  • Review the configuration of the machine on which this code runs. Disable add-ons where possible, and temporarily disable anything that isn't strictly necessary to run your service.
  • Retarget your project to the .NET 3.5 SP1 framework.


To get more information on the error, add a global, last-chance exception handler. This will pick up any exception that is not otherwise handled. It should log (at minimum) the exception type, message and stack trace (ideally also a memory minidump and a list of loaded assemblies with versions and code base).

This will give you a far better chance of fixing (or at least mitigating) the initial problem.


The issue with the sockets is that they wait for a while to ensure all data has been flushed before shutting down completely (watch TCPView for a while and you'll see this, as the system inherits sockets after applications have finished with them).
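If the old sockets do release the port after a short wait, one possible mitigation is to retry the bind at startup rather than failing on the first attempt. The following is only a sketch, assuming the service listens through a WCF ServiceHost (suggested by the System.ServiceModel.AddressAlreadyInUseException in the question). HostStarter, OpenWithRetry, createHost, maxAttempts and delay are hypothetical names chosen for illustration, not part of the original code.

```csharp
using System;
using System.ServiceModel;
using System.Threading;

static class HostStarter
{
    // Tries to open a freshly created host up to maxAttempts times,
    // sleeping for `delay` between attempts while the address is still in use.
    public static ServiceHost OpenWithRetry(Func<ServiceHost> createHost, int maxAttempts, TimeSpan delay)
    {
        for (int attempt = 1; ; attempt++)
        {
            // A host that failed to open cannot be reopened, so build a new one each time.
            ServiceHost host = createHost();
            try
            {
                host.Open();
                return host;
            }
            catch (AddressAlreadyInUseException)
            {
                host.Abort();               // release whatever the failed host acquired
                if (attempt >= maxAttempts)
                {
                    throw;                  // give up and surface the original exception
                }
                Thread.Sleep(delay);        // wait for the old sockets to drain
            }
        }
    }
}
```

This only helps when the port frees itself after a delay. If the crashed process (or something that inherited its handles) still holds the port, retrying will keep failing and that process has to be dealt with first.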


After years of wrestling with this issue in a number of applications, it appears that Microsoft has finally accepted that a bug in the .NET 4 CLR causes this to occur: http://support.microsoft.com/kb/2640103.

For years I had been "fixing" it by forcing the garbage collector to run in server mode (gcServer enabled="true" in app.config), which in essence forces all threads in the application to pause during a collection, removing the possibility of other threads accessing the memory being manipulated by the GC.
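For reference, the gcServer setting mentioned above goes inside the runtime element of the application's configuration file. A minimal app.config that contains only this setting might look like the following; a real file would normally have other sections as well.

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <runtime>
    <!-- Run the garbage collector in server mode -->
    <gcServer enabled="true" />
  </runtime>
</configuration>
```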


Adding to what @Richard pointed out: your exception is an unhandled exception, so you can register for the following event to find out why it occurred. You can also use the handler to dispose of any unmanaged objects.

AppDomain.CurrentDomain.UnhandledException += new UnhandledExceptionEventHandler(CurrentDomain_UnhandledException);

static void CurrentDomain_UnhandledException(object sender, UnhandledExceptionEventArgs e)
{
    // Log the reason; e.ExceptionObject holds the unhandled exception.
    // Also clean up open sockets if possible.
}