at work we recently migrated a set of services from framework 1.1 to framework 3.5. We just imported and recompiled old (1.1) projects.
Those services check every 7 seconds the DB, gater a number of task to do and for each task create a thread (using ThreadPool) that make a synchronous (in the thread scope) web service call (and some other thing). After the call is done the thread is completed.
We create a new thread only if the ThreadPool got at least 50% of the total threads available (we use ThreadPool.GetAvailableThreads to check that). If so, the thread is created, otherwise the service wait a second (Thread.Sleep(1000);) and then make a new check.
All this worked fine in 1.1 and still seem to work fine in 3.5, but only in test machine.
All the problem started once released to production machine: the check and the threads are correctly done, but web service call start with a huge delay (even 20 and more minutes) and are resolved slowly: while the 1.1 version complete 3 call per second (more or less), the 3.5 version resolve a call every 20 seconds (and, as said, each of them with 20 minute delay). We didn't see any connection problem nor timeouts.
What we see is that when we start the service a number of thread (about 30) is allocated in very small time and each of it immediatly start a WS call (i.e. the code execute the .() synchronous call). The problem is that those calls are resolved only one at time and several minutes after when they are made.
Now: why this happen only in production machine? Why in other test machine this won't happen?
Any suggestion?
Thanks in advance.
EDIT
We made some changes in the service. Now we allow at most a fixed number of thread (35) to be allocated in the threadpool. When this number is reached the service await for a thread to complete prior to create a new one. The process work fine, but we still notice that each web request is performed in a larger time span:
[11:48:40,365] [ 336] INFO MyClass.Call(): Time elapsed: 00:04:07.0172059
[11:49:14,771] [ 3052] INFO MyClass.Call(): Time elapsed: 00:04:41.4236761
[11:49:27,256] [ 3928] INFO MyClass.Call(): Time elapsed: 00:04:53.9081310
[11:49:51,303] [ 5444] INFO MyClass.Call(): Time elapsed: 00:05:17.9551599
[11:50:34,787] [ 5332] INFO MyClass.Call(): Time elapsed: 00:06:00.2054303
[11:50:55,366] [ 4672] INFO MyClass.Call(): Time elapsed: 00:06:19.7211802
[11:51:15,288] [ 1248] INFO MyClass.Call(): Time elapsed: 00:06:38.6588014
[11:51:34,975] [ 4168] INFO MyClass.Call(): Time elapsed: 00:06:57.3307959
[11:53:00,429] [ 5224] INFO MyClass.Call(): Time elapsed: 00:08:18.7531920
[11:53:18,367] [ 908] INFO MyClass.Call(): Time elapsed: 00:08:35.7064255
[11:53:44,398] [ 3616] INFO MyClass.Cal开发者_StackOverflow社区l(): Time elapsed: 00:09:00.7222106
[11:54:04,523] [ 6108] INFO MyClass.Call(): Time elapsed: 00:09:19.8942083
[11:54:29,523] [ 5536] INFO MyClass.Call(): Time elapsed: 00:09:43.8943619
[11:54:47,242] [ 5048] INFO MyClass.Call(): Time elapsed: 00:10:00.5819687
[11:55:05,992] [ 3756] INFO MyClass.Call(): Time elapsed: 00:10:18.3164572
[11:55:26,508] [ 4628] INFO MyClass.Call(): Time elapsed: 00:10:37.8322071
[11:55:52,493] [ 5812] INFO MyClass.Call(): Time elapsed: 00:11:02.8167420
[11:56:22,305] [ 5752] INFO MyClass.Call(): Time elapsed: 00:11:32.0356790
[11:56:31,680] [ 4688] INFO MyClass.Call(): Time elapsed: 00:11:41.0513617
[11:57:00,556] [ 5844] INFO MyClass.Call(): Time elapsed: 00:12:08.8952899
[11:57:10,759] [ 5760] INFO MyClass.Call(): Time elapsed: 00:12:18.0203483
[11:57:49,321] [ 5684] INFO MyClass.Call(): Time elapsed: 00:12:35.3017089
[11:57:59,056] [ 5920] INFO MyClass.Call(): Time elapsed: 00:12:26.9110302
Further inspections evidenced that the effective TCP call last in about 20 seconds, but is performed only several minutes after che code call.
For example, the following code:
slg.Info("Beginning connection.");
DateTime callStart = DateTime.Now;
odOutput = ws_Proxy.WSMethod(odInput);;
log.InfoFormat("Time elapsed: {0}", DateTime.Now - callStart);
produce this log
[11:47:59,396] [ 5468] INFO MyClass.Call(): Beginning connection.
[12:00:17,026] [ 5468] INFO MyClass.Call(): Time elapsed: 00:12:17.6297208
but the TCP connection is made only at 11:59:56,039 and completed at 12:00:17,404
11:59:56.0399905 TCP Connect SUCCESS Length: 0, mss: 1300, sackopt: 4, tsopt: 0, wsopt: 0, rcvwin: 64240, rcvwinscale: 0, sndwinscale: 0, connid: 2254643208, seqnum: 155953480 0
11:59:56.8962917 TCP Send... SUCCESS Length: 302, startime: 1012803, endtime: 1012811, connid: 2254643208, seqnum: 0 0
11:59:56.9794909 TCP Receive SUCCESS Length: 2, connid: 2254643208, seqnum: 0 0
11:59:57.7751084 TCP Send... SUCCESS Length: 625, startime: 1012806, endtime: 1012820, connid: 2254643208, seqnum: 0 0
11:59:57.7751656 TCP Receive SUCCESS Length: 2, connid: 2254643208, seqnum: 0 0
12:00:08.3723058 TCP Receive SUCCESS Length: 1300, connid: 2254643208, seqnum: 0 0
12:00:16.5336435 TCP Unknown SUCCESS Length: 0, connid: 2254643208, seqnum: 0 0
12:00:16.5337723 TCP Receive SUCCESS Length: 1265, connid: 2254643208, seqnum: 0 0
12:00:17.4046865 TCP Disconnect SUCCESS Length: 0, connid: 2158355531, seqnum: 4133859281 0
What happen between 11:47:59,396 and 11:59:56,039? Why the interval between the call and the actual connection grow so much? Why this behaviour happen only when a number of request are performed asynchronously? It's like the framework is unable to perform more than one connection at time...
a) The algorithm and limits of the ThreadPool have changed. Your approach of using AvailableThreads/2 might have worked once but on my Win7 laptop that would be 1023/2 and 500 Threads is way too much.
b) You are looking at Threads and the ThreadPool but from your description it could as well be a network or firewall issue.
At last we found the problem that seem to be caused by a proxy (sysadmins cannot tell who or why configured this proxy).
This proxy is activated only by process running as system (like the services). Process running as user (like the small app used to test) doesn't use this proxy, and this is why the small test app run normally while the services run with a number of issues.
We resolved the issue simply adding following lines to the services configuration:
<system.net>
<defaultProxy enabled="false" />
</system.net>
精彩评论