I have already posted a question about this ("Abandoned instances that will not continue execution (zombie instances)"), but I still haven't received an answer.
One difference I have noticed since the previous question: the problem also occurs when the service's Action on unhandled exception setting is AbandonAndSuspend.
So the scenario is a long-running Workflow service hosted in IIS that uses the AppFabric persistence store. The service performs some actions and then polls a database for the result every 30 minutes. For some reason the workflow gets stuck and does nothing further. In the InstancesTable I can see a pending timer that is already in the past and an old LastUpdateTime.
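The symptom described above (a pending timer in the past plus a LastUpdateTime older than that timer) can be turned into a simple check. The sketch below is a minimal, hypothetical illustration in Python: `InstanceRow` and `find_stuck_instances` are made-up names, the fields only mirror the two values mentioned in the question, and they are not the real InstancesTable schema. It assumes the rows have already been read from the persistence store by some other means.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


@dataclass
class InstanceRow:
    # Hypothetical in-memory copy of one persisted instance.
    # Field names are illustrative, not the actual InstancesTable columns.
    instance_id: str
    pending_timer: Optional[datetime]  # when the next Delay is due to fire
    last_update: datetime              # when the instance was last persisted


def find_stuck_instances(
    rows: Iterable[InstanceRow],
    now: datetime,
    grace: timedelta = timedelta(minutes=5),
) -> List[str]:
    """Return ids of instances whose timer expired more than `grace` ago
    and that have not been persisted since that timer was due."""
    stuck = []
    for row in rows:
        if row.pending_timer is None:
            continue  # no durable timer, nothing to wake up
        overdue = row.pending_timer + grace < now
        not_resumed = row.last_update < row.pending_timer
        if overdue and not_resumed:
            stuck.append(row.instance_id)
    return stuck


now = datetime(2024, 1, 1, 12, 0)
rows = [
    InstanceRow("a", now - timedelta(hours=3), now - timedelta(hours=3, minutes=30)),
    InstanceRow("b", now + timedelta(minutes=20), now - timedelta(minutes=10)),
    InstanceRow("c", None, now),
]
print(find_stuck_instances(rows, now))
```

Only instance "a" is flagged: its timer fired three hours ago and it has not been saved since. The `grace` margin is an arbitrary assumption meant to avoid flagging instances that are merely waiting for the next activation pass.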
The only workaround I have found is to suspend and then resume the instances, which is obviously painful (there are around 5000 instances in this state).
Thanks in advance
The problem, as suspected, was related to the maximum number of concurrent instances. Because of technical problems with a WCF service that was sometimes unavailable, a number of instances were running continuously (retrying their call to that service) and were taking up most of the allowed concurrent-instance slots. As a result, very few of the waiting instances were activated in each detection period. Thanks, Maurice, for your help.
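To see why a handful of busy instances can stall thousands of others, here is a deliberately simplified Python model of the throttle. It is a hypothetical sketch, not how AppFabric or WCF actually schedule work: it assumes every instance activated in a detection period finishes its poll and unloads before the next period, while the retrying instances never release their slots. The function name and all numbers are illustrative.

```python
from typing import Optional


def periods_to_drain(backlog: int, max_concurrent: int, busy: int) -> Optional[int]:
    """Detection periods needed to activate every overdue instance.

    backlog:        instances whose timer has expired and are waiting to run
    max_concurrent: throttle on simultaneously loaded instances (hypothetical value)
    busy:           instances that hold a slot the whole time, e.g. ones stuck
                    retrying a call to an unavailable service
    Returns None when no slot is ever free, i.e. the backlog never drains.
    """
    free = max(max_concurrent - busy, 0)
    if backlog <= 0:
        return 0
    if free == 0:
        return None
    # Each period activates at most `free` waiting instances (ceiling division).
    return (backlog + free - 1) // free


print(periods_to_drain(5000, 200, 0))    # no busy instances
print(periods_to_drain(5000, 200, 190))  # only 10 free slots per period
print(periods_to_drain(5000, 200, 200))  # every slot held by retrying instances
```

With an illustrative limit of 200, the 5000 waiting instances drain in 25 periods when no slots are held, in 500 periods when 190 slots are held by retrying instances, and never when all of them are. From the outside, the starved instances look exactly like zombies with past-due timers.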