Background
I am trying to work out the best structure for an Azure application. Each of my worker roles will spin up multiple long-running jobs. Over time I can transfer jobs from one instance to another by switching them to a readonly mode on the source instance, spinning them up on the target instance, and then spinning the original down on the source instance.
If I have too many jobs then I can tell Azure to spin up extra role instances, and use them for new jobs. Conversely if my load drops (e.g. during the night) then I can consolidate outstanding jobs to a few machines and tell Azure to give me fewer instances.
The trouble is that (as I understand it) Azure provides no mechanism to allow me to decide which instance to stop. Thus I cannot know which servers to consolidate onto, and some of my jobs will die when their instance stops, causing delays for users while I restart those jobs on surviving instances.
Idea 1: I decide which instance to stop, and return from its Run(). I then tell Azure to reduce my instance count by one, and hope it concludes that the broken instance is a good candidate. Has anyone tried anything like this?
Idea 2: I predefine a whole bunch of different worker roles, with identical contents. I can individually stop and start them by switching their instance count from zero to one, and back again. I think this idea would work, but I don't like it because it seems to go against the natural Azure way of doing things, and because it involves me in a lot of extra bookkeeping to manage the extra worker roles.
Idea 3: Live with it.
Any better ideas?
In response to your ideas
Idea 1: I haven't tried doing exactly what you're describing, but in my experience your first instance has a name that ends with _0, the next _1 and I'm sure you can guess the rest. When you decrease the instance count it drops off the instance with the highest number suffix. I would be surprised if it took into account the state of any particular instance.
Idea 2: As I think you hint at, this will create management problems. You can only have 5 different workers per hosted service, so you'll need a service for each group of 5 roles that you want to be able to scale to. Also when you deploy updates you'll have to upload X times more services where X is the maximum number of instances you currently support.
Idea 3: Technically the easiest. Pending some clarification, this is probably what I'd be doing for now. To reduce the downsides of this option it may pay to investigate ways of loading the data faster. There is usually a Goldilocks level (not too much, not too little) of parallelism that helps with this.
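If the suffix behaviour described under Idea 1 holds (it is only what I have observed, not something I know to be guaranteed), you could plan consolidation around it: treat the lowest-numbered instances as the survivors and move jobs off the highest-numbered ones before reducing the count. Here is a rough Python sketch of that planning step. The instance names (`Worker_IN_0` and so on), the job names, and both function names are made up for illustration, and the actual job transfer is left out.

```python
import re
from typing import Dict, List, Tuple


def instance_index(name: str) -> int:
    """Return the numeric suffix of an instance name such as 'Worker_IN_3'."""
    match = re.search(r"_(\d+)$", name)
    if match is None:
        raise ValueError(f"no numeric suffix in {name!r}")
    return int(match.group(1))


def plan_scale_down(
    jobs_by_instance: Dict[str, List[str]], target_count: int
) -> Tuple[List[str], List[Tuple[str, str, str]]]:
    """Plan job moves before scaling down to target_count instances.

    ASSUMPTION: the platform removes the instances with the highest
    suffix first. Returns (survivors, moves), where each move is
    (job, source_instance, target_instance).
    """
    if target_count < 1:
        raise ValueError("target_count must be at least 1")

    ordered = sorted(jobs_by_instance, key=instance_index)
    survivors = ordered[:target_count]
    doomed = ordered[target_count:]

    # Track how many jobs each survivor will hold as moves are planned.
    load = {name: len(jobs_by_instance[name]) for name in survivors}
    moves = []
    for source in doomed:
        for job in jobs_by_instance[source]:
            # Pick the least-loaded survivor; break ties by lowest suffix.
            target = min(survivors, key=lambda n: (load[n], instance_index(n)))
            load[target] += 1
            moves.append((job, source, target))
    return survivors, moves
```

You would run the moves first (readonly on the source, start on the target, stop on the source) and only then ask Azure for fewer instances. If the assumption turns out to be wrong for your deployment, the plan is wrong too, so test it on something that does not matter first.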
You're right - you cannot choose which instance to stop. In general, you'd run the same jobs on each worker role instance, where each instance watches the same queue (or maybe multiple threads or jobs watching multiple queues).
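To make the shared-queue pattern concrete, here is a minimal Python sketch. It uses an in-process `queue.Queue` and threads as stand-ins for an Azure queue and worker role instances, so it shows only the shape of the idea, not real Azure code. The names `run_instance` and `process_all` are mine.

```python
import queue
import threading
from typing import List, Tuple


def run_instance(name: str, jobs: queue.Queue, results: list,
                 stop: threading.Event) -> None:
    """One simulated role instance: keep pulling work from the shared queue."""
    while not stop.is_set():
        try:
            job = jobs.get(timeout=0.1)
        except queue.Empty:
            continue  # nothing to do right now; poll again
        results.append((name, job))  # "process" the job
        jobs.task_done()


def process_all(job_names: List[str], instance_count: int) -> List[Tuple[str, str]]:
    """Run instance_count identical workers against a single queue."""
    jobs: queue.Queue = queue.Queue()
    results: list = []
    stop = threading.Event()
    workers = [
        threading.Thread(target=run_instance,
                         args=(f"instance_{i}", jobs, results, stop))
        for i in range(instance_count)
    ]
    for worker in workers:
        worker.start()
    for job in job_names:
        jobs.put(job)
    jobs.join()   # wait until every job has been marked done
    stop.set()    # then tell the workers to exit
    for worker in workers:
        worker.join()
    return results
```

Because no instance owns any particular job, it does not matter which one Azure removes: the remaining instances keep draining the same queue. This fits short, restartable units of work better than the long-running jobs in the question, so it may mean breaking those jobs into smaller pieces.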
If you really need to run a job on one instance (such as a scheduler), consider using blob leases as the way to constrain this. Create a blob as a mutex. Then, as each instance spins up, the scheduler job attempts to obtain a write lease on that blob. If it succeeds, it runs. If it fails, it simply sleeps (maybe for a minute) and tries again. At some point in the future, as you scale down in instance count, let's say the instance running the scheduler is killed. A minute later (or whatever time span you choose), another instance tries to acquire the lease, succeeds, and now runs the scheduler code.
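Here is a small Python sketch of that lease loop. `InMemoryLease` is a made-up stand-in for the blob lease so the logic can be run anywhere; in a real deployment the acquire and renew calls would go to blob storage instead, and time is passed in explicitly here only to keep the example deterministic. As far as I know, real blob leases are fairly short-lived and need renewing, which is what the renewal branch imitates.

```python
from typing import Optional


class InMemoryLease:
    """Stand-in for a blob lease: one holder at a time, expiring unless renewed."""

    def __init__(self, duration: float) -> None:
        self.duration = duration
        self.holder: Optional[str] = None
        self.expires_at = 0.0

    def try_acquire(self, instance: str, now: float) -> bool:
        """Acquire or renew the lease; True if `instance` holds it afterwards."""
        free = self.holder is None or now >= self.expires_at
        if free or self.holder == instance:
            self.holder = instance
            self.expires_at = now + self.duration
            return True
        return False


def scheduler_tick(lease: InMemoryLease, instance: str, now: float) -> str:
    """One pass of the loop each instance runs: do the work or back off."""
    if lease.try_acquire(instance, now):
        return "run"    # this instance is the scheduler for now
    return "sleep"      # someone else holds the lease; retry later
```

For example, with a 60-second lease, instance A gets "run" at t=0 and keeps it by renewing, while B gets "sleep". If A is killed and stops renewing, B's next attempt after the lease expires returns "run" and B takes over.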