The needs of my company are quite simple: we have a multi-threaded .NET computing program that reads many GB of binary files, performs massive calculations, and stores the results in a SQL Server database. We would like to run this recurring task in the cloud so that it completes in the shortest time possible.
So we are right into the cloud/grid/cluster computing thing. I thought there would be tons of resources on the subject and plenty of available alternatives. I was simply stunned to find out how wrong I was. While mounting/running EC2 instances was a breeze, finding a relatively simple and straightforward way to parallelize and aggregate the processing power of these EC2 instances was not easy. Amazon customer service keeps dodging the question and I was simply unable to get a concrete answer from them.
I found Utilify, which sounds promising. It is developed by the Alchemi people. However, the documentation link is broken and I got no answer to my emails when I contacted support, so this was not very reassuring.
We chose Amazon over Azure because AMIs are plain, seamless VMs (no need to "bundle" the app or anything else) and because EBS is more convenient storage, as it is a "real" filesystem. On the other hand, Azure seems HPC-ready for Windows, whereas AWS offers that for Linux-powered AMIs only.
Any help and suggestions are more than welcome.
EDIT :
The .NET application is multi-threaded and consists of hundreds of parallel workers doing exactly the same task asynchronously.
Amazon EC2 is inherently an Infrastructure as a Service (IaaS) system, which means that EC2 will give you the hardware and OS but will not solve your grid computing problem for you. This is in contrast to Windows Azure, which is a Platform as a Service (PaaS) system that requires a different architecture, where your application is broken out into different roles (web role, worker role, etc.) that can easily be scaled out into a grid. See this question for more details about IaaS vs PaaS.
Deployment differs between Azure and EC2 precisely because Azure requires you to think at a larger scale than EC2. If you want to scale on EC2 you have to do it on your own or use their Elastic Beanstalk, which currently only supports Java on Apache Tomcat.
As for how to design the system, my recommendation would be to find a way to break the problem down into chunks that can be processed on individual machines, and load a message into a queue for each chunk that describes how to perform the work. You would then have EC2 instances or Azure roles pull work out of the queue, perform the required calculations, and then either store the results directly in the destination or send them to an output queue that aggregates the results. That is the simplest method of performing grid computing without completely redesigning for something like MapReduce. You do still need to worry about what happens if a VM dies before committing its results, but this can be managed by not deleting the queue entry until its results have been committed.
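To make the queue pattern above concrete, here is a minimal sketch in Python (chosen only for brevity; the same shape applies to a .NET worker). Everything in it is an assumption for illustration: `InMemoryQueue` is a hypothetical in-process stand-in for a real cloud queue, the `visibility_timeout` mimics the way such queues hide a received message for a while, and `process` and `commit` are placeholders for your calculation and your database write. The point it shows is the ordering: the queue entry is deleted only after the result has been committed, so a worker that dies mid-task leaves the chunk to be picked up again.

```python
import time
from dataclasses import dataclass, field


@dataclass
class InMemoryQueue:
    """Hypothetical stand-in for a cloud queue with a visibility timeout."""
    visibility_timeout: float = 30.0
    _messages: dict = field(default_factory=dict)         # id -> body
    _invisible_until: dict = field(default_factory=dict)  # id -> timestamp
    _next_id: int = 0

    def send(self, body):
        self._messages[self._next_id] = body
        self._next_id += 1

    def receive(self, now=None):
        """Return (id, body) for one visible message and hide it, else None."""
        now = time.monotonic() if now is None else now
        for msg_id, body in self._messages.items():
            if self._invisible_until.get(msg_id, float("-inf")) <= now:
                # Hide the message so other workers skip it for a while.
                self._invisible_until[msg_id] = now + self.visibility_timeout
                return msg_id, body
        return None

    def delete(self, msg_id):
        self._messages.pop(msg_id, None)
        self._invisible_until.pop(msg_id, None)


def run_worker(queue, process, commit, now=None):
    """Drain the visible messages; return how many chunks were committed."""
    done = 0
    while (msg := queue.receive(now)) is not None:
        msg_id, chunk = msg
        result = process(chunk)   # the heavy calculation (placeholder)
        commit(chunk, result)     # e.g. a write to the results database
        queue.delete(msg_id)      # delete ONLY after the commit succeeded
        done += 1
    return done
```

If `process` or `commit` raises (or the VM simply disappears), `delete` is never reached, and the message becomes visible again once the timeout expires. The flip side is that a chunk may be processed more than once, so the commit step should tolerate duplicates, for example by keying results on the chunk identifier.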
If you can go back to Azure rather than EC2, then:
- David Pallman produced an example Grid project for Azure - http://azuregrid.codeplex.com/
- the Lokad.Cloud project has some interesting framework code, including a simple Map-Reduce example - http://lokadcloud.codeplex.com/
Sorry, I don't have any similar references for EC2, although you may be able to get some inspiration from Microsoft's Dryad projects (I think these are currently only available under an "educational" non-commercial license).
You should be looking at Windows HPC.
Microsoft is working hard to deliver HPC nodes on Windows Azure, which is exactly what you're looking for. Here's a white paper on it:
http://download.microsoft.com/download/4/5/C/45C520F4-424C-41CF-A115-E76A38ADB280/Windows_HPC_Server_and_Windows_Azure.docx
from here: http://www.microsoft.com/hpc/en/us/default.aspx
http://www.networkworld.com/news/2010/111610-microsoft-hpc-server.html