I'm building a scientific application in Python and am considering using Amazon EC2 to run it.
My application is both memory and CPU hungry and would benefit from any resources given to it.
An Extra Large Instance of EC2 provides about 15 GB of memory and 8 compute units.
My question is, can a single Python script (when run on EC2) take advantage of all 8 compute units? Or must I run 8 independent processes in order to fully take advantage of the 8 compute units?
Note: in case it matters, I plan on using a Linux instance on EC2.
Python has a GIL (Global Interpreter Lock) that makes it difficult to write multi-threaded applications that fully utilize more than one core. You can read more about it in How do threads work in Python, and what are common Python-threading specific pitfalls?, or in http://www.dabeaz.com/python/UnderstandingGIL.pdf if you're really into the details. I tend to use Python threads only to run tasks such as communication in the background, rather than for performance.
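To illustrate that "background operation" pattern, here's a minimal sketch (all names are made up for the example): a daemon thread drains a queue of outgoing messages while the main thread stays free for the real work. Because the background thread spends its time blocked on I/O-like waits, the GIL is not a problem here.

```python
import threading
import queue
import time

messages = queue.Queue()

def background_sender():
    # Hypothetical communication task: drain the queue and "send"
    # each message (simulated with a short sleep standing in for I/O).
    while True:
        msg = messages.get()
        if msg is None:  # sentinel tells the thread to shut down
            break
        time.sleep(0.01)

sender = threading.Thread(target=background_sender, daemon=True)
sender.start()

# Main thread: pretend to do the CPU-bound work, handing off
# status updates without blocking on the network itself.
for i in range(5):
    messages.put(f"progress {i}")

messages.put(None)  # signal shutdown
sender.join()
```

This buys responsiveness, not parallel speedup: the heavy computation still runs on one core.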
As Jeremy said, using the multiprocessing module is one option; alternatively, you could simply write your script so that it works on independent parts of your data, and then start as many copies of it as you like.
The 8 "compute units" run across 4 physical processors, so a straightforward script would only be able to use 25% of the available power. However, the Python multiprocessing module allows you to write a single script using multiple processes, potentially taking advantage of all of the "compute units".