I've written a script that walks through the files in a directory and starts a thread to do something to each file. Because these are lengthy, expensive operations, I restrict the number of threads to one less than the number of CPUs on the machine it runs on. The script then repeatedly checks the number of active threads and, if there's spare capacity, starts another thread. For the function that returns the number of CPUs on a machine, I used this.
import os
import threading
import time

ncpus = detectCPUs()
for dirpath, dirnames, filenames in os.walk(path_to_root):
    for filename in filenames:
        while True:
            if threading.activeCount() < ncpus - 1:
                MyThread(dirpath, filename).start()
                break
            else:
                time.sleep(100)
I can't escape the feeling that there are functions in the threading library, or elsewhere in Python, that would do this for me automatically, without my having to keep tabs on the number of threads and CPUs. Does anyone know of any? Or can anyone point out how seasoned veterans would do it?
Some restrictions: the shared machines I'm using only have Python 2.5 installed, and I don't have root privileges to install anything. So multiprocessing, or other nice libraries that require Python 2.6 or higher, are out of the question.
Perhaps a thread pool implementation is what you want here?
http://code.activestate.com/recipes/577187/
It would look something like this:

pool = ThreadPool(num_threads)
for obj in objects:
    pool.add_task(obj.do_stuff, [arg1, arg2])
pool.wait_completion()
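For a sense of what's behind that recipe, here is a minimal sketch of the same worker/queue pattern (this is an illustration of the idea, not the recipe itself; on Python 2.5 the module is named `Queue` rather than `queue`, but the API is the same):

```python
import threading
from queue import Queue  # named Queue on Python 2.5


class ThreadPool:
    """Minimal thread pool: a fixed set of workers pulling tasks off a queue."""

    def __init__(self, num_threads):
        self.tasks = Queue()
        for _ in range(num_threads):
            t = threading.Thread(target=self._worker)
            t.daemon = True  # workers die with the main thread
            t.start()

    def _worker(self):
        while True:
            func, args = self.tasks.get()  # blocks until a task is queued
            try:
                func(*args)
            finally:
                self.tasks.task_done()  # lets wait_completion() unblock

    def add_task(self, func, args=()):
        self.tasks.put((func, args))

    def wait_completion(self):
        self.tasks.join()  # block until every queued task is done
```

Because the workers block on `Queue.get()`, there is no polling loop and no `time.sleep()`: the pool size itself caps concurrency at `num_threads`.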
Even if you can't upgrade Python, you can still use multiprocessing.
multiprocessing is a back port of the Python 2.6/3.0 multiprocessing package. […] This standalone variant is intended to be compatible with Python 2.4 and 2.5, and will draw its fixes/improvements from python-trunk.
Just install it as a local library.
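Assuming the backport mirrors the standard-library API (which is its stated goal), your loop collapses to a Pool. This is a sketch; `square` is a stand-in for whatever expensive per-file work `MyThread` does:

```python
from multiprocessing import Pool


def square(n):
    # stand-in for the expensive per-file operation
    return n * n


def run(num_workers=3):
    # Pool manages the worker processes and the task queue for you,
    # replacing the manual activeCount() polling loop
    pool = Pool(processes=num_workers)  # e.g. ncpus - 1
    try:
        return pool.map(square, range(10))
    finally:
        pool.close()
        pool.join()
```

`Pool.map` blocks until all results are in, so there is no need to track worker counts yourself.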
There are a few other "worker/thread pool" libraries out there, but you really want to use multiprocessing, or at least subprocess. Python's GIL means that threads often block each other on a single CPU, lowering throughput and sometimes making the program slower than a single-threaded version, especially when the work is CPU-bound.
If you are using canonical Python (CPython), there is a limit to how helpful threads are. CPython uses a global interpreter lock (GIL), which allows only one Python thread to execute bytecode at a time.
However, if your file operations block for long periods of time, or you are using a Python library written in C that releases the GIL, then threads will help you.
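This is easy to see with a blocking call that releases the GIL. In the sketch below, `time.sleep` stands in for a blocking file or network operation:

```python
import threading
import time


def blocking_io():
    # time.sleep releases the GIL while it waits, standing in for a
    # blocking file read or network call
    time.sleep(0.2)


def timed_run(num_threads=5):
    # start all threads, wait for them all, and report wall-clock time
    start = time.time()
    threads = [threading.Thread(target=blocking_io) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start
```

Five 0.2-second waits overlap, so the total is roughly 0.2 seconds rather than 1.0, because each thread drops the GIL while it blocks. A CPU-bound loop in pure Python would show no such speedup.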
I would strongly recommend looking at multiprocessing, as that will let you sidestep the GIL.