looking for some eyeballs to verifiy that the following chunk of psuedo python makes sense. i'm looking to spawn a number of threads to implement some inproc functions as fast as possible. the idea is to spawn the threads in the master loop, so the app will run the threads simultaneously in a parallel/concurrent manner
chunk of code
-get the filenames from a dir
-write each filename ot a queue
-spawn a thread for each filename, where each thread
waits/reads value/data from the queue
-the threadParse function then handles the actual processing
based on the file that's included via the "execfile" function...
# System modules
from Queue import Queue
from threading import Thread
import time
# Local modules
#import feedparser
# Set up some global variables
appqueue = Queue()
# more than the app will need
# this matches the number of files that will ever be in the
# urldir
#
num_fetch_threads = 200
def threadParse(q)
#decompose the packet to get the various elements
line = q.get()
college,level,packet=decompose (line)
#build name of included file
fname=college+"_"+level+"_Parse.py"
execfile(fname)
q.task_done()
#setup the master loop
while True
time.sleep(2)
# get the files from the dir
# setup threads
filelist="ls /urldir"
if filelist
foreach file_ in filelist:
worker = Thread(target=threadParse, args=(appqueue,))
worker.start()
# again, get the files from the dir
#setup the queue
filelist="ls /urldir"
foreach file_ in filelist:
#stuff the filename in the queue
appqueue.put(file_)
# Now wait for the queue开发者_运维知识库 to be empty, indicating that we have
# processed all of the downloads.
#don't care about this part
#print '*** Main thread waiting'
#appqueue.join()
#print '*** Done'
Thoughts/comments/pointers are appreciated...
thanks
If I understand this right: You spawn lots of threads to get things done faster.
This only works if the main part of the job done in each thread is done without holding the GIL. So if there is a lot of waiting for data from network, disk or something like that, it might be a good idea. If each of the tasks are using a lot of CPU, this will run pretty much like on a single core 1-CPU machine and you might as well do them in sequence.
I should add that what I wrote is true for CPython, but not necessarily for Jython/IronPython. Also, I should add that if you need to utilize more CPUs/cores, there's the multiprocessing module that might help.
精彩评论