I have a code-base that I'm looking to split up and add to by using threading, however I'm relatively new on how to handle it. Please before reading further respect my wish of NOT just re-writing this code and tossing it back at me with the problem solved. I would much rather work the problem out by someone pointing me in the right direction, than someone solving it FOR me; I don't learn well that way.
The fully functioning code-base is here -- It requires the mechanize and beautifulsoup libraries which can be installed via easy_install.
I've separated out all of my functions, and tried to keep the code as clean as possible (I'm sure there are some optimizations in there that I'll get reamed for, but the main problem is how to thread this.
My ultimate goal is to pack this into a thread, and then share cookies between other initialized browser objects in order to do other things while my original code is running 'backgrounded'.
I've tried thus:
class Recon(threading.Thread):
def __init__(self):
threading.Thread.__init__(self)
#Packed the stuff above my original while loop in here, minus functions.
def run(self):
#Packed my code past the while loop in here.
somevar = Recon()
somevar.start()
Problem I'm having is that, once I run the program it will run the things in init, but afterwards it just sits there and freezes on 开发者_Python百科me. No traceback, no errors, just doesn't do anything, doesn't even return my command prompt back to my control.
Could I just get some tips, or a general flow of how to convert this? I got overwhelmed and deleted the code I was trying with so I don't have that example, but do I need to be prepending 'self.' to all of my variables? Do I need to just define my vars as global?
Here is a reproduction of what I'm having trouble with after having tried to convert the script to use threading.
As long as you have a single thread (as in the above snippet, where you instantiate Recon
just once), it shouldn't matter much what you do where; but of course I imagine the reason you're introducing threading is to eventually move to having multiple threads active.
If that's the case, then the first key issue is to ensure that you never have two or more threads simultaneously trying to use the same shared system/resource -- for example, multiple threads writing at the same time to ReconFile
, in the case of the code at the pastebin URL you mention.
The classic way to avoid such issues is to use locking, but my favorite way is quite different: make sure any such resource is accessed by only one dedicated thread, and use a Queue.Queue instance (intrinsically threadsafe) to have other threads post work-request to the dedicated thread (so instead of writing to ReconFile directly each other thread would make a list of lines to be written contiguously, then .put
the list on the queue where the "recon file writing" worker thread is waiting via .get
).
When you need to get results back from such actions (not the case here), the requesting thread would place its own personal "queue on which to return results" as part of the "work request packet" it puts to the worker thread's queue. I've presented much more detail about this recommended architecture in the threading chapter of "Python in a Nutshell" 2nd edition (and why, as the book's author, I would of course never recommend you perform an illegal download of a free pirate copy of my book, I can however mention there's plenty of sites offering such pirate copies for download -- the legal way to read my book for free is to sign up for a trial offer to O'Reilly's "safari" online books website).
This does not address the specific problem you're observing, since it's happening when you only have one thread around. I notice that thread is trying to perform lots of I/O on standard input and standard output, which is possibly problematic from a thread -- consider doing the input for a thread before you start it (in the main thread) and for needed output use Python's standard logging
module, which is guaranteed to be thread-safe. Do you still observe problems then? If that's the case, then the next step is to pepper your code with logging.info
calls so that you can pinpoint exactly where it's stalling -- and tell us about that, so we can try to help from there!
精彩评论