I am using a multiprocessing pool of workers as part of a much larger application. Since I use it for crunching a large volume of simple math, I have a shared-nothing architecture, where the only variables the workers ever need are passed in as arguments. Thus, I do not need the worker subprocesses to import any globals, my __main__ module or, consequently, any of the modules it imports. Is there any way to force such behavior and avoid the performance hit when spawning the pool?
I should note that my environment is Win32, which lacks os.fork(), and that the worker processes are spawned "using a subprocess call to sys.executable (i.e. start a new Python process) followed by serializing all of the globals, and sending those over the pipe," as per this SO post. That said, I want to do as little of the above as possible so my pool opens faster.
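For concreteness, here is a stripped-down sketch of the kind of pool I mean (names and numbers made up):

import multiprocessing

def crunch(pair):
    # shared-nothing worker: everything it needs arrives as arguments
    x, y = pair
    return x * x + y * y

if __name__ == '__main__':
    pool = multiprocessing.Pool()  # this startup is the hit I want to avoid
    results = pool.map(crunch, [(i, -i) for i in range(10000)])
    pool.close()
    pool.join()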
Any ideas?
Looking at the multiprocessing.forking implementation, particularly get_preparation_data and prepare (win32-specific), globals aren't getting pickled. The re-import of the parent process's __main__ is a bit ugly, but it won't run any code except what is at the top level; not even if __name__ == '__main__': blocks. So just keep the main module free of import-time side effects.
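To illustrate the point, a minimal sketch (not from the original post): anything at module top level runs again in each worker when __main__ is re-imported, while code under the guard does not.

import multiprocessing

print("top level: runs in the parent and, on win32, in every worker")

def square(x):
    return x * x

if __name__ == '__main__':
    # only the parent ever executes this block
    pool = multiprocessing.Pool(4)
    print(pool.map(square, range(8)))
    pool.close()
    pool.join()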
You can prevent your main module from importing anything when the subprocess starts, too (this is only useful on win32, which, as you note, can't fork). Move main() and its imports to a separate module, so that the startup script contains only:
if __name__ == '__main__':
    from mainmodule import main
    main()
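mainmodule would then carry the heavy imports and the real work; a hypothetical sketch:

# mainmodule.py -- all heavy imports live here, so the startup
# script above stays trivial and free of import-time side effects
import multiprocessing
# import numpy, scipy, ...  # whatever the real application pulls in

def crunch(pair):
    # shared-nothing worker: all inputs arrive as arguments
    a, b = pair
    return a * b

def main():
    pool = multiprocessing.Pool()
    print(sum(pool.map(crunch, [(i, i + 1) for i in range(100)])))
    pool.close()
    pool.join()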
There is still an implicit import site in the child process startup. It does important initialisation, and I don't think mp.forking has an easy way to disable it, but I don't expect it to be expensive anyway.
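If you want to see what the startup actually costs, here is a rough stdlib-only timing sketch:

import time
import multiprocessing

def noop(x):
    return x

if __name__ == '__main__':
    t0 = time.time()
    pool = multiprocessing.Pool(4)
    pool.map(noop, range(4))  # make sure the workers are fully up
    print('pool ready in %.3f s' % (time.time() - t0))
    pool.close()
    pool.join()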