I have a class that loops over some data files, processes them, and then writes new data back out. The analysis of each file is completely independent of the others. The class contains information needed by the analysis in its attributes, but the analysis does not need to change any attributes of the class. Thus I can make the analysis of one data file a single method of my class. The analysis could in principle be done in parallel since each data file is independent. As an aside, I was considering making my class iterable.
Can I use the multiprocessing module to spawn processes that are methods of my class? I need to use multiprocessing because I'm using third-party code that has a really bad memory leak (it fills up all 24 GB of memory after about 100 data files).
If not, how would you go about doing this? Would you just use a normal function called by my class (passing all the information I need as arguments) instead of a method? How are arguments passed to functions in multiprocessing? Does it make a deep copy?
Yes. As long as you are not updating data on the instance that needs to be shared across processes, multiprocessing is the right tool for this case.
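As a rough sketch of what that could look like, assuming Python 3 (where bound methods of a picklable instance can be handed to a pool) and a POSIX system with fork: the `Analyzer` class, its `scale` attribute, and the string-length "analysis" are hypothetical placeholders, not the asker's code. Setting `maxtasksperchild=1` is one possible way to contain the leak, since each worker process is replaced after a single file.

```python
import multiprocessing as mp


class Analyzer:
    """Hypothetical stand-in for the class described in the question."""

    def __init__(self, scale):
        self.scale = scale  # read-only setting used by the analysis

    def analyze(self, name):
        # Placeholder for the real per-file analysis (assumption):
        # it only reads self.scale and never modifies the instance.
        return name, len(name) * self.scale

    def run(self, names, workers=2):
        # Assumes a POSIX system; the "fork" start method is not
        # available on Windows.
        ctx = mp.get_context("fork")
        # maxtasksperchild=1 retires each worker after one task, so
        # memory leaked during a task is released when that worker exits.
        with ctx.Pool(processes=workers, maxtasksperchild=1) as pool:
            # chunksize=1 makes every file its own task.
            return pool.map(self.analyze, names, chunksize=1)


if __name__ == "__main__":
    print(Analyzer(scale=2).run(["a.dat", "bb.dat"]))
```

Each call pickles the bound method, and with it the instance, so this only works if everything stored on the instance is picklable. Results come back in the same order as the input list.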
You don't mention your process using any external resources, so it should be fork()-safe. Fork duplicates the memory and file descriptors, so the program state is identical in the parent and the child at the moment of the fork, and each process works on its own copy from then on. Unless you're on Windows, which can't fork, go for it.
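To illustrate the copy semantics, here is a small sketch under the same assumptions (Python 3, POSIX fork). The `Analyzer` class and its `counter` attribute are made up for the demonstration. The child increments its own copy of the attribute and reports the value through a pipe, while the parent's instance stays unchanged.

```python
import multiprocessing as mp


class Analyzer:
    """Hypothetical class used only to show fork's copy semantics."""

    def __init__(self):
        self.counter = 0

    def bump(self, conn):
        # Runs in the child process: changes the child's copy only.
        self.counter += 1
        conn.send(self.counter)
        conn.close()


def demo():
    # Assumes a POSIX system; "fork" is not available on Windows.
    ctx = mp.get_context("fork")
    analyzer = Analyzer()
    parent_conn, child_conn = ctx.Pipe()
    proc = ctx.Process(target=analyzer.bump, args=(child_conn,))
    proc.start()
    child_value = parent_conn.recv()  # value seen inside the child
    proc.join()
    # The parent's attribute was never touched by the child.
    return child_value, analyzer.counter


if __name__ == "__main__":
    print(demo())  # → (1, 0)
```

Anything the child needs to hand back therefore has to be sent explicitly, for example through a pipe, a queue, or the return value of a pool task.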