```python
# file1.py
import subprocess
import threading


class _Producer:
    def __init__(self):
        self.chunksize = 6220800
        # Seed self.thing so consumers never see it unset.
        with open('/dev/zero', 'rb') as f:
            self.thing = f.read(self.chunksize)
        self.n = 0
        self.start()

    def start(self):
        def produce():
            self._proc = subprocess.Popen(['producer_proc'], stdout=subprocess.PIPE)
            while True:
                self.thing = self._proc.stdout.read(self.chunksize)
                if len(self.thing) != self.chunksize:
                    msg = 'Expected {0} bytes. Read {1} bytes'.format(
                        self.chunksize, len(self.thing))
                    raise Exception(msg)
                self.n += 1
        t = threading.Thread(target=produce)
        t.daemon = True
        t.start()
        self._thread = t

    def stop(self):
        if self._thread.is_alive():
            self._proc.terminate()
            self._thread.join(1)


producer = _Producer()  # __init__ already calls start()
```
I have written some code more or less like the above design, and now I want to be able to consume the output of `producer_proc` in other files by going:

```python
# some_other_file.py
import file1

my_thing = file1.producer.thing
```

Multiple other consumers might grab a reference to `file1.producer.thing`; they all need to consume from the same `producer_proc`, and the `producer_proc` should never be blocked. Is this a sane implementation? Does the Python GIL make it thread-safe, or do I need to reimplement using a Queue for getting data off the worker thread? Do consumers need to explicitly make a copy of the thing?
I guess I am trying to implement something like the Producer/Consumer pattern or the Observer pattern, but I'm not really clear on all the technical details of design patterns.
- A single producer is constantly making things.
- Multiple consumers use things at arbitrary times.
- `producer.thing` should be replaced by a fresh thing as soon as the new one is available; most things will go unused, but that's OK.
- It's OK for multiple consumers to read the same thing, or to read the same thing twice in succession. They only want to be sure they have the most recent thing when they ask for it, not some stale old thing.
- A consumer should be able to keep using a thing as long as they have it in scope, even though the producer may have already overwritten his `self.thing` with a fresh new thing.
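For comparison, the Queue-based alternative mentioned in the question could look like the sketch below (a hypothetical illustration; `latest` and the byte-string stand-in for a `producer_proc` chunk are invented here). A single-slot queue holds only the freshest thing: the producer discards any stale item before putting the new one, so it never blocks.

```python
import queue
import threading
import time

latest = queue.Queue(maxsize=1)  # single slot: holds only the newest thing

def produce():
    for _ in range(5):
        thing = b'x' * 8  # stand-in for a chunk read from producer_proc
        # Drop the stale thing, if any, so put() never blocks the producer.
        try:
            latest.get_nowait()
        except queue.Empty:
            pass
        latest.put(thing)
        time.sleep(0.01)

t = threading.Thread(target=produce, daemon=True)
t.start()
t.join()

# A consumer asking for the most recent thing:
my_thing = latest.get()
print(len(my_thing))  # 8
```

Note that the drain-then-put step is not race-free once multiple consumers also call `get()`, which is one reason the simpler replace-an-attribute design in the question can be a better fit for these requirements.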
Given your (unusual!) requirements, your implementation seems correct. In particular,
- If you're only updating one attribute, the Python GIL should be sufficient. Single bytecode instructions are atomic.
- If you do anything more complex, add locking! It's basically harmless anyway: if you cared about performance or multicore scalability, you probably wouldn't be using Python!
- In particular, be aware that `self.thing` and `self.n` in this code are updated in separate bytecode instructions. The GIL could be released/acquired between them, so you can't get a consistent view of the two of them unless you add locking. If you're not going to do that, I'd suggest removing `self.n`, as it's an "attractive nuisance" (easily misused), or at least adding a comment/docstring with this caveat.
- Consumers don't need to make a copy. You're not ever mutating a particular object pointed to by `self.thing` (and couldn't with string objects; they're immutable), and Python is garbage-collected, so as long as a consumer grabbed a reference to it, it can keep accessing it without worrying too much about what other threads are doing. The worst that could happen is your program using a lot of memory from several generations of `self.thing` being kept alive.
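If a consistent `(thing, n)` pair ever matters, the locking suggested above could look like this minimal sketch (the `LatestThing` class and method names are invented here for illustration):

```python
import threading

class LatestThing:
    """Single-slot holder; a lock keeps thing and n mutually consistent."""
    def __init__(self):
        self._lock = threading.Lock()
        self._thing = b''
        self._n = 0

    def update(self, new_thing):
        # Producer side: replace the thing and bump the counter atomically.
        with self._lock:
            self._thing = new_thing
            self._n += 1

    def snapshot(self):
        # Consumer side: always returns a matching (thing, n) pair.
        with self._lock:
            return self._thing, self._n

holder = LatestThing()
holder.update(b'first')
holder.update(b'second')
thing, n = holder.snapshot()
print(thing, n)  # b'second' 2
```

The lock is held only for the attribute assignments, so the producer is effectively never blocked for longer than a consumer's two-field read.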
I'm a bit curious where your requirements came from. In particular, that you don't care if a `thing` is never used or used many times.
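The no-copy point above can be seen directly: when the producer rebinds its attribute to a fresh thing, the object a consumer already holds is untouched, because assignment rebinds a name rather than mutating the old object. (The variable names here are illustrative, not from the question's code.)

```python
producer_thing = b'old chunk'
my_thing = producer_thing       # consumer grabs a reference
producer_thing = b'new chunk'   # producer rebinds to a fresh thing
print(my_thing)  # b'old chunk' -- the consumer's object is unchanged
```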