Django Python Garbage Collection woes

Posted on https://www.devze.com, 2023-02-02 16:06. Source: the web.

After two days of debugging, I nailed down my time hog: the Python garbage collector.

My application holds a lot of objects in memory, and that works well.

The GC does its usual rounds (I have not changed the default thresholds of (700, 10, 10)).

Once in a while, in the middle of an important transaction, the 2nd generation sweep kicks in and reviews my ~1.5M generation 2 objects.

This takes 2 seconds! The nominal transaction takes less than 0.1 seconds.

My question is what should I do?

I can turn off generation 2 sweeps (by setting a very high threshold - is this the right way?) and the GC is obedient.
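As a rough sketch of what "a very high threshold" could look like: the third threshold counts generation 1 collections between generation 2 collections, so a huge value keeps automatic full sweeps from starting in practice. The helper name and the value 10**9 are assumptions, and the exact meaning of the third threshold varies between CPython versions.

```python
import gc

def suppress_gen2_sweeps(huge=10**9):
    """Raise only the generation 2 threshold; return the old thresholds."""
    old = gc.get_threshold()
    # Leave the generation 0 and 1 thresholds unchanged.
    gc.set_threshold(old[0], old[1], huge)
    return old

old_thresholds = suppress_gen2_sweeps()
# ... latency-sensitive work; the young generations are still collected ...
gc.collect()                        # explicit full sweep at a time you choose
gc.set_threshold(*old_thresholds)   # restore the previous settings
```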

When should I turn them on?

We implemented a web service using Django, and each user request takes about 0.1 seconds.

Optimally, I will run these GC gen 2 cycles between user API requests. But how do I do that?

My view ends with return HttpResponse(), AFTER which I would like to run a gen 2 GC sweep.

How do I do that? Does this approach even make sense?

Can I mark the objects that NEVER need to be garbage collected so the GC will not test them on every generation 2 cycle?

How can I configure the GC to run full sweeps when the Django server is relatively idle?

Python 2.6.6 on multiple platforms (Windows / Linux).


We did something like this for gunicorn. Depending on which WSGI server you use, you need to find the right hook that runs AFTER the response, not before. Django has a request_finished signal, but that signal still fires before the response is sent.

For gunicorn, define two hook functions in the config file like so:

import gc


def pre_request(worker, req):
    # disable gc until end of request
    gc.disable()


def post_request(worker, req, environ, resp):
    # enable gc after a request
    gc.enable()

The post_request hook here runs after the HTTP response has been delivered, and so is a very good time for garbage collection.
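Note that gc.enable() only re-arms automatic collection; it does not run a sweep by itself. A hedged variant of the hooks above, assuming you want the full collection to happen right after the response has gone out, could call gc.collect() explicitly. This is a sketch for a gunicorn config file, not something taken from the gunicorn documentation.

```python
# gunicorn.conf.py (sketch)
import gc

def pre_request(worker, req):
    # No automatic collections while the request is being handled.
    gc.disable()

def post_request(worker, req, environ, resp):
    # The response has been sent; pay for the full sweep now.
    gc.collect()
    gc.enable()
```

With synchronous workers, each worker handles one request at a time, so the sweep should only delay the next request picked up by that same worker.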


I believe one option would be to completely disable garbage collection and then manually collect at the end of a request as suggested here: How does the Garbage Collection mechanism work?

I imagine that you could disable the GC in your settings.py file.

If you want to run garbage collection on every request, I would suggest developing some middleware that does it in the process_response method:

import gc
class GCMiddleware(object):
    def process_response(self, request, response):
        gc.collect()
        return response
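Collecting on every response means every response pays for a full sweep before it is returned, which with ~1.5M objects could cost about as much as the pause described in the question. One possible refinement, sketched here with made-up names (ThrottledGCMiddleware, min_interval), is to run the sweep at most once per interval. It uses the same old-style process_response hook as the middleware above.

```python
import gc
import time

class ThrottledGCMiddleware(object):
    """Run a full collection at most once every `min_interval` seconds."""

    min_interval = 60.0  # seconds; an arbitrary value chosen for illustration

    def __init__(self):
        self._last_run = time.time()

    def process_response(self, request, response):
        now = time.time()
        if now - self._last_run >= self.min_interval:
            gc.collect()          # full sweep, still blocks this response
            self._last_run = now
        return response
```

The unlucky request that triggers the sweep still waits for it, so this reduces how often the pause happens without removing it.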


An alternative might be to disable GC altogether, and configure mod_wsgi (or whatever you're using) to kill and restart processes more frequently.


My view ends with return HttpResponse(), AFTER which I would like to run a gen 2 GC sweep.

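import gc
from django.http import HttpResponse

def my_view(request):  # hypothetical view name
    gc.disable()  # turn off GC
    # ... do stuff ...
    resp = HttpResponse()
    gc.enable()  # turn on GC
    return resp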

I'm not sure, but instead of turning the GC back on right before the return, you might be able to spawn a thread that turns it on 0.1 seconds later.
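The thread idea could look roughly like the sketch below, which uses threading.Timer from the standard library. The helper names and the 0.1 second default are assumptions. With several requests in flight, the timer of one request can re-enable the GC while another request is still running, so treat it as an illustration rather than a fix.

```python
import gc
import threading

def enable_gc_later(delay=0.1):
    """Re-enable the GC from a background timer thread after `delay` seconds."""
    timer = threading.Timer(delay, gc.enable)
    timer.daemon = True   # do not keep the process alive for the timer
    timer.start()
    return timer

def handle(build_response):
    """Run `build_response` with the GC off, then schedule gc.enable()."""
    gc.disable()
    try:
        return build_response()
    finally:
        enable_gc_later()
```

In a view, build_response would be whatever callable produces the HttpResponse.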

In order to make sure that GC doesn't happen until after the request is processed, if the thread spawning doesn't work, you would need to modify django itself or use some sort of django hook, as dcurtis suggested.

If you're dealing with performance-critical code, you might also want to consider using a manual memory management language like C/C++ for that part, and using Python simply to invoke/query it.


Building on the approach from @milkypostman, you can use gevent. You want one garbage collection call per request, but the problem with @milkypostman's suggestion is that the call to gc.collect() still blocks the response from being returned. Gevent lets us return immediately and have the GC run after the view has returned.

First, in your wsgi file, be sure to monkey patch everything with gevent and disable garbage collection. You could call gc.disable(), but some libraries have context managers that turn the GC back on after disabling it (messagepack, for instance), so a threshold of 0 is stickier.

import gc
from gevent import monkey

# Disable garbage collection runs
gc.set_threshold(0)
# Apply gevent monkey magic
monkey.patch_all()

Then create some middleware for Django like this:

from gc import collect
import gevent

class BaseMiddleware:

    def __init__(self, get_response):
        self.get_response = get_response


class GcCollectMiddleware(BaseMiddleware):
    """Middleware which performs a non-blocking gc.collect()"""

    def __call__(self, request):
        response = self.get_response(request)
        gevent.spawn(collect)
        return response

The main difference from the previously suggested approach is that gc.collect() is wrapped in gevent.spawn, which does not block returning the HttpResponse, so your users get a snappier response!
