Huge Django Session table, normal behaviour or bug?

Developer https://www.devze.com 2023-01-30 08:57 Source: Internet

Perhaps this is completely normal behaviour, but I feel like the django_session table is much larger than it should have to be.

First of all, I run the following cleanup command daily so the size is not caused by expired sessions:

DELETE FROM %s WHERE expire_date < NOW()
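For reference, a minimal sketch of what that cleanup does, run against an in-memory SQLite table rather than the real database. The column names follow Django's django_session table; the helper name purge_expired and the sample rows are made up for illustration, and the current time is passed as a parameter because SQLite has no NOW().

```python
import sqlite3
from datetime import datetime, timedelta


def purge_expired(conn, now):
    """Delete sessions whose expire_date is in the past; return rows removed."""
    cur = conn.execute(
        "DELETE FROM django_session WHERE expire_date < ?",
        (now.isoformat(" "),),
    )
    conn.commit()
    return cur.rowcount


# Throwaway table with the same columns as django_session.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE django_session ("
    "session_key TEXT PRIMARY KEY, session_data TEXT, expire_date TEXT)"
)

now = datetime(2011, 1, 1, 12, 0, 0)
rows = [
    ("expired", "", (now - timedelta(days=1)).isoformat(" ")),
    ("live", "", (now + timedelta(days=13)).isoformat(" ")),
]
conn.executemany("INSERT INTO django_session VALUES (?, ?, ?)", rows)

removed = purge_expired(conn, now)
remaining = [r[0] for r in conn.execute("SELECT session_key FROM django_session")]
```

Only the row whose expire_date has passed is removed, so a daily run like this cannot shrink a table that is dominated by unexpired sessions.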

The numbers:

  • We've got about 5000 unique visitors (bots excluded) every day.
  • The SESSION_COOKIE_AGE is set to the default, 2 weeks
  • The table has a little over 1,000,000 rows
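As a rough sanity check on these numbers, the sketch below estimates how many rows real visitors alone could account for. It assumes the worst case where every visitor gets a brand-new session on each day they visit, which is an assumption and not something measured.

```python
VISITORS_PER_DAY = 5000       # unique visitors, bots excluded
SESSION_AGE_DAYS = 14         # default SESSION_COOKIE_AGE of 2 weeks
ROWS_IN_TABLE = 1_000_000     # "a little over", so treat as a lower bound

# Assumption: one new session per visitor per day, none reused.
expected_max_rows = VISITORS_PER_DAY * SESSION_AGE_DAYS
excess_factor = ROWS_IN_TABLE / expected_max_rows

print(expected_max_rows)          # 70000
print(round(excess_factor, 1))    # 14.3
```

Even under that pessimistic assumption the table is more than ten times larger than human traffic would explain.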

So, I'm guessing that Django also generates session keys for all bots that visit the site, and since the bots don't store the cookies, it keeps generating new sessions for them.

But... is this normal behaviour? Is there a setting so Django won't generate sessions for anonymous users, or at least... no sessions for users that aren't using sessions?


After a bit of debugging I've managed to trace the cause of the problem. One of my middlewares (and most of my views) calls request.user.is_authenticated().

The django.contrib.auth middleware sets request.user to LazyUser()

Source: http://code.djangoproject.com/browser/django/trunk/django/contrib/auth/middleware.py?rev=14919#L13 (I don't see why there is a return None there, but ok...)

class AuthenticationMiddleware(object):
    def process_request(self, request):
        assert hasattr(request, 'session'), "The Django authentication middleware requires session middleware to be installed. Edit your MIDDLEWARE_CLASSES setting to insert 'django.contrib.sessions.middleware.SessionMiddleware'."
        request.__class__.user = LazyUser()
        return None

The LazyUser calls get_user(request) to get the user:

Source: http://code.djangoproject.com/browser/django/trunk/django/contrib/auth/middleware.py?rev=14919#L5

class LazyUser(object):
    def __get__(self, request, obj_type=None):
        if not hasattr(request, '_cached_user'):
            from django.contrib.auth import get_user
            request._cached_user = get_user(request)
        return request._cached_user
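The descriptor pattern above can be tried in isolation. This is a standalone sketch, not Django code: FakeRequest and fake_get_user are hypothetical stand-ins, and the extra check for request is None only exists so the toy class can be inspected safely.

```python
calls = []


def fake_get_user(request):
    """Hypothetical stand-in for django.contrib.auth.get_user."""
    calls.append(request)
    return "AnonymousUser"


class LazyUser:
    def __get__(self, request, obj_type=None):
        if request is None:
            return self  # looked up on the class, not on an instance
        if not hasattr(request, "_cached_user"):
            # First access on this request: resolve and cache the user.
            request._cached_user = fake_get_user(request)
        return request._cached_user


class FakeRequest:
    user = LazyUser()


request = FakeRequest()
calls_before_access = len(calls)   # nothing has been resolved yet
first = request.user               # triggers fake_get_user
second = request.user              # served from request._cached_user
```

The lookup only happens on the first attribute access, so any code that touches request.user, such as an is_authenticated() check, is what ends up reaching into the session.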

The get_user(request) method does a user_id = request.session[SESSION_KEY]

Source: http://code.djangoproject.com/browser/django/trunk/django/contrib/auth/__init__.py?rev=14919#L100

def get_user(request):
    from django.contrib.auth.models import AnonymousUser
    try:
        user_id = request.session[SESSION_KEY]
        backend_path = request.session[BACKEND_SESSION_KEY]
        backend = load_backend(backend_path)
        user = backend.get_user(user_id) or AnonymousUser()
    except KeyError:
        user = AnonymousUser()
    return user

Accessing the session sets its accessed flag to True:

Source: http://code.djangoproject.com/browser/django/trunk/django/contrib/sessions/backends/base.py?rev=14919#L183

def _get_session(self, no_load=False):
    """
    Lazily loads session from storage (unless "no_load" is True, when only
    an empty dict is stored) and stores it in the current instance.
    """
    self.accessed = True
    try:
        return self._session_cache
    except AttributeError:
        if self._session_key is None or no_load:
            self._session_cache = {}
        else:
            self._session_cache = self.load()
    return self._session_cache

And that causes the session to initialize. The bug was caused by a faulty session backend that also generates a session whenever accessed is set to True...
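The difference between a read and a write can be illustrated with a toy model. This is a simplified sketch and not Django's SessionBase: ToySession and should_save are invented names, and it assumes a well-behaved setup only writes a row when the session was modified, while the faulty variant also writes on a plain read.

```python
class ToySession:
    """Simplified model of a session object; not Django's SessionBase."""

    def __init__(self):
        self._data = {}
        self.accessed = False
        self.modified = False

    def __getitem__(self, key):
        self.accessed = True      # a read only marks the session as accessed
        return self._data[key]

    def __setitem__(self, key, value):
        self.accessed = True
        self.modified = True      # a write is what should trigger a save
        self._data[key] = value


def should_save(session, faulty_backend=False):
    """Decide whether to write a row; the faulty variant also saves on reads."""
    if faulty_backend:
        return session.accessed or session.modified
    return session.modified


session = ToySession()
try:
    session["_auth_user_id"]      # the kind of lookup get_user() performs
except KeyError:
    pass                          # anonymous visitor: key is missing

print(should_save(session))                        # False
print(should_save(session, faulty_backend=True))   # True
```

With the faulty rule, every anonymous request that merely checks request.user would leave a row behind, which matches the table growth described in the question.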


Is it possible for robots to access any page where you set anything in a user session (even for anonymous users), or any page where you use session.set_test_cookie() (for example, Django's default login view calls this method)? In both of these cases a new session object is created. Excluding such URLs in robots.txt should help.
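A quick way to check such an exclusion before deploying it is Python's standard urllib.robotparser. The sketch below is only an illustration: the /accounts/login/ path is an assumed login URL, so substitute whichever URLs create sessions on your site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps crawlers away from the login page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /accounts/login/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

login_allowed = parser.can_fetch("*", "/accounts/login/")
home_allowed = parser.can_fetch("*", "/")
```

Keep in mind that robots.txt is advisory, so this only helps with crawlers that choose to honour it.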


In my case, I had mistakenly set SESSION_SAVE_EVERY_REQUEST = True in settings.py without understanding exactly what it means.

With that setting, every request to my Django service generated a session entry, including the heartbeat requests from the upstream load balancers. After running for several days, the django_session table had grown huge.
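To get a feel for how quickly heartbeats add up, here is a back-of-the-envelope sketch. The figures are hypothetical (two load balancers, one check every 5 seconds each), and it assumes each heartbeat arrives without a session cookie and therefore creates a fresh row.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def heartbeat_rows_per_day(interval_seconds, balancers=1):
    """Rows created per day if every heartbeat request saves a new session."""
    return (SECONDS_PER_DAY // interval_seconds) * balancers


# Hypothetical figures: two load balancers, one check every 5 seconds each.
rows_per_day = heartbeat_rows_per_day(interval_seconds=5, balancers=2)

# Rows still unexpired after the default 2-week session lifetime.
rows_per_two_weeks = rows_per_day * 14

print(rows_per_day)          # 34560
print(rows_per_two_weeks)    # 483840
```

Setting SESSION_SAVE_EVERY_REQUEST back to False (the default) stops these rows from being written in the first place.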


Django offers a management command to clean up these expired sessions: python manage.py clearsessions (the command was named cleanup before Django 1.5).
