开发者

ProgrammingError when aggregating over an annotated & grouped Django ORM query

开发者 https://www.devze.com 2022-12-25 02:42 出处:网络
I\'m trying to construct a query to get the \"average, maximum and minimum number of items purchased per user\".

I'm trying to construct a query to get the "average, maximum and minimum number of items purchased per user".

The data source is this simple sales records table:

class SalesRecord(models.Model):
    id           = models.IntegerField(primary_key=True)
    user_id      = models.IntegerField()
    product_code = models.CharField()
    price        = models.IntegerField()
    created_at   = models.DateTimeField()

A new record is inserted into this table for every item purchased by a user. [Note]: user_id is not a foreign key to a table within the same database, because this backend system does not manage user information. The value is supplied by the frontend part of the product.

Here's my attempt at building the query:

q = SalesRecord.objects.all()
q = q.values('user_id').annotate(   # group by user and count the # of records
    count=Count('id'),              # (= # of items)
    ).order_by()
result = q.aggregate(Max('count'), Min('count'), Avg('count'))

When I try to execute the code, a ProgrammingError is raised at the last line:

(1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to开发者_Python百科 use near 'FROM (SELECT sales_records.user_id AS user_id, COUNT(sales_records.`' at line 1")

Django's error screen shows that the SQL is

SELECT FROM
  (SELECT
    `sales_records`.`user_id` AS `user_id`,
    COUNT(`sales_records`.`id`) AS `count`
  FROM `sales_records`
  WHERE (`sales_records`.`created_at` >= %s AND `sales_records`.`created_at` <= %s )
  GROUP BY `sales_records`.`user_id` ORDER BY NULL) subquery

It's not selecting anything! Can someone please show me the right way to do this?

Hacking Django

I've found that clearing the cache of selected fields in django.db.models.sql.query.BaseQuery.get_aggregation() seems to solve the problem. Though I'm not really sure this is a fix or a workaround.

@@ -327,10 +327,13 @@
    # Remove any aggregates marked for reduction from the subquery
    # and move them to the outer AggregateQuery.
+   self._aggregate_select_cache = None
+   self.aggregate_select_mask = None
    for alias, aggregate in self.aggregate_select.items():
        if aggregate.is_summary:
            query.aggregate_select[alias] = aggregate
-           del obj.aggregate_select[alias]
+           if alias in obj.aggregate_select:
+               del obj.aggregate_select[alias]

... yields result:

{'count__max': 267, 'count__avg': 26.2563, 'count__min': 1}


Using the model as is (w/o a FK to User), you can get the user_id counts and then do the math yourself:

counts = SalesRecord.objects.values('user_id').\
        annotate(count=Count('id')).values_list('count', flat=True)
(max(counts), min(counts), sum(counts) / float(len(counts)))

If you were able to change the table to use a ForeignKey and make your model look more like this:

class SalesRecord(model.Models):
    user = model.ForeignKey(User)
    product_code = models.CharField()
    price        = models.IntegerField()
    created_at   = models.DateTimeField()

Then you could approach the problem from the User object and use aggregate():

users_with_counts = Users.objects.annotate(count=Count('salesrecord'))
stats = users_with_counts.aggregate(Max('count'), Min('count'), Avg('count'))

Either way will give you what you want with a single database query.


Your ORM query is indeed correct, but the bug is in Django 1.6. Apparently it's been fixed in 1.7. Source: https://code.djangoproject.com/ticket/23669#comment:5

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号