开发者

How to get the latest 3 books from each author using django

开发者 https://www.devze.com 2023-02-25 03:48 出处:网络
Using the following django models: class Author(models.Model): name = models.CharField(max_lengt开发者_如何学JAVAh=100)

Using the following django models:

class Author(models.Model):
   name = models.CharField(max_lengt开发者_如何学JAVAh=100)
   age = models.IntegerField()

class Book(models.Model):
    name = models.CharField(max_length=300)
    author = models.ForeignKey(Author)
    pubdate = models.DateField()
    class Meta:
        ordering = ('-pubdate')

How can i get the five latest books published by each author?

I had considered iterate each author and get books published by the author slicing to 5.

for a in Author.objects.all():
    books = Book.objects.filter(author = a )[:5]
    print books #and/or process the entries... 

But, if the tables has a lot of records (maybe thousands of books), this could be slow and inefficient.

So, is there any other way to accomplish this with django (or a sql query) ?


I would suggest :

for a in Author.objects.all():
    books = a.book_set.all().order_by('-pub_date')[:5]
    print books #and/or process the entries... 

or, if the order should always be the same, as you define Meta,

    books = a.book_set.all()[:5]

should do the trick


If you're worried about the speed of the query, build an index on your pubdate field:

pubdate = models.DateField(db_index=True)

This should avoid scanning the entire table each time you run the query.

The SQL, in postgres, would be something like:

select b1.name, b1.author
from books b1
where b1.id in (
    select b2.id
    from books b2
    where b1.author = b2.author
    order by b2.pubdate desc
    limit 3)
order by b1.author, b1.name


For me, I had‍‍ 3 authors and each of which had more than at least 300 posts.
I tested two approaches and the difference is eye catching!
according to what django-debug-toolbar showed, first approach it had 3 similar queries which the 3 is the number of authors here and it takes a total of 8 queries (+180 ms) and CPU (400 - 800 ms), but the second one takes a total of 5 queries (+80 ms) and CPU (+70 - 90 ms).

  • also I see there is one extra query SELECT ... FROM author which I couldn't figure it out why it had been produced it seems not to be used at all.
from .models import Author, Po‍‍st

The simplest and most familiar approach

from itertools import chain

authors = Author.objects.all().prefetch_related("posts")

posts = list(chain.from_iterable(author.posts.order_by("-created")[:3]
             for author in authors)
        )

The more complicated and unfamiliar approach

from django.db.models import OuterRef, Prefetch, Subquery

subquery = Subquery(
        Post.objects.filter(author__id=OuterRef("author__id"))
        .order_by("-created")
        .values_list("id", flat=True)[:4]
)

authors = Author.objects.all().prefetch_related(
    Prefetch("posts", queryset=Post.objects.filter(id__in=subquery))
)

posts = list(chain.from_iterable(source.posts.all()
            for source in feed_sources)
)

If you just need the Posts and not the Authors:

the following way of fetching data does one less query and only returns the posts.

subquery = Subquery(
    Post.objects.filter(author__id=OuterRef("author__id"))
    .order_by("-created")
    .values_list("id", flat=True)[:4]
)

posts = (
    Post.objects.select_related("author")
    .exclude(author=an_author)
    .filter(id__in=subquery)
)
0

精彩评论

暂无评论...
验证码 换一张
取 消