Using the following django models:
class Author(models.Model):
name = models.CharField(max_lengt开发者_如何学JAVAh=100)
age = models.IntegerField()
class Book(models.Model):
name = models.CharField(max_length=300)
author = models.ForeignKey(Author)
pubdate = models.DateField()
class Meta:
ordering = ('-pubdate')
How can i get the five latest books published by each author?
I had considered iterate each author and get books published by the author slicing to 5.
for a in Author.objects.all():
books = Book.objects.filter(author = a )[:5]
print books #and/or process the entries...
But, if the tables has a lot of records (maybe thousands of books), this could be slow and inefficient.
So, is there any other way to accomplish this with django (or a sql query) ?
I would suggest :
for a in Author.objects.all():
books = a.book_set.all().order_by('-pub_date')[:5]
print books #and/or process the entries...
or, if the order should always be the same, as you define Meta,
books = a.book_set.all()[:5]
should do the trick
If you're worried about the speed of the query, build an index on your pubdate field:
pubdate = models.DateField(db_index=True)
This should avoid scanning the entire table each time you run the query.
The SQL, in postgres, would be something like:
select b1.name, b1.author
from books b1
where b1.id in (
select b2.id
from books b2
where b1.author = b2.author
order by b2.pubdate desc
limit 3)
order by b1.author, b1.name
For me, I had 3 authors
and each of which had more than at least 300 posts
.
I tested two approaches and the difference is eye catching!
according to what django-debug-toolbar
showed, first approach it had 3 similar queries
which the 3
is the number of authors here and it takes a total of 8 queries (+180 ms)
and CPU (400 - 800 ms)
, but the second one takes a total of 5 queries (+80 ms)
and CPU (+70 - 90 ms)
.
- also I see there is one extra query
SELECT ... FROM author
which I couldn't figure it out why it had been produced it seems not to be used at all.
from .models import Author, Post
The simplest and most familiar approach
from itertools import chain
authors = Author.objects.all().prefetch_related("posts")
posts = list(chain.from_iterable(author.posts.order_by("-created")[:3]
for author in authors)
)
The more complicated and unfamiliar approach
from django.db.models import OuterRef, Prefetch, Subquery
subquery = Subquery(
Post.objects.filter(author__id=OuterRef("author__id"))
.order_by("-created")
.values_list("id", flat=True)[:4]
)
authors = Author.objects.all().prefetch_related(
Prefetch("posts", queryset=Post.objects.filter(id__in=subquery))
)
posts = list(chain.from_iterable(source.posts.all()
for source in feed_sources)
)
If you just need the Posts and not the Authors:
the following way of fetching data does one less query and only returns the posts.
subquery = Subquery(
Post.objects.filter(author__id=OuterRef("author__id"))
.order_by("-created")
.values_list("id", flat=True)[:4]
)
posts = (
Post.objects.select_related("author")
.exclude(author=an_author)
.filter(id__in=subquery)
)
精彩评论