For example, I have a model like this:
Class Doggy(models.Model):
name = models.CharField(u'Name', max_length = 40)
color = models.CharField(u'Color', max_length = 20)
How can i select doggies with the same color? Or with the same name :)
UPD. Of course, I don't know the name or the color. I want to.. kind of, group by their values.
UPD2. I'm trying to do something like that,开发者_运维技巧 but using Django:
SELECT *
FROM table
WHERE tablefield IN (
SELECT tablefield
FROM table
GROUP BY tablefield
HAVING (COUNT(tablefield ) > 1)
)
UPD3. I'd like to do it via Django ORM, without having to iterate over the objects. I just want to get rows with duplicate values for one particular field.
I'm late to the party, but here you go:
Doggy.objects.values('color', 'name').annotate(Count('pk'))
This will give you results that have a count of how many of each Doggy you have grouped by color and name.
If you're looking for Doggy's of a certain colour - you'd do something like.
Doggy.objects.filter(color='blue')
If you want to find Doggys based on the colour of the current Doggy
def GetSimilarColoredDoggys(self):
return Doggy.objects.filter(color=self.color)
The same would go for names:-
def GetDoggysWithSameName(self):
return Doggy.objects.filter(color=self.name)
You can use itertools.groupby() for this:
import operator
import itertools
from django.db import models
def group_model_by_attr(model_class, attr_name):
assert issubclass(model_class, models.Model), \
"%s is not a Django model." % (model_class,)
assert attr_name in [field.name for field in Event._meta.fields], \
"The %s field doesn't exist on model %s" % (attr_name, model_class)
all_instances = model_class.objects.all().order_by(attr_name)
keyfunc = operator.attrgetter(attr_name)
return [{k: list(g)} for k, g in itertools.groupby(all_instances, keyfunc)]
grouped_by_color = group_model_by_attr(Doggy, 'color')
grouped_by_name = group_model_by_attr(Doggy, 'name')
grouped_by_color
(for example) will be a list of dicts like [{'purple': [doggy1, doggy2], {'pink': [doggy3,]}]
where doggy1,2, etc. are Doggy
instances.
UPDATE:
From your update it looks like you just want a list of ids for each event type. I tested this with 250k records in postgresql on my ubuntu laptop w/ a core 2 duo & 3gb of ram, and it took .35 seconds (the itertools.group_by took .72 seconds btw) to generate the dict. You mention that you have 900K records, so this should be fast enough. If it's not it should be easy to cache/update as the records change.
from collections import defaultdict
doggies = Doggy.objects.values_list('color', 'id').order_by('color').iterator()
grouped_doggies_by_color = defaultdict(list)
for color, id in doggies:
grouped_doggies_by_color[color].append(id)
I would change your data model so that the color and name are a one-to-many relationship with Doggy as follows:
class Doggy(models.Model):
name = models.ForeignKey('DoggyName')
color = models.ForeignKey('DoggyColor')
class DoggyName(models.Model):
name = models.CharField(max_length=40, unique=True)
class DoggyColor(models.Model):
color = models.CharField(max_length=20, unique=True)
Now DoggyName
and DoggyColor
do not contain duplicate names or colors, and you can use them to select dogs with the same name or color.
Okay, apparently, there's no way to do such thing with ORM only.
If you have to do it, you have to use .extra() to execute needed SQL-statement (if you are using SQL database, of course)
精彩评论