My django model looks like this:
from django.db import models

class Entity(models.Model):
    name = models.CharField(max_length=40)
    examples = models.ManyToManyField(Example, blank=True)
    tokens = models.ManyToManyField(Token, blank=True, null=True)
I want to enforce uniqueness of the token set, i.e. if there is already an entity with tokens ['a', 'b', 'c'], I do not want to add another one with ['a', 'b', 'c']. However, an entity with tokens ['a', 'b', 'c', 'd'] or ['a', 'b'] is a different set and should be added.
If an entity with that exact token set already exists, I want to add any newly discovered examples to its examples ManyToMany. The old name can stay.
Currently, I run a query to get an existing Entity with the exact token set in question (this is itself a challenge in Django), and if one is found I update it with the new examples. The problem is that this runs in multiple processes on multiple servers, so there is a race condition between checking whether a matching entity exists and creating a new one if none was found. This could result in entities with duplicate token sets being created.
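A minimal sketch of one way to express that exact-set lookup (the helper name and the narrow-by-count-then-verify strategy are just illustrative, not my actual code):

from django.db.models import Count

def entity_with_exact_tokens(token_ids):
    """Return an Entity whose token set equals token_ids exactly, or None."""
    wanted = set(token_ids)
    # Narrow to entities with the right number of tokens, then verify the
    # exact set in Python to sidestep the multi-join subtleties of pure SQL.
    candidates = (
        Entity.objects
        .annotate(token_count=Count('tokens', distinct=True))
        .filter(token_count=len(wanted))
        .prefetch_related('tokens')
    )
    for entity in candidates:
        if {t.pk for t in entity.tokens.all()} == wanted:
            return entity
    return None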
One solution I came up with is to use an explicit through model for the ManyToMany, and override the save method of the through model to compute a hash of the token set and store it in a column with a unique constraint. I think this would work, but it doesn't seem like a particularly elegant solution.
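A minimal sketch of a simpler variant of that idea — storing a digest of the token set directly on Entity in a column with a unique constraint, so the database itself rejects a duplicate set and the race reduces to catching IntegrityError. The token_digest field and both helper functions are invented for the sketch; Example and Token are the models from above:

import hashlib

from django.db import IntegrityError, models, transaction

class Entity(models.Model):
    name = models.CharField(max_length=40)
    examples = models.ManyToManyField(Example, blank=True)
    tokens = models.ManyToManyField(Token, blank=True)
    # Hex digest of the sorted token ids; unique, so the database
    # refuses a second entity with the same token set.
    token_digest = models.CharField(max_length=32, unique=True)

def token_set_digest(token_ids):
    # Canonical order first, so equal sets always produce equal digests.
    canonical = ','.join(str(pk) for pk in sorted(set(token_ids)))
    return hashlib.md5(canonical.encode('utf-8')).hexdigest()

def add_or_update_entity(name, token_ids, example_ids):
    digest = token_set_digest(token_ids)
    try:
        with transaction.atomic():
            entity = Entity.objects.create(name=name, token_digest=digest)
            entity.tokens.add(*token_ids)
    except IntegrityError:
        # Another process won the race; reuse its entity and keep its name.
        entity = Entity.objects.get(token_digest=digest)
    entity.examples.add(*example_ids)
    return entity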
I would also imagine that this problem is somewhat common in SQL land with mapping tables in the case where you want a unique set - perhaps there is a well known solution here? I am certainly open to using raw sql.
Thanks in advance for your help!
I'm not entirely sure this method addresses your problem, but here goes.
A friend who works with big insurance company databases was looking for a way of rapidly detecting exact dupes across multiple databases based on a subset of columns. I suggested taking the set of columns he was checking against, running them through some sort of canonicalifier™, computing an MD5 digest, and saving that as an additional, indexed column on the table. Sucker ran like greased lightning.
So maybe you can do something similar. It doesn't address your race condition (nothing I can think of does), but it could significantly simplify the dupe-check.
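Roughly what I have in mind, as a sketch (the function, its arguments, and the canonicalisation rules are illustrative, not what my friend actually ran):

import hashlib

def row_digest(row, columns):
    """MD5 digest of a canonicalised subset of a row's columns.

    row is a dict-like record; columns are the fields to dedupe on.
    """
    parts = []
    for column in sorted(columns):                  # fixed column order
        parts.append(str(row[column]).strip().lower())  # crude canonicalisation
    canonical = '|'.join(parts)
    return hashlib.md5(canonical.encode('utf-8')).hexdigest()

# Store the digest in an extra, indexed column; finding dupes is then a
# single indexed lookup (or a GROUP BY on the digest) instead of comparing
# every column of every row.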