I'm writing a bug database using Google App Engine and I'm running into problems getting unique numbers for the bugs. Each bug needs a unique number so users can reference it easily, and the numbers should be simple and as small as possible: "Hey, I fixed bug 27" or "I reopened bug 1867". Bug numbers should also increment, so users have a rough sense of which bugs came after which.
App Engine doesn't have true auto-increment counters like SQL databases, so I implemented the following function. It is basically the code Google recommends, but it doesn't work all the time.
I occasionally see duplicate bug numbers. Currently I am the only person using the bug database (ironically, this bug is in my bug database), and I am not entering bugs faster than one every 5 or 10 seconds (if I type fast). However, there will eventually be multiple users who might be entering bugs at the same time.
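import random

from google.appengine.ext import db

NUM_SHARDS = 20  # assumed value; the original snippet defines NUM_SHARDS elsewhere

class SimpleCounterShard(db.Model):
    count = db.IntegerProperty(required=True, default=0)

def getNewID():
    def txn():
        # Increment one randomly chosen shard inside a transaction.
        index = random.randint(0, NUM_SHARDS - 1)
        shard_name = "shard" + str(index)
        counter = SimpleCounterShard.get_by_key_name(shard_name)
        if counter is None:
            counter = SimpleCounterShard(key_name=shard_name)
        counter.count += 1
        counter.put()
    db.run_in_transaction(txn)

    # Sum every shard to get the new ID. This read happens outside the transaction.
    total = 0
    for counter in SimpleCounterShard.all():
        total += counter.count
    return total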
What am I doing wrong (or not understanding)? Or is there a better way to get unique numbers that aren't just random, as the keys or IDs seem to be in some cases on the production servers?
Sharded counters have exactly the same problem (if you can call it that) as the built-in autogenerated IDs: they don't guarantee monotonicity. Sharded counters are designed to let you count things, not to assign numbers to things. Because the shards live in separate entity groups, you can't read the sum of all the shards transactionally, so the total you compute can be wrong.
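To make the interleaving concrete, here is a plain-Python simulation (no App Engine involved). A dict stands in for the shard entities, and `NUM_SHARDS = 5` and `simulate_race` are hypothetical names chosen for illustration. Both "requests" finish their increments before either one sums the shards, so they compute the same total and receive the same ID.

```python
import random

NUM_SHARDS = 5  # hypothetical shard count for this simulation


def simulate_race(seed=0):
    """Replay two requests against in-memory 'shards'.

    Each request increments one random shard (the part the original code
    protects with a transaction), but reads the total afterwards, outside
    any transaction.
    """
    rng = random.Random(seed)
    shards = {i: 0 for i in range(NUM_SHARDS)}

    # Request A and request B both increment a shard first...
    for _request in ("A", "B"):
        shards[rng.randrange(NUM_SHARDS)] += 1

    # ...and only then sum the shards, so both observe the same total.
    id_a = sum(shards.values())
    id_b = sum(shards.values())
    return id_a, id_b


print(simulate_race())  # (2, 2): both requests get bug number 2
```

Whichever shards are picked, the result is the same: the increments are isolated, but the read is not.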
You really should just use the built-in autonumbering. I doubt users pay serious attention to the magnitude of bug numbers relative to each other, and most solutions you can come up with will have similar issues if you want them to scale to high write rates.
If you absolutely must have sequential numbers, you could use a single counter entity, and tolerate the max insert rate of 1-10 per second, or you could start a 'counter' backend that allocates IDs in batches from the datastore and hands them out in response to RPC requests from your frontend. If the latter sounds familiar, that's because that's what the autogenerated IDs do, only sharded across multiple machines.
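A minimal sketch of the batch-allocation idea, in plain Python with in-memory stand-ins. `make_counter` plays the role of the single persistent counter entity (a real one would be updated in a datastore transaction), and `BatchAllocator` is a hypothetical backend that reserves a block of IDs with one counter write and then hands them out from memory. The output shows the trade-off: every ID is unique, but IDs from two backends interleave, so ID 4 can be issued after ID 13.

```python
import threading


class BatchAllocator:
    """Hands out IDs from a reserved block, refilling it one batch at a time.

    `reserve_range(n)` stands in for the single transactional write that
    bumps a persistent counter by n and returns the (start, end) block.
    """

    def __init__(self, reserve_range, batch_size=10):
        self._reserve_range = reserve_range
        self._batch_size = batch_size
        self._lock = threading.Lock()
        self._next = 0
        self._end = 0  # exclusive upper bound of the current block

    def next_id(self):
        with self._lock:
            if self._next >= self._end:
                # Block exhausted: reserve a fresh one from the shared counter.
                self._next, self._end = self._reserve_range(self._batch_size)
            new_id = self._next
            self._next += 1
            return new_id


def make_counter():
    """A lock-protected in-memory counter standing in for the datastore entity."""
    lock = threading.Lock()
    state = {"high": 1}  # next unreserved ID

    def reserve_range(n):
        with lock:
            start = state["high"]
            state["high"] += n
            return start, start + n

    return reserve_range


reserve = make_counter()
backend_1 = BatchAllocator(reserve, batch_size=10)
backend_2 = BatchAllocator(reserve, batch_size=10)

print([backend_1.next_id() for _ in range(3)])  # [1, 2, 3]
print([backend_2.next_id() for _ in range(3)])  # [11, 12, 13]
print(backend_1.next_id())                      # 4
```

Larger batches mean fewer writes to the shared counter, but bigger gaps whenever a backend restarts and discards the unused part of its block.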
Your code is not really thread-safe. Yes, you use a transaction to increment a counter, but you don't use it to read the counter. There are also other shards that you're not locking in that transaction (because they're in different entity groups). Imagine two requests that each increment the counter and then both read the value afterwards, outside the transaction: they get the same value. You basically need to do everything in a transaction and have just one shard. This means blocking multiple calls for a new ID (serializing them). I'd suggest using a separate counter for each project/tag/context/whatever to better scale the writes.
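Here is a plain-Python sketch of "do everything in one transaction, one counter per context". A per-project `threading.Lock` stands in for a datastore transaction on a single counter entity, and `ProjectCounters` and the project names are hypothetical. Because the increment and the read happen inside the same critical section, calls for the same project are serialized and can never return the same number, while different projects do not wait on each other's counter lock.

```python
import threading
from collections import defaultdict


class ProjectCounters:
    """One counter per project; increment and read happen under one lock.

    The per-project lock stands in for a transaction on a single counter
    entity (one entity group per project).
    """

    def __init__(self):
        self._counts = defaultdict(int)
        self._locks = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()  # protects creation of per-project locks

    def _lock_for(self, project):
        with self._locks_guard:
            return self._locks[project]

    def new_id(self, project):
        with self._lock_for(project):
            # Read and write inside the same critical section, so no two
            # callers for the same project can observe the same value.
            self._counts[project] += 1
            return self._counts[project]


counters = ProjectCounters()
print(counters.new_id("web"))  # 1
print(counters.new_id("web"))  # 2
print(counters.new_id("api"))  # 1
```

The cost is the one described above: every call for a given project waits its turn, so the write rate per project is capped by how fast one counter can be updated.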
Read more about transactions in the App Engine datastore documentation.
Honestly, I'd just use the auto generated model IDs if I were you.