In a PostgreSQL database I have a table with a primary key and another field which needs to be unique.
CREATE TABLE users (
id INTEGER PRIMARY KEY DEFAULT nextval('groups_id_seq'::regclass),
name VARCHAR(255) UNIQUE NOT NULL
);
INSERT users (name) VALUES ('foo');
INSERT users (name) VALUES ('foo');
INSE开发者_开发百科RT users (name) VALUES ('bar');
The second insert fails but the sequence groups_id_seq is already incremented so when 'bar' is added it leaves a gap in the id numbers.
Is there a way to tell PostgreSQL to fetch the next value only if other constraints are met or should I check first using SELECT if the name is not duplicate? This still would not guarantee the lack of gaps but at least it would reduce their number to the rare cases when there is another process trying to insert the same name at the same time
I don't think so: a basic feature of sequences is that gaps are possible (think of two concurrent transactions, with one performing a ROLLBACK). You should ignore gaps. Why are they a problem in your case?
If you need gapless sequences - there are ways to do it, but it's not trivial, and definitely much slower.
Also - if you worry about "using too many ids" - just define id as bigserial.
It is possible, though cumbersome, to do this. As bortzmeyer says, it's dangerous to rely on values from sequences being contiguous, so it's best to just leave things as they are if you can.
If you can't:
Every access to the table that could cause a row to have a certain name (that is, every INSERT
to that table, and if you allow it (though it's poor practice) every UPDATE
that could change the name
field) must do so inside a transaction that locks soemthing first. The simplest and least performant option is to simply lock the entire table using LOCK users IN EXCLUSIVE MODE
(adding the last 3 words permits concurrent read access by other processes, which is safe).
However that is a very coarse lock that will slow performance if there are many concurrent modifications to users
; a better option would be locking a single, corresponding row in another table that must already exist. This row can be locked with SELECT ... FOR UPDATE
. This makes sense only when working with a "child" table that has a FK dependency on another "parent" table.
For example, imagine for the time being that we are actually trying to safely create new orders
for a customer
, and that these orders somehow have identifying 'names'. (I know, poor example...) orders
has a FK dependency on customers
. Then to prevent ever creating two orders with the same name for a given customer, you could do the following:
BEGIN;
-- Customer 'jbloggs' must exist for this to work.
SELECT 1 FROM customers
WHERE id = 'jbloggs'
FOR UPDATE
-- Provided every attempt to create an order performs the above step first,
-- at this point, we will have exclusive access to all orders for jbloggs.
SELECT 1 FROM orders
WHERE id = 'jbloggs'
AND order_name = 'foo'
-- Determine if the preceding query returned a row or not.
-- If it did not:
INSERT orders (id, name) VALUES ('jbloggs', 'foo');
-- Regardless, end the transaction:
END;
Note that it is not sufficient to simply lock the corresponding row in users
with SELECT ... FOR UPDATE
-- if the row does not already exist, several concurrent processes may simultaneously report that the row does not exist, and then attempt simultaneous insertions, resulting in failed transactions and thus sequence gaps.
Either locking scheme will work; what's important is that anyone trying to create a row with the same name must attempt to lock the same object.
精彩评论