Thread: select where not exists returning multiple rows?
We're using postgresql 9.1, and we've got a table that looks like this: testdb=# \d item Table "public.item" Column | Type | Modifiers -------+----------+----------- sig | bigint | not null type | smallint | data | text | Indexes: "item_pkey" PRIMARY KEY, btree (sig) And we're doing an insert like this: INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS ( SELECT NULL FROM Item WHERE Sig=$4) In this case $1 and $4 should always be the same. The idea is to insert if the row doesn't already exist. We're getting primary key constraint violations: 011-10-31 22:50:26 CDT STATEMENT: INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS ( SELECT NULL FROM Item WHERE Sig=$4 FOR UPDATE) 2011-10-31 22:52:56 CDT ERROR: duplicate key value violates unique constraint "item_pkey" 2011-10-31 22:52:56 CDT DETAIL: Key (sig)=(-4668668895560071572) already exists. I don't see how it's possible to get duplicate rows here, unless maybe the "select where not exists" is somehow returning multiple rows. Any ideas what's going on here? Thanks, Chris
On Wed, Nov 2, 2011 at 6:22 AM, Chris Dumoulin <chris@blaze.io> wrote: > We're using postgresql 9.1, and we've got a table that looks like this: > > testdb=# \d item > Table "public.item" > Column | Type | Modifiers > -------+----------+----------- > sig | bigint | not null > type | smallint | > data | text | > Indexes: > "item_pkey" PRIMARY KEY, btree (sig) > > And we're doing an insert like this: > INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS ( SELECT > NULL FROM Item WHERE Sig=$4) > > In this case $1 and $4 should always be the same. The idea is to insert if > the row doesn't already exist. > We're getting primary key constraint violations: > > 011-10-31 22:50:26 CDT STATEMENT: INSERT INTO Item (Sig, Type, Data) SELECT > $1,$2,$3 WHERE NOT EXISTS ( SELECT NULL FROM Item WHERE Sig=$4 FOR UPDATE) > 2011-10-31 22:52:56 CDT ERROR: duplicate key value violates unique > constraint "item_pkey" > 2011-10-31 22:52:56 CDT DETAIL: Key (sig)=(-4668668895560071572) already > exists. > > I don't see how it's possible to get duplicate rows here, unless maybe the > "select where not exists" is somehow returning multiple rows. > Any ideas what's going on here? race condition. lock the table first or retry the insert. merlin
On 11-11-02 08:49 AM, Merlin Moncure wrote: > On Wed, Nov 2, 2011 at 6:22 AM, Chris Dumoulin<chris@blaze.io> wrote: >> We're using postgresql 9.1, and we've got a table that looks like this: >> >> testdb=# \d item >> Table "public.item" >> Column | Type | Modifiers >> -------+----------+----------- >> sig | bigint | not null >> type | smallint | >> data | text | >> Indexes: >> "item_pkey" PRIMARY KEY, btree (sig) >> >> And we're doing an insert like this: >> INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS ( SELECT >> NULL FROM Item WHERE Sig=$4) >> >> In this case $1 and $4 should always be the same. The idea is to insert if >> the row doesn't already exist. >> We're getting primary key constraint violations: >> >> 011-10-31 22:50:26 CDT STATEMENT: INSERT INTO Item (Sig, Type, Data) SELECT >> $1,$2,$3 WHERE NOT EXISTS ( SELECT NULL FROM Item WHERE Sig=$4 FOR UPDATE) >> 2011-10-31 22:52:56 CDT ERROR: duplicate key value violates unique >> constraint "item_pkey" >> 2011-10-31 22:52:56 CDT DETAIL: Key (sig)=(-4668668895560071572) already >> exists. >> >> I don't see how it's possible to get duplicate rows here, unless maybe the >> "select where not exists" is somehow returning multiple rows. >> Any ideas what's going on here? > race condition. lock the table first or retry the insert. > > merlin Could you elaborate a little more on the race condition? Are you suggesting that if two threads executed this statement at the same time, the results from the inner "SELECT NULL ..." in one of the threads could be incorrect by the time that thread did the INSERT? I thought about this possibility and tried "SELECT NULL ... FOR UPDATE", but still saw the same problem. Thanks, Chris
On Wed, Nov 02, 2011 at 07:22:09AM -0400, Chris Dumoulin wrote: > And we're doing an insert like this: > INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS > ( SELECT NULL FROM Item WHERE Sig=$4) > > In this case $1 and $4 should always be the same. FWIW, If they're always going to be the same, you can put that it the query, like so: INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS ( SELECT NULL FROM Item WHERE Sig=$1) Saves a parameter. > I don't see how it's possible to get duplicate rows here, unless > maybe the "select where not exists" is somehow returning multiple > rows. > Any ideas what's going on here? As pointed out by others, you don't say if it this is a race condition between processes or if it always does this. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > He who writes carelessly confesses thereby at the very outset that he does > not attach much importance to his own thoughts. -- Arthur Schopenhauer
Attachment
On 11-11-02 09:13 AM, Martijn van Oosterhout wrote: > On Wed, Nov 02, 2011 at 07:22:09AM -0400, Chris Dumoulin wrote: >> And we're doing an insert like this: >> INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS >> ( SELECT NULL FROM Item WHERE Sig=$4) >> >> In this case $1 and $4 should always be the same. > FWIW, If they're always going to be the same, you can put that it the query, > like so: > > INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS > ( SELECT NULL FROM Item WHERE Sig=$1) > > Saves a parameter. > >> I don't see how it's possible to get duplicate rows here, unless >> maybe the "select where not exists" is somehow returning multiple >> rows. >> Any ideas what's going on here? > As pointed out by others, you don't say if it this is a race condition > between processes or if it always does this. It's only happening intermittently, but it doesn't appear to be a race condition; I'm pretty sure there's only one thread or process issuing this statement. Thanks, Chris > > Have a nice day,
On Wed, Nov 2, 2011 at 8:20 AM, Chris Dumoulin <chris@blaze.io> wrote: > On 11-11-02 09:13 AM, Martijn van Oosterhout wrote: >> >> On Wed, Nov 02, 2011 at 07:22:09AM -0400, Chris Dumoulin wrote: >>> >>> And we're doing an insert like this: >>> INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS >>> ( SELECT NULL FROM Item WHERE Sig=$4) >>> >>> In this case $1 and $4 should always be the same. >> >> FWIW, If they're always going to be the same, you can put that it the >> query, >> like so: >> >> INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS >> ( SELECT NULL FROM Item WHERE Sig=$1) >> >> Saves a parameter. >> >>> I don't see how it's possible to get duplicate rows here, unless >>> maybe the "select where not exists" is somehow returning multiple >>> rows. >>> Any ideas what's going on here? >> >> As pointed out by others, you don't say if it this is a race condition >> between processes or if it always does this. > > It's only happening intermittently, but it doesn't appear to be a race > condition; I'm pretty sure there's only one thread or process issuing this > statement. Pretty sure? you need to be 100% sure. *Somebody* was worried about concurrency in the code, because the actual statement in the log has 'FOR UPDATE' -- your example does not. Intermittent failures is classic race condition behavior. The reason for the race is that your select happens before the insert does so that process A and B can select at approximately the same time and both make the decision to insert on the same key...bam. Logging all statements will positively prove this. select <constants> where exists ... does not return > 1 rows ever and there is precisely 0% chance you've uncovered a server bug that is causing it to :-). solve the problem by: a: LOCK the table before making the insert, making sure to wrap the lock and the insert in the same transaction (this should be the default method) b. retry the transaction on failure in the client c. or on the server if you push the insert into a function. merlin