Thread: select where not exists returning multiple rows?

select where not exists returning multiple rows?

From
Chris Dumoulin
Date:
We're using postgresql 9.1, and we've got a table that looks like this:

testdb=# \d item
     Table "public.item"
 Column |   Type   | Modifiers
--------+----------+-----------
 sig    | bigint   | not null
 type   | smallint |
 data   | text     |
Indexes:
    "item_pkey" PRIMARY KEY, btree (sig)

And we're doing an insert like this:
INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS (
SELECT NULL FROM Item WHERE Sig=$4)

In this case $1 and $4 should always be the same. The idea is to insert
if the row doesn't already exist.
We're getting primary key constraint violations:

2011-10-31 22:50:26 CDT STATEMENT:  INSERT INTO Item (Sig, Type, Data)
SELECT $1,$2,$3 WHERE NOT EXISTS ( SELECT NULL FROM Item WHERE Sig=$4
FOR UPDATE)
2011-10-31 22:52:56 CDT ERROR:  duplicate key value violates unique
constraint "item_pkey"
2011-10-31 22:52:56 CDT DETAIL:  Key (sig)=(-4668668895560071572)
already exists.

I don't see how it's possible to get duplicate rows here, unless maybe
the "select where not exists" is somehow returning multiple rows.
Any ideas what's going on here?

Thanks,
Chris

Re: select where not exists returning multiple rows?

From
Merlin Moncure
Date:
On Wed, Nov 2, 2011 at 6:22 AM, Chris Dumoulin <chris@blaze.io> wrote:
> We're using postgresql 9.1, and we've got a table that looks like this:
>
> testdb=# \d item
> Table "public.item"
>  Column   |   Type   | Modifiers
> -------+----------+-----------
>  sig   | bigint   | not null
>  type  | smallint |
>  data  | text     |
> Indexes:
>    "item_pkey" PRIMARY KEY, btree (sig)
>
> And we're doing an insert like this:
> INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS ( SELECT
> NULL FROM Item WHERE Sig=$4)
>
> In this case $1 and $4 should always be the same. The idea is to insert if
> the row doesn't already exist.
> We're getting primary key constraint violations:
>
> 2011-10-31 22:50:26 CDT STATEMENT:  INSERT INTO Item (Sig, Type, Data) SELECT
> $1,$2,$3 WHERE NOT EXISTS ( SELECT NULL FROM Item WHERE Sig=$4 FOR UPDATE)
> 2011-10-31 22:52:56 CDT ERROR:  duplicate key value violates unique
> constraint "item_pkey"
> 2011-10-31 22:52:56 CDT DETAIL:  Key (sig)=(-4668668895560071572) already
> exists.
>
> I don't see how it's possible to get duplicate rows here, unless maybe the
> "select where not exists" is somehow returning multiple rows.
> Any ideas what's going on here?

race condition.  lock the table first or retry the insert.

merlin
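
A minimal sketch of the "lock the table first" suggestion, written as a Python simulation rather than against a real server. `LockedTable` and its `threading.Lock` are hypothetical stand-ins for the item table and a table-level lock held across both the existence check and the insert; nothing here is a PostgreSQL API.

```python
import threading


class LockedTable:
    """Hypothetical in-memory stand-in for the item table plus a table lock."""

    def __init__(self):
        self.rows = {}                 # sig -> (type, data)
        self.lock = threading.Lock()   # plays the role of the table lock
        self.inserts = 0               # how many inserts actually happened

    def insert_if_absent(self, sig, type_, data):
        # Hold the lock across BOTH the existence check and the insert, the
        # way the lock and the INSERT would share one transaction.
        with self.lock:
            if sig in self.rows:       # the NOT EXISTS check
                return False
            self.rows[sig] = (type_, data)
            self.inserts += 1
            return True


table = LockedTable()
results = []                           # one True/False entry per caller


def worker():
    results.append(table.insert_if_absent(42, 1, "payload"))


# Eight concurrent callers all try to insert the same key.
threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(table.inserts, results.count(True))
```

Because every caller queues on the same lock, only one of them can ever see the row as missing. The cost is that concurrent writers are serialized. On a real server the counterpart would be a LOCK statement issued in the same transaction as the insert; the lock mode to use is left open here.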

Re: select where not exists returning multiple rows?

From
Chris Dumoulin
Date:
On 11-11-02 08:49 AM, Merlin Moncure wrote:
> On Wed, Nov 2, 2011 at 6:22 AM, Chris Dumoulin<chris@blaze.io>  wrote:
>> We're using postgresql 9.1, and we've got a table that looks like this:
>>
>> testdb=# \d item
>> Table "public.item"
>>   Column   |   Type   | Modifiers
>> -------+----------+-----------
>>   sig   | bigint   | not null
>>   type  | smallint |
>>   data  | text     |
>> Indexes:
>>     "item_pkey" PRIMARY KEY, btree (sig)
>>
>> And we're doing an insert like this:
>> INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS ( SELECT
>> NULL FROM Item WHERE Sig=$4)
>>
>> In this case $1 and $4 should always be the same. The idea is to insert if
>> the row doesn't already exist.
>> We're getting primary key constraint violations:
>>
>> 2011-10-31 22:50:26 CDT STATEMENT:  INSERT INTO Item (Sig, Type, Data) SELECT
>> $1,$2,$3 WHERE NOT EXISTS ( SELECT NULL FROM Item WHERE Sig=$4 FOR UPDATE)
>> 2011-10-31 22:52:56 CDT ERROR:  duplicate key value violates unique
>> constraint "item_pkey"
>> 2011-10-31 22:52:56 CDT DETAIL:  Key (sig)=(-4668668895560071572) already
>> exists.
>>
>> I don't see how it's possible to get duplicate rows here, unless maybe the
>> "select where not exists" is somehow returning multiple rows.
>> Any ideas what's going on here?
> race condition.  lock the table first or retry the insert.
>
> merlin

Could you elaborate a little more on the race condition? Are you
suggesting that if two threads executed this statement at the same time,
the results from the inner "SELECT NULL ..." in one of the threads could
be incorrect by the time that thread did the INSERT? I thought about
this possibility and tried  "SELECT NULL ... FOR UPDATE", but still saw
the same problem.

Thanks,
Chris
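
The interleaving asked about above can be illustrated with a small deterministic simulation. This is only a sketch: `FakeTable` and `UniqueViolation` are hypothetical names standing in for the item table, its primary key, and the server's duplicate-key error. The two "sessions" are run by hand in the problematic order, with both existence checks happening before either insert.

```python
class UniqueViolation(Exception):
    """Stands in for the server's duplicate-key error."""


class FakeTable:
    """Hypothetical in-memory stand-in for the item table."""

    def __init__(self):
        self.rows = {}  # sig -> (type, data)

    def exists(self, sig):
        # The NOT EXISTS subquery: it can only see rows already present.
        return sig in self.rows

    def insert(self, sig, type_, data):
        # The primary key is enforced at insert time, not at check time.
        if sig in self.rows:
            raise UniqueViolation(f"Key (sig)=({sig}) already exists.")
        self.rows[sig] = (type_, data)


def run_interleaved(table, sig):
    """Run sessions A and B so that both check before either inserts."""
    a_saw_row = table.exists(sig)    # A: subquery finds nothing
    b_saw_row = table.exists(sig)    # B: subquery finds nothing either
    outcomes = []
    for name, saw_row in (("A", a_saw_row), ("B", b_saw_row)):
        if saw_row:
            outcomes.append((name, "skipped"))
            continue
        try:
            table.insert(sig, 1, "payload")
            outcomes.append((name, "inserted"))
        except UniqueViolation:
            outcomes.append((name, "duplicate key"))
    return outcomes


table = FakeTable()
outcomes = run_interleaved(table, -4668668895560071572)
print(outcomes)  # [('A', 'inserted'), ('B', 'duplicate key')]
```

Neither check ever returns more than one row; the failure comes from both sessions acting on a check that was true when it ran. This also suggests why adding FOR UPDATE to the subquery would not help: a row lock can only be taken on a row the subquery returns, and in the failing case there is no row yet.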

Re: select where not exists returning multiple rows?

From
Martijn van Oosterhout
Date:
On Wed, Nov 02, 2011 at 07:22:09AM -0400, Chris Dumoulin wrote:
> And we're doing an insert like this:
> INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS
> ( SELECT NULL FROM Item WHERE Sig=$4)
>
> In this case $1 and $4 should always be the same.

FWIW, if they're always going to be the same, you can put that in the query,
like so:

INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS
( SELECT NULL FROM Item WHERE Sig=$1)

Saves a parameter.

> I don't see how it's possible to get duplicate rows here, unless
> maybe the "select where not exists" is somehow returning multiple
> rows.
> Any ideas what's going on here?

As pointed out by others, you don't say if this is a race condition
between processes or if it always does this.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer

Re: select where not exists returning multiple rows?

From
Chris Dumoulin
Date:
On 11-11-02 09:13 AM, Martijn van Oosterhout wrote:
> On Wed, Nov 02, 2011 at 07:22:09AM -0400, Chris Dumoulin wrote:
>> And we're doing an insert like this:
>> INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS
>> ( SELECT NULL FROM Item WHERE Sig=$4)
>>
>> In this case $1 and $4 should always be the same.
> FWIW, if they're always going to be the same, you can put that in the query,
> like so:
>
> INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS
> ( SELECT NULL FROM Item WHERE Sig=$1)
>
> Saves a parameter.
>
>> I don't see how it's possible to get duplicate rows here, unless
>> maybe the "select where not exists" is somehow returning multiple
>> rows.
>> Any ideas what's going on here?
> As pointed out by others, you don't say if this is a race condition
> between processes or if it always does this.

It's only happening intermittently, but it doesn't appear to be a race
condition; I'm pretty sure there's only one thread or process issuing
this statement.

Thanks,
Chris

>
> Have a nice day,


Re: select where not exists returning multiple rows?

From
Merlin Moncure
Date:
On Wed, Nov 2, 2011 at 8:20 AM, Chris Dumoulin <chris@blaze.io> wrote:
> On 11-11-02 09:13 AM, Martijn van Oosterhout wrote:
>>
>> On Wed, Nov 02, 2011 at 07:22:09AM -0400, Chris Dumoulin wrote:
>>>
>>> And we're doing an insert like this:
>>> INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS
>>> ( SELECT NULL FROM Item WHERE Sig=$4)
>>>
>>> In this case $1 and $4 should always be the same.
>>
>> FWIW, if they're always going to be the same, you can put that in the
>> query,
>> like so:
>>
>> INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS
>> ( SELECT NULL FROM Item WHERE Sig=$1)
>>
>> Saves a parameter.
>>
>>> I don't see how it's possible to get duplicate rows here, unless
>>> maybe the "select where not exists" is somehow returning multiple
>>> rows.
>>> Any ideas what's going on here?
>>
>> As pointed out by others, you don't say if this is a race condition
>> between processes or if it always does this.
>
> It's only happening intermittently, but it doesn't appear to be a race
> condition; I'm pretty sure there's only one thread or process issuing this
> statement.

Pretty sure? You need to be 100% sure.  *Somebody* was worried about
concurrency in the code, because the actual statement in the log has
'FOR UPDATE' -- your example does not.  Intermittent failure is
classic race condition behavior.  The reason for the race is that your
select happens before the insert does, so processes A and B can
select at approximately the same time and both decide to insert the
same key...bam.  Logging all statements will positively prove this.

select <constants> where not exists ... never returns more than one
row, and there is precisely 0% chance you've uncovered a server bug
that is causing it to :-).

Solve the problem by:
a. LOCK the table before making the insert, making sure to wrap the
lock and the insert in the same transaction (this should be the
default method)
b. retry the transaction on failure in the client
c. or retry on the server, if you push the insert into a function.

merlin
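
Option b, retrying in the client, might look roughly like the sketch below. It uses Python's built-in `sqlite3` module purely as a stand-in so the example runs without a server; the `insert_if_absent` helper, the `max_attempts` parameter, and the `?` placeholders are assumptions for illustration. With a PostgreSQL driver the placeholder style and the exception class would differ.

```python
import sqlite3

# Same shape as the statement in the thread, with the key passed twice.
INSERT_IF_ABSENT = """
INSERT INTO item (sig, type, data)
SELECT ?, ?, ?
WHERE NOT EXISTS (SELECT 1 FROM item WHERE sig = ?)
"""


def insert_if_absent(conn, sig, type_, data, max_attempts=3):
    """Return True if this call inserted the row, False if it already existed."""
    for _ in range(max_attempts):
        try:
            cur = conn.execute(INSERT_IF_ABSENT, (sig, type_, data, sig))
            conn.commit()
            return cur.rowcount == 1
        except sqlite3.IntegrityError:
            # Another session won the race between our check and our insert.
            # Roll back and try again; the retried check should now see
            # the other session's row and insert nothing.
            conn.rollback()
    raise RuntimeError("gave up after repeated duplicate-key errors")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (sig INTEGER PRIMARY KEY, type INTEGER, data TEXT)"
)

first = insert_if_absent(conn, -4668668895560071572, 1, "payload")
second = insert_if_absent(conn, -4668668895560071572, 1, "payload")
row_count = conn.execute("SELECT COUNT(*) FROM item").fetchone()[0]
print(first, second, row_count)
```

The single-connection run above only exercises the non-racing path; the `except` branch is what would absorb the intermittent duplicate-key error when two sessions collide. Treating that error as "retry" rather than "fail" is the design choice being illustrated.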