Re: select where not exists returning multiple rows? - Mailing list pgsql-general

From Merlin Moncure
Subject Re: select where not exists returning multiple rows?
Date
Msg-id CAHyXU0zz4Tj5TutSsOgJpz+ZM9eBrMvBCfGCTiSe8hQy+bcZww@mail.gmail.com
Whole thread Raw
In response to Re: select where not exists returning multiple rows?  (Chris Dumoulin <chris@blaze.io>)
List pgsql-general
On Wed, Nov 2, 2011 at 8:20 AM, Chris Dumoulin <chris@blaze.io> wrote:
> On 11-11-02 09:13 AM, Martijn van Oosterhout wrote:
>>
>> On Wed, Nov 02, 2011 at 07:22:09AM -0400, Chris Dumoulin wrote:
>>>
>>> And we're doing an insert like this:
>>> INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS
>>> ( SELECT NULL FROM Item WHERE Sig=$4)
>>>
>>> In this case $1 and $4 should always be the same.
>>
>> FWIW, If they're always going to be the same, you can put that it the
>> query,
>> like so:
>>
>> INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS
>> ( SELECT NULL FROM Item WHERE Sig=$1)
>>
>> Saves a parameter.
>>
>>> I don't see how it's possible to get duplicate rows here, unless
>>> maybe the "select where not exists" is somehow returning multiple
>>> rows.
>>> Any ideas what's going on here?
>>
>> As pointed out by others, you don't say if it this is a race condition
>> between processes or if it always does this.
>
> It's only happening intermittently, but it doesn't appear to be a race
> condition; I'm pretty sure there's only one thread or process issuing this
> statement.

Pretty sure? you need to be 100% sure.  *Somebody* was worried about
concurrency in the code, because the actual statement in the log has
'FOR UPDATE' -- your example does not.   Intermittent failures is
classic race condition behavior.  The reason for the race is that your
select happens before the insert does so that process A and B can
select at approximately the same time and both make the decision to
insert on the same key...bam.  Logging all statements will positively
prove this.

select <constants> where exists ... does not return > 1 rows ever and
there is precisely 0% chance you've uncovered a server bug that is
causing it to :-).

solve the problem by:
a: LOCK the table before making the insert, making sure to wrap the
lock and the insert in the same transaction (this should be the
default method)
b. retry the transaction on failure in the client
c. or on the server if you push the insert into a function.

merlin

pgsql-general by date:

Previous
From: Chris Dumoulin
Date:
Subject: Re: select where not exists returning multiple rows?
Next
From: Vincent de Phily
Date:
Subject: Re: does reindex need exclusive table access?