Thread: Simple SQL INSERT to avoid duplication failed: why?

Simple SQL INSERT to avoid duplication failed: why?

From
"Carlo Stonebanks"
Date:

Ok, I tried to be clever and I wrote code to avoid inserting duplicate data. The calling function has a try-catch to recover from this, but I am curious as to why it failed:

 

INSERT INTO
   mdx_lib.acache_mdx_logic_address_validation
   (
      address,
      postal_code,
      address_id
   )
SELECT
   '306 station 22 1 2 st' AS address,
   '29482' AS postal_code,
   100165016 AS address_id
WHERE
   NOT EXISTS
   ( SELECT
      1
   FROM
      mdx_lib.acache_mdx_logic_address_validation
   WHERE
      address         = '306 station 22 1 2 st'
      AND postal_code = '29482'
   )

Exec status=PGRES_FATAL_ERROR error=ERROR:  duplicate key value violates unique constraint "uq_acache_mdx_logic_address_validation_idx"

DETAIL:  Key (address, postal_code)=(306 station 22 1 2 st, 29482) already exists.
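
For reference, the constraint being violated covers (address, postal_code); something along the lines of:

CREATE UNIQUE INDEX uq_acache_mdx_logic_address_validation_idx
    ON mdx_lib.acache_mdx_logic_address_validation (address, postal_code);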

 

The client insists that this process is the only one running, so if he's right, no other process could have inserted a row with the same data between the SELECT … NOT EXISTS check and the actual INSERT operation.

 

This particular code works as expected right now (the row already exists, so the SELECT returns 0 rows and no row is INSERTed).

 

Should this have worked?

 

Carlo

Re: Simple SQL INSERT to avoid duplication failed: why?

From
Tom Lane
Date:
"Carlo Stonebanks" <stonec.register@sympatico.ca> writes:
> Ok, I tried to be clever and I wrote code to avoid inserting duplicate data.
> The calling function has a try-catch to recover from this, but I am curious
> as to why it failed:

There's nothing obviously wrong with that, which means the issue is in
something you didn't show us.  Care to assemble a self-contained
example?

            regards, tom lane


Re: Simple SQL INSERT to avoid duplication failed: why?

From
"Carlo Stonebanks"
Date:
Hi Tom,

>> There's nothing obviously wrong with that, which means the issue is in
something you didn't show us.  Care to assemble a self-contained example?
<<

Unfortunately, it happens erratically and very, very rarely, so I can't give
you something that will fail. I expected an occasional failure and there is
a try-catch to handle it; I was just surprised when the client told me there
were no other apps running against this table. I just wanted to make sure the
logic was correct and that I wasn't doing something stupid, or that there is
some known SQL or PG behaviour that would explain this.

The only way I can see this happening is that an insert into
acache_mdx_logic_address_validation sneaks in after the NOT EXISTS... SELECT
and before the actual INSERT. And for that to occur, the client must be
mistaken and something else MUST be running and inserting into
acache_mdx_logic_address_validation.

Would you agree, or is there anything else to consider?

INSERT INTO
   mdx_lib.acache_mdx_logic_address_validation
   (
      address,
      postal_code,
      address_id
   )
SELECT
   '306 station 22 1 2 st' AS address,
   '29482' AS postal_code,
   100165016 AS address_id
WHERE
   NOT EXISTS
   ( SELECT
      1
   FROM
      mdx_lib.acache_mdx_logic_address_validation
   WHERE
      address         = '306 station 22 1 2 st'
      AND postal_code = '29482'
   )


-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: April 25, 2013 5:06 PM
To: Carlo Stonebanks
Cc: pgsql-general@postgresql.org
Subject: Re: Simple SQL INSERT to avoid duplication failed: why?

"Carlo Stonebanks" <stonec.register@sympatico.ca> writes:
> Ok, I tried to be clever and I wrote code to avoid inserting duplicate
data.
> The calling function has a try-catch to recover from this, but I am
> curious as to why it failed:

There's nothing obviously wrong with that, which means the issue is in
something you didn't show us.  Care to assemble a self-contained example?

            regards, tom lane




Re: Simple SQL INSERT to avoid duplication failed: why?

From
Steven Schlansker
Date:
On Apr 30, 2013, at 4:00 PM, "Carlo Stonebanks" <stonec.register@sympatico.ca> wrote:

> Hi Tom,
>
>>> There's nothing obviously wrong with that, which means the issue is in
> something you didn't show us.  Care to assemble a self-contained example?
> <<
>
> Unfortunately, it happens erratically and very, very rarely so I can't give
> you something that will fail. I expected an occasional failure and there is
> a try-catch to handle it, I was just surprised when the client told me there
> was no other apps running against this table. I just wanted to make sure the
> logic was correct and that I wasn't doing something stupid or there is some
> known SQL or PG behaviour that would explain this.
>
> The only way I can see this happening is that an
> acache_mdx_logic_address_validation sneaks in before the insert and after
> the NOT EXISTS... SELECT. And for that to occur, the client must be mistaken
> and something else MUST be running and inserting into
> acache_mdx_logic_address_validation.
>
> Would you agree, or is there anything else to consider?
>
> INSERT INTO
>    mdx_lib.acache_mdx_logic_address_validation
>    (
>       address,
>       postal_code,
>       address_id
>      )
> SELECT
>      '306 station 22 1 2 st' AS address,
>      '29482' AS postal_code,
>      100165016 AS address_id
>   WHERE
>      NOT EXISTS
>      ( SELECT
>         1
>      FROM
>         mdx_lib.acache_mdx_logic_address_validation
>      WHERE
>         address         = '306 station 22 1 2 st'
>         AND postal_code = '29482'
>      )
>

I know you said that it was the only active application at the time, but I
figured I'd share my experience anyway…

We do a somewhat more involved version of this to provide fake UPSERT functionality,
and this failure mode happened more often than we'd guessed it would (due to concurrent updates).

However, new in 9.1 is SSI -- http://wiki.postgresql.org/wiki/Serializable
which means that if you run this transaction at isolation level SERIALIZABLE you will get
serialization failures instead of duplicate key exceptions, which makes it easy to retry until success.

So now we run any code that looks like this at SERIALIZABLE and are very happy with it.
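
In rough terms it looks like this (just a sketch; the retry loop itself lives in the client code):

BEGIN ISOLATION LEVEL SERIALIZABLE;

INSERT INTO mdx_lib.acache_mdx_logic_address_validation (address, postal_code, address_id)
SELECT '306 station 22 1 2 st', '29482', 100165016
WHERE NOT EXISTS
   ( SELECT 1
     FROM mdx_lib.acache_mdx_logic_address_validation
     WHERE address = '306 station 22 1 2 st'
       AND postal_code = '29482' );

COMMIT;

-- If the INSERT or the COMMIT fails with SQLSTATE 40001 (serialization_failure),
-- roll back and rerun the whole transaction; otherwise you're done.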

Not sure if that'll help, but hopefully!

Best,
Steven

>
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: April 25, 2013 5:06 PM
> To: Carlo Stonebanks
> Cc: pgsql-general@postgresql.org
> Subject: Re: Simple SQL INSERT to avoid duplication failed: why?
>
> "Carlo Stonebanks" <stonec.register@sympatico.ca> writes:
>> Ok, I tried to be clever and I wrote code to avoid inserting duplicate
> data.
>> The calling function has a try-catch to recover from this, but I am
>> curious as to why it failed:
>
> There's nothing obviously wrong with that, which means the issue is in
> something you didn't show us.  Care to assemble a self-contained example?
>
>             regards, tom lane
>
>
>
>



Re: Simple SQL INSERT to avoid duplication failed: why?

From
Tom Lane
Date:
"Carlo Stonebanks" <stonec.register@sympatico.ca> writes:
> The only way I can see this happening is that an
> acache_mdx_logic_address_validation sneaks in before the insert and after
> the NOT EXISTS... SELECT. And for that to occur, the client must be mistaken
> and something else MUST be running and inserting into
> acache_mdx_logic_address_validation.

That seems like the way to bet to me.  Note that the window can actually
be wider than it might seem because of MVCC considerations --- for
example, if the conflicting insert has been done by an uncommitted
transaction, the EXISTS will fly right past the uncommitted row, but
then the index insertion will wait patiently to see if the conflicting
row commits, whereupon it will throw the error.
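
For illustration, the race looks roughly like this (two hypothetical sessions):

-- Session A:
BEGIN;
INSERT INTO mdx_lib.acache_mdx_logic_address_validation (address, postal_code, address_id)
VALUES ('306 station 22 1 2 st', '29482', 100165016);
-- ... not yet committed ...

-- Session B, meanwhile: the NOT EXISTS subselect sees no committed row, so the
-- INSERT ... SELECT proceeds and then blocks on session A's uncommitted unique-index entry.

-- Session A:
COMMIT;

-- Session B: the conflicting row is now committed, so the pending index insertion fails:
-- ERROR:  duplicate key value violates unique constraint "uq_acache_mdx_logic_address_validation_idx"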

            regards, tom lane


Re: Simple SQL INSERT to avoid duplication failed: why?

From
"Carlo Stonebanks"
Date:
Very good to know, Steve. We're on 9.0 right now, but I will investigate, as
all the work is for unattended automatic processes which continuously stream
data from multiple resources and need to resolve these collisions by
themselves.


-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Steven Schlansker
Sent: April 30, 2013 7:10 PM
To: Carlo Stonebanks
Cc: 'Tom Lane'; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Simple SQL INSERT to avoid duplication failed: why?


On Apr 30, 2013, at 4:00 PM, "Carlo Stonebanks"
<stonec.register@sympatico.ca> wrote:

> Hi Tom,
>
>>> There's nothing obviously wrong with that, which means the issue is
>>> in
> something you didn't show us.  Care to assemble a self-contained example?
> <<
>
> Unfortunately, it happens erratically and very, very rarely so I can't
> give you something that will fail. I expected an occasional failure
> and there is a try-catch to handle it, I was just surprised when the
> client told me there was no other apps running against this table. I
> just wanted to make sure the logic was correct and that I wasn't doing
> something stupid or there is some known SQL or PG behaviour that would
explain this.
>
> The only way I can see this happening is that an
> acache_mdx_logic_address_validation sneaks in before the insert and
> after the NOT EXISTS... SELECT. And for that to occur, the client must
> be mistaken and something else MUST be running and inserting into
> acache_mdx_logic_address_validation.
>
> Would you agree, or is there anything else to consider?
>
> INSERT INTO
>    mdx_lib.acache_mdx_logic_address_validation
>    (
>       address,
>       postal_code,
>       address_id
>      )
> SELECT
>      '306 station 22 1 2 st' AS address,
>      '29482' AS postal_code,
>      100165016 AS address_id
>   WHERE
>      NOT EXISTS
>      ( SELECT
>         1
>      FROM
>         mdx_lib.acache_mdx_logic_address_validation
>      WHERE
>         address         = '306 station 22 1 2 st'
>         AND postal_code = '29482'
>      )
>

I know you said that it was the only active application at the time, but I
figured I'd share my experience anyway.

We do a somewhat more involved version of this to provide fake UPSERT
functionality, and this failure mode happened more often than we'd guessed
it would (due to concurrent updates).

However, new in 9.1 is SSI -- http://wiki.postgresql.org/wiki/Serializable
which means that if you run this transaction at isolation level SERIALIZABLE
you will get serialization failures instead of duplicate key exceptions,
which makes it easy to retry until success.

So now we run any code that looks like this at SERIALIZABLE and are very
happy with it.

Not sure if that'll help, but hopefully!

Best,
Steven

>
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: April 25, 2013 5:06 PM
> To: Carlo Stonebanks
> Cc: pgsql-general@postgresql.org
> Subject: Re: Simple SQL INSERT to avoid duplication failed: why?
>
> "Carlo Stonebanks" <stonec.register@sympatico.ca> writes:
>> Ok, I tried to be clever and I wrote code to avoid inserting
>> duplicate
> data.
>> The calling function has a try-catch to recover from this, but I am
>> curious as to why it failed:
>
> There's nothing obviously wrong with that, which means the issue is in
> something you didn't show us.  Care to assemble a self-contained example?
>
>             regards, tom lane
>
>
>
>






Re: Simple SQL INSERT to avoid duplication failed: why?

From
Merlin Moncure
Date:
On Wed, May 1, 2013 at 7:16 AM, Carlo Stonebanks
<stonec.register@sympatico.ca> wrote:
> Very good to know, Steve. We're on 9.0 right now but I will investigate as
> all the work is for unattended automatic processes which are continuously
> streaming data from multiple resources and need to resolve these collisions
> by themselves.

If it were me, I'd put a 'before' statement-level trigger on the table to
raise a warning into the log with the backend pid, assuming I could handle
the volume.  There are lots of ways the client could turn out to be wrong,
for example client-side connection poolers (which I tend to hate).  Only
when it's 100% proven that this is a single-backend case (which none of us
really believes, including you) is further research justified.
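
Something along these lines (untested sketch, names invented; warnings show up in
the log as long as log_min_messages is at 'warning' or lower, which is the default):

CREATE FUNCTION mdx_lib.log_address_validation_insert() RETURNS trigger AS $$
BEGIN
    RAISE WARNING 'insert into acache_mdx_logic_address_validation by backend pid %, client %',
        pg_backend_pid(), inet_client_addr();
    RETURN NULL;   -- return value is ignored for statement-level BEFORE triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_log_address_validation_insert
    BEFORE INSERT ON mdx_lib.acache_mdx_logic_address_validation
    FOR EACH STATEMENT EXECUTE PROCEDURE mdx_lib.log_address_validation_insert();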

merlin


Re: Simple SQL INSERT to avoid duplication failed: why?

From
"Carlo Stonebanks"
Date:
There are no client poolers (unless pgtcl has one I don't know about), so
this is unlikely.

The trigger is an interesting idea to try if it happens again - I can't keep
it in place for long, as the table is a massive cache (used to deflect calls
to a web service) and the trigger would bloat the logs pretty quickly.

I have to ask myself, is it more likely that I have discovered some PG
anomaly in 9.0 that no one has ever noticed, or that the client has
accidentally launched the process twice and doesn't know it?


-----Original Message-----
From: Merlin Moncure [mailto:mmoncure@gmail.com]
Sent: May 1, 2013 11:37 AM
To: Carlo Stonebanks
Cc: Steven Schlansker; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Simple SQL INSERT to avoid duplication failed: why?

On Wed, May 1, 2013 at 7:16 AM, Carlo Stonebanks
<stonec.register@sympatico.ca> wrote:
> Very good to know, Steve. We're on 9.0 right now but I will
> investigate as all the work is for unattended automatic processes
> which are continuously streaming data from multiple resources and need
> to resolve these collisions by themselves.

If it was me, I'd be putting a 'before' statement level trigger on the table
to raise a warning into the log with the backend pid assuming I could handle
the volume.  There are lots of ways the client could turn out to be wrong,
for example client side connection poolers (which I tend to hate).  Only
when it's 100% proven this is a single backend case (which none of us really
believe is the case including you) is further research justified.

merlin



Re: Simple SQL INSERT to avoid duplication failed: why?

From
Steven Schlansker
Date:
On May 1, 2013, at 9:36 AM, "Carlo Stonebanks" <stonec.register@sympatico.ca> wrote:
> I have to ask myself, is it more likely that I have discovered some PG
> anomaly in 9.0 that no one has ever noticed, or that the client has
> accidentally launched the process twice and doesn't know it?

Given my (admittedly limited) experience with the PostgreSQL project and
dealing with clients, the latter ;-)




Re: Simple SQL INSERT to avoid duplication failed: why?

From
Gavin Flower
Date:
Carlo: Please see note at the bottom...

On 02/05/13 04:36, Carlo Stonebanks wrote:
> There are no client poolers (unless pgtcl has one I don't know about) so
> this is unlikely.
>
> The trigger is an interesting idea to try if it happens again - I can't keep
> it for long as it is for a massive cache (used to deflect calls to a web
> service) and will bloat the logs pretty quickly.
>
> I have to ask myself, is it more likely that I have discovered some PG
> anomaly in 9.0 that no one has ever noticed, or that the client has
> accidentally launched the process twice and doesn't know it?
>
>
> -----Original Message-----
> From: Merlin Moncure [mailto:mmoncure@gmail.com]
> Sent: May 1, 2013 11:37 AM
> To: Carlo Stonebanks
> Cc: Steven Schlansker; pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Simple SQL INSERT to avoid duplication failed: why?
>
> On Wed, May 1, 2013 at 7:16 AM, Carlo Stonebanks
> <stonec.register@sympatico.ca> wrote:
>> Very good to know, Steve. We're on 9.0 right now but I will
>> investigate as all the work is for unattended automatic processes
>> which are continuously streaming data from multiple resources and need
>> to resolve these collisions by themselves.
> If it was me, I'd be putting a 'before' statement level trigger on the table
> to raise a warning into the log with the backend pid assuming I could handle
> the volume.  There are lots of ways the client could turn out to be wrong,
> for example client side connection poolers (which I tend to hate).  Only
> when it's 100% proven this is a single backend case (which none of us really
> believe is the case including you) is further research justified.
>
> merlin
>
>
>
Please do not top post; posting replies at the bottom, or interspersed
with previous comments, is the norm on these lists - and generally more
useful, as we can read first what you are replying to!


Cheers,
Gavin


Re: Simple SQL INSERT to avoid duplication failed: why?

From
Jasen Betts
Date:
On 2013-05-01, Carlo Stonebanks <stonec.register@sympatico.ca> wrote:
> There are no client poolers (unless pgtcl has one I don't know about) so
> this is unlikely.
>
> The trigger is an interesting idea to try if it happens again - I can't keep
> it for long as it is for a massive cache (used to deflect calls to a web
> service) and will bloat the logs pretty quickly.
>
> I have to ask myself, is it more likely that I have discovered some PG
> anomaly in 9.0 that no one has ever noticed, or that the client has
> accidentally launched the process twice and doesn't know it?

If it's running on a web server and two clients send the same request
simultaneously, could that cause it to try the same task twice
simultaneously?

--
⚂⚃ 100% natural

Re: Simple SQL INSERT to avoid duplication failed: why?

From
"Carlo Stonebanks"
Date:
-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Jasen Betts
Sent: May 4, 2013 7:44 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Simple SQL INSERT to avoid duplication failed: why?

On 2013-05-01, Carlo Stonebanks <stonec.register@sympatico.ca> wrote:
> There are no client poolers (unless pgtcl has one I don't know about)
> so this is unlikely.
>
> The trigger is an interesting idea to try if it happens again - I
> can't keep it for long as it is for a massive cache (used to deflect
> calls to a web
> service) and will bloat the logs pretty quickly.
>
> I have to ask myself, is it more likely that I have discovered some PG
> anomaly in 9.0 that no one has ever noticed, or that the client has
> accidentally launched the process twice and doesn't know it?

>> If it's running on a web server and two clients send the same request simultaneously, could that cause it to try the
>> same task twice simultaneously?

It would, but it's not running on a web server. It's a kind of dedicated data warehouse with very limited access. No
programs other than the one in question have the right to write to the tables.






Re: Simple SQL INSERT to avoid duplication failed: why?

From
Merlin Moncure
Date:
On Mon, May 6, 2013 at 11:19 AM, Carlo Stonebanks
<stonec.register@sympatico.ca> wrote:
>
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Jasen Betts
> Sent: May 4, 2013 7:44 PM
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Simple SQL INSERT to avoid duplication failed: why?
>
> On 2013-05-01, Carlo Stonebanks <stonec.register@sympatico.ca> wrote:
>> There are no client poolers (unless pgtcl has one I don't know about)
>> so this is unlikely.
>>
>> The trigger is an interesting idea to try if it happens again - I
>> can't keep it for long as it is for a massive cache (used to deflect
>> calls to a web
>> service) and will bloat the logs pretty quickly.
>>
>> I have to ask myself, is it more likely that I have discovered some PG
>> anomaly in 9.0 that no one has ever noticed, or that the client has
>> accidentally launched the process twice and doesn't know it?
>
>>> If it's running on a web server and two clients send the same request simultaneously, could that cause it to try the
>>> same task twice simultaneously?
>
> It would, but it's not running on a web server. It's a kind of dedicated data warehouse with very limited access. No
> programs other than the one in question have the right to write to the tables.

If this is more of an OLAP-type machine and adjusting logs is not
practical, I would consider staging the records to be inserted into a
pre-processing table.  Then something periodically scoops the records
up and moves them to the 'real' table while purging duplicates.  The
preproc table can record the client IP and backend pid -- that way we
can catch the culprit (which is almost certainly not what the client
thinks it is).
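
A sketch of what I mean (table and column names/types invented):

CREATE TABLE mdx_lib.address_validation_preproc (
    received_at  timestamptz NOT NULL DEFAULT now(),
    backend_pid  int         NOT NULL DEFAULT pg_backend_pid(),
    client_addr  inet                 DEFAULT inet_client_addr(),
    address      text        NOT NULL,
    postal_code  text        NOT NULL,
    address_id   int         NOT NULL
);

-- The loader INSERTs only into the preproc table; a periodic job then does something like:
INSERT INTO mdx_lib.acache_mdx_logic_address_validation (address, postal_code, address_id)
SELECT DISTINCT ON (p.address, p.postal_code) p.address, p.postal_code, p.address_id
FROM mdx_lib.address_validation_preproc p
WHERE NOT EXISTS
   ( SELECT 1
     FROM mdx_lib.acache_mdx_logic_address_validation v
     WHERE v.address = p.address AND v.postal_code = p.postal_code );
-- Any (address, postal_code) that shows up more than once in preproc, together with
-- its backend_pid and client_addr, points straight at the culprit.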

Another, less invasive, way to do that is to simply have the 'before'
trigger log the operation to a 'log' table (I'd make it unlogged) that
records the insert operation and who did it (backend pid, client IP,
etc.) along with the time.

merlin