Thread: Simple SQL INSERT to avoid duplication failed: why?
Ok, I tried to be clever and I wrote code to avoid inserting duplicate data. The calling function has a try-catch to recover from this, but I am curious as to why it failed:
INSERT INTO mdx_lib.acache_mdx_logic_address_validation
    (address, postal_code, address_id)
SELECT
    '306 station 22 1 2 st' AS address,
    '29482' AS postal_code,
    100165016 AS address_id
WHERE NOT EXISTS (
    SELECT 1
    FROM mdx_lib.acache_mdx_logic_address_validation
    WHERE address = '306 station 22 1 2 st'
      AND postal_code = '29482'
);
Exec status=PGRES_FATAL_ERROR error=ERROR: duplicate key value violates unique constraint "uq_acache_mdx_logic_address_validation_idx"
DETAIL: Key (address, postal_code)=(306 station 22 1 2 st, 29482) already exists.
The client insists that this process is the only one running, so if he's right, no other process could be inserting a row with the same data between the NOT EXISTS check and the actual INSERT. This particular code works as expected right now (the SELECT returns 0 rows, therefore no rows are inserted).
Should this have worked?
Carlo
"Carlo Stonebanks" <stonec.register@sympatico.ca> writes: > Ok, I tried to be clever and I wrote code to avoid inserting duplicate data. > The calling function has a try-catch to recover from this, but I am curious > as to why it failed: There's nothing obviously wrong with that, which means the issue is in something you didn't show us. Care to assemble a self-contained example? regards, tom lane
Hi Tom,

>> There's nothing obviously wrong with that, which means the issue is in
>> something you didn't show us. Care to assemble a self-contained example? <<

Unfortunately, it happens erratically and very, very rarely, so I can't give
you something that will fail. I expected an occasional failure, and there is
a try-catch to handle it; I was just surprised when the client told me there
were no other apps running against this table. I just wanted to make sure
the logic was correct and that I wasn't doing something stupid, or that
there is some known SQL or PG behaviour that would explain this.

The only way I can see this happening is that an insert into
acache_mdx_logic_address_validation sneaks in after the NOT EXISTS ...
SELECT and before the INSERT. And for that to occur, the client must be
mistaken and something else MUST be running and inserting into
acache_mdx_logic_address_validation.

Would you agree, or is there anything else to consider?

INSERT INTO mdx_lib.acache_mdx_logic_address_validation
    (address, postal_code, address_id)
SELECT
    '306 station 22 1 2 st' AS address,
    '29482' AS postal_code,
    100165016 AS address_id
WHERE NOT EXISTS (
    SELECT 1
    FROM mdx_lib.acache_mdx_logic_address_validation
    WHERE address = '306 station 22 1 2 st'
      AND postal_code = '29482'
);
On Apr 30, 2013, at 4:00 PM, "Carlo Stonebanks" <stonec.register@sympatico.ca> wrote:
> The only way I can see this happening is that an insert into
> acache_mdx_logic_address_validation sneaks in after the NOT EXISTS ...
> SELECT and before the INSERT. And for that to occur, the client must be
> mistaken and something else MUST be running and inserting into
> acache_mdx_logic_address_validation.

I know you said that it was the only active application at the time, but I
figured I'd share my experience anyway.

We do a somewhat more involved version of this to provide fake UPSERT
functionality, and this failure mode happened more often than we'd guessed
it would (due to concurrent updates).

However, new in 9.1 is SSI -- http://wiki.postgresql.org/wiki/Serializable --
which means that if you run this transaction at isolation level SERIALIZABLE
you will get serialization failures instead of duplicate key exceptions,
which makes it easy to retry until success. So now we run any code that
looks like this at SERIALIZABLE and are very happy with it.

Not sure if that'll help, but hopefully!

Best,
Steven
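[For concreteness, a minimal sketch of the pattern Steven describes, assuming
PostgreSQL 9.1 or later and an application loop that retries the whole
transaction on SQLSTATE 40001 (serialization_failure); the table and values
are taken from the original post:

-- Run the guarded insert inside a SERIALIZABLE transaction; under SSI a
-- concurrent conflicting insert surfaces as a serialization failure
-- (SQLSTATE 40001) rather than a duplicate key error.
BEGIN ISOLATION LEVEL SERIALIZABLE;

INSERT INTO mdx_lib.acache_mdx_logic_address_validation
    (address, postal_code, address_id)
SELECT
    '306 station 22 1 2 st', '29482', 100165016
WHERE NOT EXISTS (
    SELECT 1
    FROM mdx_lib.acache_mdx_logic_address_validation
    WHERE address = '306 station 22 1 2 st'
      AND postal_code = '29482'
);

COMMIT;  -- if this (or the INSERT) fails with SQLSTATE 40001,
         -- roll back and retry from BEGIN until it succeeds
]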
"Carlo Stonebanks" <stonec.register@sympatico.ca> writes: > The only way I can see this happening is that an > acache_mdx_logic_address_validation sneaks in before the insert and after > the NOT EXISTS... SELECT. And for that to occur, the client must be mistaken > and something else MUST be running and inserting into > acache_mdx_logic_address_validation. That seems like the way to bet to me. Note that the window can actually be wider than it might seem because of MVCC considerations --- for example, if the conflicting insert has been done by an uncommitted transaction, the EXISTS will fly right past the uncommitted row, but then the index insertion will wait patiently to see if the conflicting row commits, whereupon it will throw the error. regards, tom lane
Very good to know, Steve. We're on 9.0 right now, but I will investigate, as
all the work is for unattended automatic processes which are continuously
streaming data from multiple sources and need to resolve these collisions by
themselves.
On Wed, May 1, 2013 at 7:16 AM, Carlo Stonebanks
<stonec.register@sympatico.ca> wrote:
> Very good to know, Steve. We're on 9.0 right now, but I will investigate,
> as all the work is for unattended automatic processes which are
> continuously streaming data from multiple sources and need to resolve
> these collisions by themselves.

If it were me, I'd put a 'before' statement-level trigger on the table to
raise a warning into the log with the backend pid, assuming I could handle
the volume. There are lots of ways the client could turn out to be wrong,
for example client-side connection poolers (which I tend to hate). Only
when it's 100% proven that this is a single-backend case (which none of us
really believes is the case, including you) is further research justified.

merlin
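[A minimal sketch of the statement-level trigger Merlin suggests; the
function and trigger names here are made up for illustration, and RAISE
WARNING reaches the server log at the default log_min_messages setting:

CREATE OR REPLACE FUNCTION mdx_lib.log_validation_insert() RETURNS trigger AS $$
BEGIN
    -- one warning per INSERT statement, identifying the backend
    RAISE WARNING 'INSERT on acache_mdx_logic_address_validation from backend pid %',
        pg_backend_pid();
    RETURN NULL;  -- return value is ignored for statement-level triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_log_validation_insert
    BEFORE INSERT ON mdx_lib.acache_mdx_logic_address_validation
    FOR EACH STATEMENT
    EXECUTE PROCEDURE mdx_lib.log_validation_insert();
]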
There are no client poolers (unless pgtcl has one I don't know about), so
this is unlikely.

The trigger is an interesting idea to try if it happens again. I can't keep
it in place for long, though: the table is a massive cache (used to deflect
calls to a web service) and the trigger would bloat the logs pretty quickly.

I have to ask myself: is it more likely that I have discovered some PG
anomaly in 9.0 that no one has ever noticed, or that the client has
accidentally launched the process twice and doesn't know it?
On May 1, 2013, at 9:36 AM, "Carlo Stonebanks" <stonec.register@sympatico.ca> wrote:
> I have to ask myself: is it more likely that I have discovered some PG
> anomaly in 9.0 that no one has ever noticed, or that the client has
> accidentally launched the process twice and doesn't know it?

Given my (admittedly limited) experience with the PostgreSQL project and
dealing with clients, the latter ;-)
Carlo: Please see note at the bottom...

On 02/05/13 04:36, Carlo Stonebanks wrote:
> There are no client poolers (unless pgtcl has one I don't know about), so
> this is unlikely.
>
> I have to ask myself: is it more likely that I have discovered some PG
> anomaly in 9.0 that no one has ever noticed, or that the client has
> accidentally launched the process twice and doesn't know it?

Please do not top post. Posting replies at the bottom, or interspersed with
previous comments, is the norm on these lists -- and generally more useful,
as we can read first what you are replying to!

Cheers,
Gavin
On 2013-05-01, Carlo Stonebanks <stonec.register@sympatico.ca> wrote:
> I have to ask myself: is it more likely that I have discovered some PG
> anomaly in 9.0 that no one has ever noticed, or that the client has
> accidentally launched the process twice and doesn't know it?

If it's running on a web server and two clients send the same request
simultaneously, could that cause it to try the same task twice
simultaneously?

-- 
⚂⚃ 100% natural
On May 4, 2013, Jasen Betts wrote:
> If it's running on a web server and two clients send the same request
> simultaneously, could that cause it to try the same task twice
> simultaneously?

It would, but it's not running on a web server. It's a kind of dedicated
data warehouse with very limited access. No programs have the right to
write to the tables other than the one in question.
On Mon, May 6, 2013 at 11:19 AM, Carlo Stonebanks
<stonec.register@sympatico.ca> wrote:
> It would, but it's not running on a web server. It's a kind of dedicated
> data warehouse with very limited access. No programs have the right to
> write to the tables other than the one in question.

If this is more of an OLAP-type machine and adjusting logs is not practical,
I would consider staging the records to be input into a pre-processing
table. Then something periodically scoops the records up and moves them to
the 'real' table while purging duplicates. The preproc table can record the
client ip and backend pid -- that way we can catch the culprit (which is
almost certainly not what the client thinks it is).

Another, less invasive, way to do that is to simply have the 'before'
trigger log the operation to a (I'd make it unlogged) 'log' table that
records the insert operation and who did it (backend pid, client ip, etc.)
with the time.

merlin
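[A sketch of Merlin's second, less invasive option; the table and function
names are illustrative, inet_client_addr() returns NULL for local
Unix-socket connections, and UNLOGGED tables require 9.1 -- on 9.0 a plain
table works, just with WAL overhead:

CREATE UNLOGGED TABLE mdx_lib.address_validation_insert_log (
    logged_at   timestamptz NOT NULL DEFAULT now(),
    backend_pid integer     NOT NULL DEFAULT pg_backend_pid(),
    client_addr inet                 DEFAULT inet_client_addr(),
    address     text,
    postal_code text
);

CREATE OR REPLACE FUNCTION mdx_lib.audit_validation_insert() RETURNS trigger AS $$
BEGIN
    -- record who inserted what; the column DEFAULTs above capture
    -- the time, backend pid and client address automatically
    INSERT INTO mdx_lib.address_validation_insert_log (address, postal_code)
    VALUES (NEW.address, NEW.postal_code);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_audit_validation_insert
    BEFORE INSERT ON mdx_lib.acache_mdx_logic_address_validation
    FOR EACH ROW
    EXECUTE PROCEDURE mdx_lib.audit_validation_insert();
]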