Thread: Duplicate key insert question

Duplicate key insert question

From
Jean-Christian Imbeault
Date:
I have a table with a primary field and a few other fields. What is the
fastest way to do an insert into that table assuming that sometimes I
might try to insert a record with a duplicate primary key and want that
to fail?

I know that if I try a plain insert this will work, but in the case
where I am trying to insert a duplicate key, the insert fails (as it
should) and an error is logged.

I could first do a check to see if there is already an entry with the
same key as the one I am trying to insert but that would mean quite a
few extra operations.

Is there a quick and clean way of doing something like:

INSERT into table values(...) IF there isn't already a row with pkey=x

Thanks,

Jean-Christian Imbeault

PS The reason I am asking is that an open source project is using MySQL
as their DB and they have a port to PG that isn't very clean b/c the DB
code keeps trying to insert duplicate primary keys. According to them
MySQL doesn't complain and just drops the insert whereas PG (as is
right) complains. I've offered to clean up their PG insertion code but
they say that they don't want too many extra checks as their app writes
to the DB a *lot* and any extra check is going to slow down the
application noticeably ...


Re: Duplicate key insert question

From
"Dann Corbit"
Date:
> -----Original Message-----
> From: Jean-Christian Imbeault [mailto:jc@mega-bucks.co.jp]
> Sent: Tuesday, July 01, 2003 5:10 PM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Duplicate key insert question
>
>
> I have a table with a primary field and a few other fields.
> What is the
> fastest way to do an insert into that table assuming that sometimes I
> might try to insert a record with a duplicate primary key and
> want that
> to fail?
>
> I know that if I try a plain insert this will work, but in the case
> where I am trying to insert a duplicate key, the insert fails (as it
> should) and an error is logged.
>
> I could first do a check to see if there is already an entry with the
> same key as the one I am trying to insert but that would mean quite a
> few extra operations.
>
> Is there a quick and clean way of doing something like:
>
> INSERT into table values(...) IF there isn't already a row with pkey=x
>
> Thanks,
>
> Jean-Christian Imbeault
>
> PS The reason I am asking is that an open source project is
> using MySQL
> as their DB and they have a port to PG that isn't very clean
> b/c the DB
> code keeps trying to insert duplicate primary keys. According to them
> MySQL doesn't complain and just drops the insert whereas PG (as is
> right) complains. I've offered to clean up their PG insertion
> code but
> they say that they don't want too many extra checks as their
> app writes
> to the DB a *lot* and any extra check is going to slow down the
> application noticeably ...

SQL*Server has a nice feature for this.  When you create an index, you
can tell it to ignore duplicate key attempts (nothing happens, and an
informational-level warning is raised).

I find that it is an extremely useful feature for things like creation
of dictionaries or domains.
============================================================================
Syntax
CREATE [UNIQUE] [CLUSTERED | NONCLUSTERED]
    INDEX index_name ON table (column [,...n])
[WITH
        [PAD_INDEX]
        [[,] FILLFACTOR = fillfactor]
        [[,] IGNORE_DUP_KEY]
        [[,] DROP_EXISTING]
        [[,] STATISTICS_NORECOMPUTE]
]
[ON filegroup]

IGNORE_DUP_KEY
Controls what happens when an attempt is made to insert a duplicate key
value into a column that is part of a unique clustered index. If
IGNORE_DUP_KEY was specified for the index and an INSERT statement that
creates a duplicate key is executed, SQL Server issues a warning message
and ignores (does not insert) the duplicate row.
If IGNORE_DUP_KEY was not specified for the index, SQL Server issues an
error message and rolls back the entire INSERT statement.

A unique index cannot be created on a column that already includes
duplicate values, whether or not IGNORE_DUP_KEY is set. If attempted,
SQL Server displays an error message and lists the duplicate values.
Eliminate the duplicate values before creating a unique index on the
column.

E. Use the IGNORE_DUP_KEY option
This example creates a unique clustered index on the emp_pay table. If a
duplicate key is entered, the INSERT or UPDATE statement is ignored.

SET NOCOUNT ON
USE pubs
IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES
        WHERE TABLE_NAME = 'emp_pay')
    DROP TABLE emp_pay
GO
USE pubs
IF EXISTS (SELECT name FROM sysindexes
        WHERE name = 'employeeID_ind')
    DROP INDEX emp_pay.employeeID_ind
GO
USE pubs
GO
CREATE TABLE emp_pay
(
 employeeID int NOT NULL,
 base_pay money NOT NULL,
 commission decimal(2, 2) NOT NULL
)
INSERT emp_pay
    VALUES (1, 500, .10)
INSERT emp_pay
    VALUES (2, 1000, .05)
INSERT emp_pay
    VALUES (3, 800, .07)
INSERT emp_pay
    VALUES (5, 1500, .03)
INSERT emp_pay
    VALUES (9, 750, .06)
GO
SET NOCOUNT OFF
GO
CREATE UNIQUE CLUSTERED INDEX employeeID_ind
    ON emp_pay(employeeID)
    WITH IGNORE_DUP_KEY
============================================================================

Re: Duplicate key insert question

From
Jean-Christian Imbeault
Date:
Dann Corbit wrote:
>
> SQL*Server has a nice feature for this.

Ok ... is there a way to achieve the same effect in postgres?

Thanks,

Jean-Christian Imbeault


Re: Duplicate key insert question

From
"Dann Corbit"
Date:
> -----Original Message-----
> From: Jean-Christian Imbeault [mailto:jc@mega-bucks.co.jp]
> Sent: Tuesday, July 01, 2003 5:21 PM
> To: Dann Corbit
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Duplicate key insert question
>
>
> Dann Corbit wrote:
> >
> > SQL*Server has a nice feature for this.
>
> Ok ... is there a way to achieve the same effect in postgres?

No.

MySQL is broken, unless they have some extension like SQL*Server.  A
duplicate insertion into a unique index must raise an error.

(c)ISO/IEC ISO/IEC 9075-1:1999 (E)
4.6 SQL-schema objects
4.6.6.3 Table constraints
A table constraint is an integrity constraint associated with a single
base table.
A table constraint is either a unique constraint, a primary key
constraint, a referential constraint, or a check constraint.
A unique constraint specifies one or more columns of the table as unique
columns. A unique constraint is satisfied if and only if no two rows in
a table have the same non-null values in the
unique columns.
A primary key constraint is a unique constraint that specifies PRIMARY
KEY. A primary key constraint is satisfied if and only if no two rows in
a table have the same non-null values in the unique columns and none of
the values in the specified column or columns are the null value.
A referential constraint specifies one or more columns as referencing
columns and corresponding referenced columns in some (not necessarily
distinct) base table, referred to as the referenced table.
Such referenced columns are the unique columns of some unique constraint
of the referenced table.
A referential constraint is always satisfied if, for every row in the
referencing table, the values of the referencing columns are equal to
those of the corresponding referenced columns of some row in the
referenced table. If null values are present, however, satisfaction of
the referential constraint depends on the treatment specified for nulls
(known as the match type).
Referential actions may be specified to determine what changes are to be
made to the referencing table if a change to the referenced table would
otherwise cause the referential constraint to be violated.
A table check constraint specifies a search condition. The constraint is
violated if the result of the search condition is false for any row of
the table (but not if it is unknown).

4.7 Integrity constraints and constraint checking
4.7.1 Constraint checking
There are two kinds of schema object that describe constraints:
assertions and table constraints (including domain constraints of any
domains on which columns of that table may be based), and they are
checked in the same way.
Every constraint is either deferrable or not deferrable.
In every SQL-session, every constraint has a constraint mode that is a
property of that SQL-session.
Each constraint has a (persistent) default constraint mode, with which
the constraint starts each SQL-transaction in each SQL-session.
A constraint mode is either deferred or immediate, and can be set by an
SQL-statement, provided the constraint is deferrable.
When a transaction is initiated, the constraint mode of each constraint
is set to its default.
On completion of execution of every SQL-statement, every constraint is
checked whose constraint mode is immediate.
Before termination of a transaction, every constraint mode is set to
immediate (and therefore checked).

Re: Duplicate key insert question

From
"Reuben D. Budiardja"
Date:
On Tuesday 01 July 2003 08:10 pm, Jean-Christian Imbeault wrote:
> I have a table with a primary field and a few other fields. What is the
> fastest way to do an insert into that table assuming that sometimes I
> might try to insert a record with a duplicate primary key and want that
> to fail?
>
> I know that if I try a plain insert this will work, but in the case
> where I am trying to insert a duplicate key, the insert fails (as it
> should) and an error is logged.
>
> I could first do a check to see if there is already an entry with the
> same key as the one I am trying to insert but that would mean quite a
> few extra operations.
>
> Is there a quick and clean way of doing something like:
>
> INSERT into table values(...) IF there isn't already a row with pkey=x
>
> Thanks,
>
> Jean-Christian Imbeault

Hi, not sure if this is answering your question, but I just asked similar
questions here. I asked about using INSERT WHERE NOT EXISTS (which you can do
in PostgreSQL). Here is what you can do:

INSERT INTO mytable
SELECT 'value1', 'value2'
   WHERE NOT EXISTS
        (SELECT NULL FROM mytable
                WHERE mycondition)

This will just report 0 rows inserted when the row already exists, but
it does run the check first. Don't know if you can really afford that.
Just for reference, this brought up some discussion here. Here is a link
to the archive:
http://marc.theaimsgroup.com/?l=postgresql-general&w=2&r=1&s=WHERE+NOT+EXISTS&q=b

Hope that helps.
RDB


--
Reuben D. Budiardja
Department of Physics and Astronomy
The University of Tennessee, Knoxville, TN
-------------------------------------------------
/"\  ASCII Ribbon Campaign against HTML
\ /  email and proprietary format
 X   attachments.
/ \
-------------------------------------------------
Have you been used by Microsoft today?
Choose your life. Choose freedom.
Choose LINUX.
-------------------------------------------------


Re: Duplicate key insert question

From
"Maksim Likharev"
Date:
Yes, but for MSSQL a unique index with ignore duplicates in reality
will reject all duplicates.
In other words, if you are trying to insert 2 identical values
you will insert none.
Do a search using IF EXISTS (SELECT ...) or LEFT OUTER JOIN ... WHERE
... IS NULL.
It works pretty fast.
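
[The LEFT OUTER JOIN variant Maksim mentions could be sketched like this in PostgreSQL; the table and column names (mytable, staging, pkey, val) are hypothetical, not from the thread:]

```sql
-- Insert only those staged rows whose key is not already present.
-- For genuinely new keys the LEFT JOIN finds no match, so t.pkey IS NULL.
INSERT INTO mytable (pkey, val)
SELECT s.pkey, s.val
FROM staging s
LEFT OUTER JOIN mytable t ON t.pkey = s.pkey
WHERE t.pkey IS NULL;
```

[Like the NOT EXISTS form, this avoids the duplicate-key error for rows already present, but it has the same window between the check and the insert when two sessions race.]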


-----Original Message-----
From: Jean-Christian Imbeault [mailto:jc@mega-bucks.co.jp]
Sent: Tuesday, July 01, 2003 5:21 PM
To: Dann Corbit
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Duplicate key insert question


Dann Corbit wrote:
>
> SQL*Server has a nice feature for this.

Ok ... is there a way to achieve the same effect in postgres?

Thanks,

Jean-Christian Imbeault


---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to majordomo@postgresql.org so that your
      message can get through to the mailing list cleanly

Re: Duplicate key insert question

From
Jean-Christian Imbeault
Date:
Reuben D. Budiardja wrote:
>
> Hi, not sure if this is answering your question, but I just asked similar
> questions here. I asked about using INSERT WHERE NOT EXISTS (which you can do
> in PostgreSQL). Here is what you can do:
>
> INSERT INTO mytable
> SELECT 'value1', 'value2'
>    WHERE NOT EXISTS
>         (SELECT NULL FROM mytable
>                 WHERE mycondition)
>
> http://marc.theaimsgroup.com/?l=postgresql-general&w=2&r=1&s=WHERE+NOT+EXISTS&q=b

Thanks for the link!

I read the thread and it looks like even the above solution is not
perfect because of a possible race condition: two inserts trying to
insert a row with a pk not in the table will both think it is OK to do
so, try it, and then both will fail?

If I followed all the arguments in the thread correctly, there is *no*
way to do what I (and you ;) want in one simple query.

Do you agree? Or did I miss something?

Jean-Christian Imbeault


Re: Duplicate key insert question

From
"Dann Corbit"
Date:
> -----Original Message-----
> From: Maksim Likharev [mailto:mlikharev@aurigin.com]
> Sent: Tuesday, July 01, 2003 5:40 PM
> To: Jean-Christian Imbeault; Dann Corbit
> Cc: pgsql-general@postgresql.org
> Subject: RE: [GENERAL] Duplicate key insert question
>
>
> Yes, but for MSSQL unique index with ignore duplicate in
> reality will reject all duplicates. Another word if you are
> trying to insert 2 identical values
> you will insert none.

Their documentation is wrong then.  From here:
http://www.mysql.com/doc/en/ALTER_TABLE.html we have this:

"IGNORE is a MySQL extension to SQL-92. It controls how ALTER TABLE
works if there are duplicates on unique keys in the new table. If IGNORE
isn't specified, the copy is aborted and rolled back. If IGNORE is
specified, then for rows with duplicates on a unique key, only the first
row is used; the others are deleted."

> Do search using IF EXISTS SELECT ... or LEFT OUTER JOIN ...
> WHERE ... IS NULL. works pretty fast.
>
>
> -----Original Message-----
> From: Jean-Christian Imbeault [mailto:jc@mega-bucks.co.jp]
> Sent: Tuesday, July 01, 2003 5:21 PM
> To: Dann Corbit
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Duplicate key insert question
>
>
> Dann Corbit wrote:
> >
> > SQL*Server has a nice feature for this.
>
> Ok ... is there a way to achieve the same effect in postgres?
>
> Thanks,
>
> Jean-Christian Imbeault
>

Re: Duplicate key insert question

From
Jean-Christian Imbeault
Date:
Maksim Likharev wrote:
>
> Do search using IF EXISTS SELECT ... or LEFT OUTER JOIN ... WHERE ... IS
> NULL.
> works pretty fast.

Sorry, I don't understand. Works pretty fast for what?

Is that a way of finding if a value exists? or a way of doing the insertion?

Thanks,

Jean-Christian Imbeault


Re: Duplicate key insert question

From
Alvaro Herrera
Date:
On Wed, Jul 02, 2003 at 09:45:23AM +0900, Jean-Christian Imbeault wrote:
> Reuben D. Budiardja wrote:

> > http://marc.theaimsgroup.com/?l=postgresql-general&w=2&r=1&s=WHERE+NOT+EXISTS&q=b
>
> Thanks for the link!
>
> I read the thread and it looks like even the above solution is not
> perfect because of a possible race condition where two inserts trying to
> insert a row with a pk not in the table will both get think it is ok to
> do so, try it and then both will fail?

No, only the "second" one will fail (though it's difficult to say which
one is the second)

> If I followed all the arguments correctly according to the thread there
> is *no* way to do what I (and you ;) want in one simple query.

No, there's not.  You should check the returned value from the insertion
function to see if it succeeded or not.  Sadly, an error will cause the
whole transaction to abort, but if they come from the MySQL side it will
hardly matter.  But you should try to use a sequence if at all possible
to avoid all these problems.
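
[If a generated key is acceptable, the sequence approach Alvaro suggests sidesteps duplicate keys entirely. A minimal sketch, with a hypothetical table name:]

```sql
-- A SERIAL column draws its values from a sequence, so concurrent
-- inserts can never collide on the generated primary key.
CREATE TABLE mytable (
    id  SERIAL PRIMARY KEY,
    val text
);

INSERT INTO mytable (val) VALUES ('some data');
```

[This only helps when a surrogate key is acceptable; it does nothing for a natural key that the application supplies.]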

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"A wizard is never late, Frodo Baggins, nor is he early.
He arrives precisely when he means to."  (Gandalf, en LoTR FoTR)

Re: Duplicate key insert question

From
Jean-Christian Imbeault
Date:
Alvaro Herrera wrote:
>
> No, only the "second" one will fail (though it's difficult to say
> which one is the second)

From:

http://marc.theaimsgroup.com/?l=postgresql-general&m=105656988915991&w=2

Ian Barwick wrote:

[...]

I proposed that same solution 3 years ago. Tom shoots it down:

[...]

I couldn't get the link to work so I couldn't read why Tom shot it down.
But if Tom shot this idea down ... then it mustn't be correct.

>>If I followed all the arguments correctly according to the thread there
>>is *no* way to do what I (and you ;) want in one simple query.
>
>
> No, there's not.

You say no, but earlier you said that the proposed method works. The
proposed method, if it is correct, is simple enough for me. By simple I
mean it can all be done with one query.

> You should check the returned value from the insertion
> function to see if it succeeded or not.

No, what I want is to have one query that will *always* insert if there
is no record with this primary key and *always* do nothing (not fail,
not generate an error) if there is already a record with this primary
key. I don't want to check return values :)

Thanks,

Jean-Christian Imbeault


Re: Duplicate key insert question

From
"Maksim Likharev"
Date:
Finding if the duplicate value exists and inserting if not.

As for the race condition ( your other post )
I do not know how that will work for PG, but in Microsoft SQL Server
you can do the following:
BEGIN TRANSACTION
UPDATE mytable SET [val] = [val]
    WHERE ....
INSERT ...
COMMIT TRANSACTION

Pretty general approach, though; it should work on any SQL system with
transaction and locking support.

So basically, by updating a specific row (let's say you have such a row)
in a transaction, a row/page lock will be held until the end of the
transaction, and a concurrent UPDATE will wait until you are done.
Kind of a semaphore.

Practical example: in a table that holds unique rows, let's say
documents, you can have an extra row with, say, [id] = -1 or whatever
you like. During an insert into that table you update that row in a
transaction, search for and insert the unique values, and commit the
transaction.
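
[In PostgreSQL the same semaphore-row idea could be sketched like this; table, column, and key values here are hypothetical, not from the thread:]

```sql
BEGIN;
-- Take a row lock on a designated dummy row; any concurrent writer
-- doing the same UPDATE blocks here until we commit.
UPDATE documents SET id = id WHERE id = -1;
-- Now check-and-insert without fear of a concurrent duplicate,
-- provided every inserter follows the same locking protocol.
INSERT INTO documents (id, title)
SELECT 42, 'some title'
WHERE NOT EXISTS (SELECT 1 FROM documents WHERE id = 42);
COMMIT;
```

[Serializing all inserts through one lock row removes the race, at the cost of making concurrent writers wait on each other.]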

-----Original Message-----
From: Jean-Christian Imbeault [mailto:jc@mega-bucks.co.jp]
Sent: Tuesday, July 01, 2003 5:47 PM
To: Maksim Likharev
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Duplicate key insert question


Maksim Likharev wrote:
>
> Do search using IF EXISTS SELECT ... or LEFT OUTER JOIN ... WHERE ...
IS
> NULL.
> works pretty fast.

Sorry, I don't understand. Works pretty fast for what?

Is that a way of finding if a value exists? or a way of doing the
insertion?

Thanks,

Jean-Christian Imbeault


Re: Duplicate key insert question

From
Jean-Christian Imbeault
Date:
Maksim Likharev wrote:
>
> Finding if the duplicate value exists and inserting if not.

Ok, thanks but I think it is still vulnerable to a race condition.

> I do not know how that will work for PG, but in Microsoft SQL Server
> you can do following
> BEGIN TRANSACTION
> UPDATE [val] = [val]
>     WHERE ....
> INSERT ...
> COMMIT TRANSACTION
>
> so basically by updating specific row ( let say you have such row )
> in transaction, row/page lock will be held until end of transaction
> and concurrent UPDATE will wait until you are done.
> Kind of semaphore.

Why the UPDATE? And in postgres every query runs in its own transaction,
so there's no need for the explicit BEGIN / END block.

So I can't see how your solution is any better than the previous one :)

Thanks,

Jean-Christian Imbeault


Re: Duplicate key insert question

From
Alvaro Herrera
Date:
On Wed, Jul 02, 2003 at 09:58:28AM +0900, Jean-Christian Imbeault wrote:
> Alvaro Herrera wrote:
> >
> > No, only the "second" one will fail (though it's difficult to say
> > which one is the second)
>
> I couldn't get the link to work so I couldn't read why Tom shot it down.
> But if Tom shot down this idea down ... then it mustn't be correct.

The thread is here:

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&threadm=3A4D6116.1A613402%40mascari.com&rnum=1&prev=/groups%3Fq%3DMike%2BMascari%2BINSERT%2BNOT%2BEXISTS%26ie%3D

The solution is not correct in that there _is_ a race condition.

> > You should check the returned value from the insertion
> > function to see if it succeeded or not.
>
> No, what I want is to have one query that will *always* insert if there
> is no record with this primary key and *always* do nothing (not fail,
> not generate an error) if there is already a record with this primary
> key. I don't want to check return values :)

No way.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"No es bueno caminar con un hombre muerto"

Re: Duplicate key insert question

From
"Reuben D. Budiardja"
Date:
On Tuesday 01 July 2003 08:45 pm, Jean-Christian Imbeault wrote:
> Reuben D. Budiardja wrote:
> > Hi, not sure if this is answering your question, but I just asked similar
> > questions here. I asked about using INSERT WHERE NOT EXISTS (which you
> > can do in PostgreSQL). Here is what you can do:
> >
> > INSERT INTO mytable
> > SELECT 'value1', 'value2'
> >    WHERE NOT EXISTS
> >         (SELECT NULL FROM mytable
> >                 WHERE mycondition)
> >
> > http://marc.theaimsgroup.com/?l=postgresql-general&w=2&r=1&s=WHERE+NOT+EX
> >ISTS&q=b
>
> Thanks for the link!
>
> I read the thread and it looks like even the above solution is not
> perfect because of a possible race condition where two inserts trying to
> insert a row with a pk not in the table will both get think it is ok to
> do so, try it and then both will fail?

No, only *one* of them will fail, but yes, that one will then generate
an error. So it really is a trade-off. Another way would be to lock the
table, as others have suggested. But then there are disadvantages to
that also.

RDB


--
Reuben D. Budiardja
Department of Physics and Astronomy
The University of Tennessee, Knoxville, TN


Re: Duplicate key insert question

From
Jean-Christian Imbeault
Date:
Alvaro Herrera wrote:
>
> The thread is here:
>
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&threadm=3A4D6116.1A613402%40mascari.com&rnum=1&prev=/groups%3Fq%3DMike%2BMascari%2BINSERT%2BNOT%2BEXISTS%26ie%3D

Thanks!

> The solution is not correct in that there _is_ a race condition.

I thought so :(

>>No, what I want if to have one query that will *always* insert if there
>>is no record with this primary key and *always* do nothing (not fail,
>>not generate an error) if there is already a record with this primary
>>key. I don't want to check return values :)
>
>
> No way.

I was beginning to think so. Thanks for confirming my suspicions.

In your opinion what is the best solution, if we define best as not
generating any error messages and executing as quickly as possible?

Thanks,

Jean-Christian Imbeault


Re: Duplicate key insert question

From
"Dann Corbit"
Date:
>>  -----Original Message-----
>> From: Alvaro Herrera [mailto:alvherre@dcc.uchile.cl]
>>  Sent: Tuesday, July 01, 2003 5:51 PM
>>  To: Jean-Christian Imbeault
>>  Cc: techlist@voyager.phys.utk.edu; pgsql-general@postgresql.org
>>  Subject: Re: [GENERAL] Duplicate key insert question
>>
>>
>>  On Wed, Jul 02, 2003 at 09:45:23AM +0900, Jean-Christian
>>  Imbeault wrote:
>> Reuben D. Budiardja wrote:
[snip]
>> If I followed all the arguments correctly according to the thread
>> there is *no* way to do what I (and you ;) want in one simple query.
>
> No, there's not.  You should check the returned value from the
> insertion function to see if it succeeded or not.  Sadly, an error
> will cause the whole transaction to abort, but if they come from the
> MySQL side it will hardly matter.  But you should try to use a
> sequence if at all possible to avoid all these problems.

Does not really avoid the named issue.

Suppose that you have a dictionary of working part numbers (e.g. Boeing
might have 3 million distinct parts in their database).
They would like to create a domain for these parts.  So, naturally, they
take their list and do
%cat list.dat|sort|uniq>list.sor
And then bulk load list.sor.

Unfortunately, the operation fails, because one part was duplicated:

PartID    PartDescription
--------  ---------------------------------
94v-975b  High speed sapphire needle bearing
94V-975B  High speed sapphire needle bearing

It would have been nice if after loading 1.7 million of the 3 million
parts, it could simply skip over the obvious error instead of rolling
everything back.

Of course, it is also possible that 94v-975b and 94V-975B are distinct
parts.  So the one who designs the database must make that decision in
allowing an IGNORE option.

I think it would be a useful addition to PostgreSQL, but I have an easy
workaround for what I want to do: simply capitalize the strings I am
inserting into a dictionary or domain and use select distinct to
filter.  The rare times I want to do something like that incrementally,
I can just request a table lock.
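
[Dann's workaround could be sketched like this in PostgreSQL; the staging/dictionary table and column names are hypothetical, not from the thread:]

```sql
-- Bulk-load the raw list into a staging table first, then move only
-- distinct, case-normalized part numbers into the dictionary.
CREATE TABLE parts_staging (partid text, descr text);
-- COPY parts_staging FROM '/path/to/list.dat';   -- bulk load step

INSERT INTO parts (partid, descr)
SELECT DISTINCT ON (upper(partid)) upper(partid), descr
FROM parts_staging;
```

[DISTINCT ON is a PostgreSQL extension; a plain SELECT DISTINCT upper(partid), descr works too when the descriptions are identical across case variants.]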


Re: Duplicate key insert question

From
Jean-Christian Imbeault
Date:
Reuben D. Budiardja wrote:
>
> No, onlu *one* of them will fail, but yes, the other will then generate error.
> So it really is a trade off. Another way would be to lock the table, as other
> has suggested. But then there is disadvantages to that also.

Really? I just got a post from Alvaro Herrera saying:

"The solution is not correct in that there _is_ a race condition."

Maybe I misunderstood, but "not correct" doesn't sound good :)

Jean-Christian Imbeault


Re: Duplicate key insert question

From
Alvaro Herrera
Date:
On Wed, Jul 02, 2003 at 10:25:54AM +0900, Jean-Christian Imbeault wrote:
> Reuben D. Budiardja wrote:
> >
> > No, onlu *one* of them will fail, but yes, the other will then generate error.
> > So it really is a trade off. Another way would be to lock the table, as other
> > has suggested. But then there is disadvantages to that also.
>
> Really? I just got a post from Alvaro Herrera saying:
>
> "The solution is not correct in that there _is_ a race condition."
>
> Maybe I misunderstood, but "not correct" doesn't sound good :)

Well, he is right.  One will fail, the other will not.  The race
condition is for the application.  If you want to ignore it, you can do
that, but there _will_ be an ERROR thrown and the transaction will be
aborted.  The other transaction _will_ insert the tuple, though, and it
won't be aborted.

Note that for the race condition to show there has to be a race, i.e.
two backends trying to insert the same primary key at the same time.  If
one finishes half a second before the other, they will behave the way
you want, i.e. there will be one tuple inserted and no error generated.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"No deja de ser humillante para una persona de ingenio saber
que no hay tonto que no le pueda enseñar algo." (Jean B. Say)

Re: Duplicate key insert question

From
Alvaro Herrera
Date:
On Tue, Jul 01, 2003 at 06:26:08PM -0700, Dann Corbit wrote:

> > But you should try to use a sequence if at all possible to avoid all
> > these problems.
>
> Does not really avoid the named issue.
>
> Suppose that you have a dictionary of working part numbers (e.g. Boeing
> might have 3 million distinct parts in their database).
> They would like to create a domain for these parts.  So, naturally, they
> take their list and do
> %cat list.dat|sort|uniq>list.sor
> And then bulk load list.sor.

Oh, sure.  The sequence thing won't apply everywhere.  But maybe it can
be applied in his scenario, which I don't know.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Voy a acabar con todos los humanos / con los humanos yo acabaré
voy a acabar con todos / con todos los humanos acabaré (Bender)

Re: Duplicate key insert question

From
Jean-Christian Imbeault
Date:
Alvaro Herrera wrote:
>
> Well, he is right.  One will fail, the other will not.  The race
> condition is for the application.  If you want to ignore it, you can do
> that, but there _will_ be an ERROR thrown and the transaction will be
> aborted.

Ah ... then maybe this solution is 'good enough'. It will still generate
an error message some of the time (when there is a race condition) but
will definitely generate fewer error messages than the current method,
which is just to do the insert and let it fail if there is already a
record with the same primary key.

Thanks for the help!

Jean-Christian Imbeault


Re: Duplicate key insert question

From
"Dann Corbit"
Date:
> -----Original Message-----
> From: Alvaro Herrera [mailto:alvherre@dcc.uchile.cl]
> Sent: Tuesday, July 01, 2003 6:37 PM
> To: Jean-Christian Imbeault
> Cc: techlist@voyager.phys.utk.edu; pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Duplicate key insert question
>
>
> On Wed, Jul 02, 2003 at 10:25:54AM +0900, Jean-Christian
> Imbeault wrote:
> > Reuben D. Budiardja wrote:
> > >
> > > No, only *one* of them will fail, but yes, that one will then
> > > generate an error. So it really is a trade-off. Another way would
> > > be to lock the table, as others have suggested. But then there
> > > are disadvantages to that also.
> >
> > Really? I just got a post from Alvaro Herrera saying:
> >
> > "The solution is not correct in that there _is_ a race condition."
> >
> > Maybe I misunderstood, but "not correct" doesn't sound good :)
>
> Well, he is right.  One will fail, the other will not.  The
> race condition is for the application.  If you want to ignore
> it, you can do that, but there _will_ be an ERROR thrown and
> the transaction will be aborted.  The other transaction
> _will_ insert the tuple, though, and it won't be aborted.
>
> Note that for the race condition to show there has to be a
> race, i.e. two backends trying to insert the same primary key
> at the same time.  If one finishes half a second before the
> other, they will behave the way you want, i.e. there will be
> one tuple inserted and no error generated.

I assume that PostgreSQL would simply time out both transactions if it
happened in a deadly-embrace pair?

I searched the PG docs, but could not find a clear answer.

Re: Duplicate key insert question

From
Alvaro Herrera
Date:
On Tue, Jul 01, 2003 at 06:48:29PM -0700, Dann Corbit wrote:

> I assume that PostgreSQL would simply time out both transactions if it
> happened in a deadly-embrace pair?
>
> I searched the PG docs, but could not find a clear answer.

No, the deadlock will be detected and one of the transactions will be
aborted.  This should happen within a second or so (configurable,
AFAIR).  The other transaction will continue normally.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"El que vive para el futuro es un iluso, y el que vive para el pasado,
un imbécil" (Luis Adler, "Los tripulantes de la noche")

Re: Duplicate key insert question

From
"Reuben D. Budiardja"
Date:
On Tuesday 01 July 2003 09:25 pm, Jean-Christian Imbeault wrote:
> Reuben D. Budiardja wrote:
> > No, onlu *one* of them will fail, but yes, the other will then generate
> > error. So it really is a trade off. Another way would be to lock the
> > table, as other has suggested. But then there is disadvantages to that
> > also.
>
> Really? I just got a post from Alvaro Herrera saying:
>
> "The solution is not correct in that there _is_ a race condition."
>
> Maybe I misunderstood, but "not correct" doesn't sound good :)
>
If you want to avoid the race condition as well, then use the table
locking mechanism inside a transaction. Combine it with the previous
INSERT ... SELECT ... WHERE NOT EXISTS and it should give you what you
want. I suspect it's slower, though. E.g.:

BEGIN WORK;

LOCK TABLE mytable IN SHARE ROW EXCLUSIVE MODE;

INSERT INTO mytable
SELECT 'value1', 'value2'
   WHERE NOT EXISTS
                (SELECT NULL FROM mytable
                    WHERE mycondition);

COMMIT WORK;

This should solve the race condition, since other transactions have to wait.
But if the PK already exists, this will quit without error.

RDB

Re: Duplicate key insert question

From
Jean-Christian Imbeault
Date:
Reuben D. Budiardja wrote:
>
> INSERT INTO mytable
> SELECT 'value1', 'value2'
>    WHERE NOT EXISTS
>         (SELECT NULL FROM mytable
>                 WHERE mycondition)

Thank you to everyone who helped out on my question. I am trying to
implement the above solution but I'm having problems getting this to
work when I want to insert more than one value:

TAL=# create table b (a text primary key, b text);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index 'b_pkey'
for table 'b'
CREATE TABLE
TAL=# insert into b select 'a';
INSERT 335311 1
TAL=# insert into b select 'b', select 'b';
ERROR:  parser: parse error at or near "select" at character 27


Did I get the syntax wrong?

Thanks,

Jean-Christian Imbeault


Re: Duplicate key insert question

From
"Reuben D. Budiardja"
Date:
On Tuesday 01 July 2003 11:08 pm, Jean-Christian Imbeault wrote:
> Reuben D. Budiardja wrote:
> > INSERT INTO mytable
> > SELECT 'value1', 'value2'
> >    WHERE NOT EXISTS
> >         (SELECT NULL FROM mytable
> >                 WHERE mycondition)
>
> Thank you to everyone who helped out on my question. I am trying to
> implement the above solution but I'm having problems getting this to
> work when I want to insert more than one value:
>
> TAL=# create table b (a text primary key, b text);
> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index 'b_pkey'
> for table 'b'
> CREATE TABLE
> TAL=# insert into b select 'a';
> INSERT 335311 1
> TAL=# insert into b select 'b', select 'b';
> ERROR:  parser: parse error at or near "select" at character 27
>

I don't see what you're trying to do. Why do you have two SELECTs?

RDB


Re: Duplicate key insert question

From
Bruno Wolff III
Date:
On Wed, Jul 02, 2003 at 12:08:56 +0900,
  Jean-Christian Imbeault <jc@mega-bucks.co.jp> wrote:
> TAL=# insert into b select 'b', select 'b';
> ERROR:  parser: parse error at or near "select" at character 27

You probably want:
insert into b select 'b', 'b';

Re: Duplicate key insert question

From
Ian Barwick
Date:
On Wednesday 02 July 2003 02:58, Jean-Christian Imbeault wrote:
> Alvaro Herrera wrote:
> > No, only the "second" one will fail (though it's difficult to say which
> > one is the second)
>
> From:
>
> http://marc.theaimsgroup.com/?l=postgresql-general&m=105656988915991&w=2
>
> Ian Barwick wrote:
>
> [...]
>
> I proposed that same solution 3 years ago. Tom shoots it down:

(That quote is from Mike Mascari, not me)

> [...]
>
> I couldn't get the link to work so I couldn't read why Tom shot it down.
> But if Tom shot this idea down ... then it mustn't be correct.

see:
http://archives.postgresql.org/pgsql-general/2000-12/msg00970.php

entire thread:
http://archives.postgresql.org/pgsql-general/2000-12/msg00947.php


Ian Barwick
barwick@gmx.net


Re: Duplicate key insert question

From
Mike Mascari
Date:
Jean-Christian Imbeault wrote:
> Maksim Likharev wrote:
>
>>Finding if the duplicate value exists and inserting if not.
>
> Ok, thanks but I think it is still vulnerable to a race condition.
>
>>I do not know how that will work for PG, but in Microsoft SQL Server
>>you can do following
>>BEGIN TRANSACTION
>>UPDATE [val] = [val]
>>    WHERE ....
>>INSERT ...
>>COMMIT TRANSACTION
>>
>>so basically by updating specific row ( let say you have such row )
>>in transaction, row/page lock will be held until end of transaction
>>and concurrent UPDATE will wait until you are done.
>>Kind of semaphore.
>
> Why the UPDATE? And in postgres every query runs in its own transaction
> so no need for the explicit BEGIN / END block.

The transaction semantics of your application should dictate the use
of BEGIN/END. If you aren't using BEGIN/END (as many non-InnoDB MySQL
applications fail to do), I doubt you can guarantee logical
consistency and performance will be less than what it should be. With
WAL, the difference is less noticeable, but it is there. The use of
BEGIN/END to guarantee logical consistency will help performance. You
kill two birds with one stone.

> So can't see how your solution is any better than the previous one :)

The solution works because:

Backend #1:

BEGIN;
UPDATE foo SET bar = bar WHERE keyid = 1;

Backend #2:

BEGIN;
UPDATE foo SET bar = bar WHERE keyid = 1; <-- Now blocks because of #1

Backend #1:

INSERT INTO weirdtable
SELECT weirdkey, field1, field2
WHERE NOT EXISTS (
 SELECT 1 FROM weirdtable w
 WHERE w.key = weirdkey
); <-- Performs the insert (or not)

COMMIT;

Backend #2:

INSERT INTO weirdtable
SELECT weirdkey, field1, field2
WHERE NOT EXISTS (
 SELECT 1 FROM weirdtable w
 WHERE w.key = weirdkey
); <-- Inserts nothing if Backend #1 already inserted the row

COMMIT;

The UPDATE acts as a lock with row granularity, preventing the race
condition caused by multiple INSERT..WHERE NOT EXISTS. Tuple
visibility rules in READ COMMITTED will allow Backend #2 to see the
new INSERT performed by Backend #1, since Backend #2 won't continue
past the UPDATE until Backend #1 has issued a COMMIT or ABORT.
However, the solution prohibits you from using SERIALIZABLE, since the
UPDATE of the same lock row will cause a serialization error.

If READ COMMITTED is sufficient, the question arises as to how to get
row granularity from the lock. One could have a lock table composed of
the keys being inserted but how do the keys get into the lock table
without a race condition there? If row-lock granularity cannot be
achieved, and you absolutely cannot handle a race condition causing a
unique key violation, you might as well just use LOCK TABLE. You might
also want to investigate the userlocks in /contrib, although I haven't
used them so I cannot speak to their usefulness.
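As a concrete sketch of the LOCK TABLE fallback described above (table and
column names are the placeholders already used in this message; SHARE ROW
EXCLUSIVE is my assumption for a mode that blocks concurrent writers while
still allowing reads):

```sql
BEGIN;
-- Serialize all writers on the table so the NOT EXISTS check and the
-- INSERT behave atomically with respect to concurrent sessions.
LOCK TABLE weirdtable IN SHARE ROW EXCLUSIVE MODE;
INSERT INTO weirdtable
SELECT 'key1', 'field1', 'field2'
WHERE NOT EXISTS (
    SELECT 1 FROM weirdtable w WHERE w.key = 'key1'
);
COMMIT;
```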

Mike Mascari
mascarm@mascari.com



Re: Duplicate key insert question

From
"Nigel J. Andrews"
Date:
On Wed, 2 Jul 2003, Jean-Christian Imbeault wrote:

> I have a table with a primary field and a few other fields. What is the
> fastest way to do an insert into that table assuming that sometimes I
> might try to insert a record with a duplicate primary key and want that
> to fail?
>
> I know that if I try a plain insert this will work, but in the case
> where I am trying to insert a duplicate key, the insert fails (as it
> should) and an error is logged.
>
> I could first do a check to see if there is already an entry with the
> same key as the one I am trying to insert but that would mean quite a
> few extra operations.
>
> Is there a quick and clean way of doing something like:
>
> INSERT into table values(...) IF there isn't already a row with pkey=x
>
> Thanks,
>
> Jean-Christian Imbeault
>
> PS The reason I am asking is that an open source project is using MySQL
> as their DB and they have a port to PG that isn't very clean b/c the DB
> code keeps trying to insert duplicate primary keys. According to them
> MySQL doesn't complain and just drops the insert whereas PG (as is
> right) complains. I've offered to clean up their PG insertion code but
> they say that they don't want too many extra checks as their app writes
> to the DB a *lot* and any extra check is going to slow down the
> application noticeably ...

Skipping ahead without reading all the messages in this thread (got to rush
out): what about using a before-insert trigger, doing the check in there, and
returning null if the insert would fail? Then see if they complain about the
slowdown :)

Of course it's still got the race condition for the application unless you
also lock the table, and it'll screw up any use of currval(sequence)
afterwards that expects to get the id of a row inserted with an id obtained
from nextval(sequence).
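A minimal sketch of such a trigger might look like the following. The table
and column names (mytable, pkey) and the function name are placeholders, the
dollar-quoting is the modern spelling, and, as noted above, the race window
remains unless the table is locked:

```sql
-- Hypothetical before-insert trigger: silently drop rows whose primary
-- key already exists instead of raising a duplicate-key error.
CREATE FUNCTION skip_duplicate_pkey() RETURNS trigger AS $$
BEGIN
    PERFORM 1 FROM mytable WHERE pkey = NEW.pkey;
    IF FOUND THEN
        RETURN NULL;  -- returning NULL cancels this particular INSERT
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER skip_duplicates BEFORE INSERT ON mytable
    FOR EACH ROW EXECUTE PROCEDURE skip_duplicate_pkey();
```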

Nigel Andrews



Re: Duplicate key insert question

From
Jean-Christian Imbeault
Date:
Nigel J. Andrews wrote:
>
> Skipping ahead without reading all the messages in this thread (got to rush
> out) what about using a before insert trigger, doing the check in there,
> returning null if the insert would fail and see if they complain about the slow
> down :)

Oooh! I think I like that .... there would be no penalty for the MySQL
version since this would be a trigger and only in the postgres database
version of the code.

Will a trigger still allow one insert to succeed if there are multiple
backends trying to insert the same primary key in a table? There must be
no 'lost' inserts ....


> Of course it's still got the race condition for the application unless you also
> lock the table and it'll screw up any use of currval(sequence) afterwards that
> expects to get the id of the row inserted with a id obtained from
> nextval(sequence)

I just want two things from any valid solution:

1- if there is an insert and there is no row with the new insert's
primary key, then the insert is made. If there are multiple inserts, one
succeeds, i.e. no valid inserts will be 'lost'.

2- reduce the number of error messages logged as a result of
'collisions' between two backends trying to insert duplicate primary key
rows.

I am not very familiar with triggers so I would very much appreciate any
feedback. But I think the use of a trigger might be safe?

Thanks,

Jean-Christian Imbeault


Re: Duplicate key insert question

From
"Shridhar Daithankar"
Date:
On 2 Jul 2003 at 16:19, Jean-Christian Imbeault wrote:
> I just want two things from any valid solution:
>
> 1- if there is an insert and there is no row with the new insert's
> primary key, then the insert is made. If there are multiple inserts, one
> succeeds, i.e. no valid inserts will be 'lost'.

With postgresql, if you use transactions, no valid inserts are ever lost
unless there is a deadlock.
>
> 2- reduce the number of error messages logged as a result of
> 'collisions' between two backends trying to insert duplicate primary key
> rows.

Log of what? Postgresql will log all the errors. If you don't want to see them,
tune the logging option.

If your code is doing any checking for return value etc., check if it is a
duplicate message and discard it. That way you can minimize your application
logging.

Otherwise use select/insert behaviour referred earlier.

Besides, if you are only rarely going to insert a duplicate, why worry so much
about performance? And if your duplicates are a bit too frequent for comfort,
you might have some more issues w.r.t. database table design to look at.

Bye
 Shridhar

--
"And 1.1.81 is officially BugFree(tm), so if you receive any bug-reports on
it, you know they are just evil lies." (By Linus Torvalds,
Linus.Torvalds@cs.helsinki.fi)


Re: Duplicate key insert question

From
Jean-Christian Imbeault
Date:
Shridhar Daithankar wrote:

Again, please let me preface my email. The application in question is
not written by me. I am trying to find a simple and easily implementable
way of making the current implementation more postgres friendly :)

> With postgresql, if you use transactions, no valid inserts are ever lost
> unless there is a deadlock.

Just needed to make sure as I don't know much about triggers.

>>2- reduce the number of error messages logged as a result of
>>'collisions' between two backends trying to insert duplicate primary key
>>rows.
>
> Log of what? Postgresql will log all the errors. If you don't want to see them,
> tune the logging option.

I don't want to turn all error logging off, actually I don't want to
turn any error logging off :) If errors are logged it usually means
there is a bug somewhere ...

> If your code is doing any checking for return value etc., check if it is a
> duplicate message and discard it. That way you can minimize your application
> logging.

It's not my application logging, it's postgres logging the fact that a
duplicate insertion was attempted:

ERROR:  Cannot insert a duplicate key into unique index pot_documents_pkey

> Otherwise use select/insert behaviour referred earlier.

But the trigger is nicer since none of the application code would need
to be changed at all. I would just add a trigger to the database
creation script.

> And if your dulicates are bit too frequent for comfort, you might
> have some more issues w.r.t database table design to look at.

They are frequent, and I agree that there are issues with the database
design. But the application developers don't want to address them (if
it's broke why fix it kind of thing) ...

I just noticed that their app was generating a *lot* of log entries when
used with a postgres database and I offered to try and find a way of
reducing the number of errors generated.

Just trying to find a way to make *their* app play nice with postgres so
that people who do use their app will use it with postgres as a DB and
not that *other* DB ...

Jean-Christian Imbeault


Re: Duplicate key insert question

From
"Shridhar Daithankar"
Date:
On 2 Jul 2003 at 16:52, Jean-Christian Imbeault wrote:
> They are frequent, and I agree that there are issues with the database
> design. But the application developers don't want to address them (if
> it's broke why fix it kind of thing) ...
>
> I just noticed that their app was generating a *lot* of log entries when
> used with a postgres database and I offered to try and find a way of
> reducing the number of errors generated.
>
> Just trying to find a way to make *their* app play nice with postgres so
> that people who do use their app will use it with postgres as a DB and
> not that *other* DB ...

Your error message is

ERROR:  Cannot insert a duplicate key into unique index pot_documents_pkey

Just restart the postmaster and redirect the log to a script which ignores
this message. A simple pipe to grep -v would do the trick. No need to modify
the app, and no need to modify the db either.
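Such a filter could be sketched like this. The message text is the one quoted
earlier in the thread; the function name is made up for this example:

```shell
# Hypothetical log filter: drop only the duplicate-key noise before it
# reaches the log file, letting every other message through untouched.
pg_log_filter() {
    grep -v 'Cannot insert a duplicate key'
}

# Example: two log lines go in; only the real error survives.
printf '%s\n' \
    'ERROR:  Cannot insert a duplicate key into unique index pot_documents_pkey' \
    'ERROR:  relation "foo" does not exist' \
    | pg_log_filter
```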

HTH

Bye
 Shridhar

--
COBOL:    An exercise in Artificial Inelegance.


Re: Duplicate key insert question

From
Jean-Christian Imbeault
Date:
Shridhar Daithankar wrote:
>
> Just restart the postmaster and redirect the log to a script which ignores
> this message. A simple pipe to grep -v would do the trick. No need to
> modify the app, and no need to modify the db either.

True but each and every user would need to redirect their log, pipe to a
filter, etc ...

With a trigger it's built into the app. You install the app and that's
it. It's a lot more user friendly, no?

Jean-Christian Imbeault


Re: Duplicate key insert question

From
"Shridhar Daithankar"
Date:
On 2 Jul 2003 at 17:09, Jean-Christian Imbeault wrote:

> Shridhar Daithankar wrote:
> >
> > Just restart the postmaster and redirect the log to a script which
> > ignores this message. A simple pipe to grep -v would do the trick. No
> > need to modify the app, and no need to modify the db either.
>
> True but each and every user would need to redirect their log, pipe to a
> filter, etc ...

No. It's just your server log.

> With a trigger it's built into the app. You install the app and that's
> it. It's a lot more user friendly, no?

Triggers are not built into the app. They are built into the database, so
every app would automatically get them without restarting the database or
the application.

Bye
 Shridhar

--
Vulcans never bluff.        -- Spock, "The Doomsday Machine", stardate 4202.1


Re: Duplicate key insert question

From
Jean-Christian Imbeault
Date:
Shridhar Daithankar wrote:
>
> No. It's just your server log.

I was assuming that any users of this app who want it using postgres as
a database would want to stop these error messages from being logged.
(The developers know that these error messages are being logged, but
they don't see them as real errors)

> Triggers are not built into the app. They are built into the database, so
> every app would automatically get them without restarting the database or
> the application.

Sorry, I forgot to mention that the app comes with a script to build the
necessary tables, etc. So I would add to that script to create the
necessary trigger.

Jc


Re: Duplicate key insert question

From
"Shridhar Daithankar"
Date:
On 2 Jul 2003 at 17:20, Jean-Christian Imbeault wrote:

> Shridhar Daithankar wrote:
> >
> > No. It's just your server log.
>
> I was assuming that any users of this app who want it using postgres as
> a database would want to stop these error messages from being logged.
> (The developers know that these error messages are being logged, but
> they don't see them as real errors)

Well, with grep -v in the log pipe, you are just preventing these messages
from reaching the log file. If a client does error checking, it would get the
message anyway.

Bye
 Shridhar

--
One-Shot Case Study, n.:  The scientific equivalent of the four-leaf clover,
from which it is concluded all clovers possess four leaves and are sometimes
green.


Re: Duplicate key insert question

From
"Nigel J. Andrews"
Date:
On Wed, 2 Jul 2003, Shridhar Daithankar wrote:

> On 2 Jul 2003 at 17:20, Jean-Christian Imbeault wrote:
>
> > Shridhar Daithankar wrote:
> > >
> > > No. It's just your server log.
> >
> > I was assuming that any users of this app who want it using postgres as
> > a database would want to stop these error messages from being logged.
> > (The developers know that these error messages are being logged, but
> > they don't see them as real errors)
>
> Well, with grep -v in the log pipe, you are just preventing these messages
> from reaching the log file. If a client does error checking, it would get
> the message anyway.


You've also got the issue of the same pgsql cluster being used for more than
one database and the potential for filtering out ERROR messages from other dbs
in the cluster that really shouldn't be filtered out.


--
Nigel J. Andrews


Re: Duplicate key insert question

From
"Shridhar Daithankar"
Date:
On 2 Jul 2003 at 9:30, Nigel J. Andrews wrote:

> > Well, with grep -v in the log pipe, you are just preventing these
> > messages from reaching the log file. If a client does error checking, it
> > would get the message anyway.
> You've also got the issue of the same pgsql cluster being used for more than
> one database and the potential for filtering out ERROR messages from other dbs
> in the cluster that really shouldn't be filtered out.


Well, if you put the entire error message in the grep string, that should not
affect much, as it contains the index name. Of course, unless you have the
same index name in two different databases. Tough luck then...

Bye
 Shridhar

--
Fun experiments:  Get a can of shaving cream, throw it in a freezer for about
a week.  Then take it out, peel the metal off and put it where you want...
bedroom, car, etc.  As it thaws, it expands an unbelievable amount.