Thread: INSERT only unique records

INSERT only unique records

From

Mark Felegyhazi

Date:

11 July 2009, 10:55:59

Hey guys,

I'm learning Postgresql and trying to inserts only new entries in a table and skips duplicates. I know this is a
recurrentquestion, but maybe you could point out what I'm doing wrong in this specific case.  

Initially, I have two tables: from_t and to_t (empty);

mydb=> select * from from_t;
 num
-----
   1
   1
   2
   3
   4
   5
   5
   5
(8 rows)

The basic inserts I'm trying to make is the following:

mydb=> insert into to_t (select num from from_t where num<4);
AND
mydb=> insert into to_t (select num from from_t where num>2);

To avoid duplicates, I had the following ideas:

1. put a unique constraint on num in to_t
-> problem: the first violation breaks the subquery and the remaining records are never inserted - I don't know how to
catchthe error in subqueries 

2. create the following insert rule:

create or replace rule to_t_insert as on insert to to_t where exists (select num from to_t where num=NEW.num) do
insteadnothing; 
-> problem below - apparently, the rule checks for unique records, but the check is not iterated as new entries are
inserted.3 is not inserted twice, but 1 and 5 are. 

mydb=> insert into to_t (select num from from_t where num<4);
INSERT 0 4
mydb=> select * from to_t;
 num
-----
   1
   1
   2
   3
(4 rows)
mydb=> insert into to_t (select num from from_t where num>2);
INSERT 0 4
mydb=> select * from to_t;
 num
-----
   1
   1
   2
   3
   4
   5
   5
   5
(8 rows)

Could you point me to a solution?

Thanks,
Mark

Re: INSERT only unique records

From

Craig Ringer

Date:

12 July 2009, 05:28:34

On Fri, 2009-07-10 at 13:32 -0700, Mark Felegyhazi wrote:

> 1. put a unique constraint on num in to_t
> -> problem: the first violation breaks the subquery and the remaining records are never inserted - I don't know how
tocatch the error in subqueries 
>
> 2. create the following insert rule:

3. Use a BEFORE INSERT OR UPDATE ... FOR EACH ROW trigger to test for a
duplicate row and return NULL (making the query a no-op) if so.

Your trigger will need to obtain a LOCK TABLE ... IN EXCLUSIVE MODE lock
on the table to prevent concurrent inserts resulting in duplicates.
Because the INSERT / UPDATE on the table will've already acquired a
lesser lock, your trigger will be attempting a lock upgrade, which has a
pretty strong chance of resulting in a deadlock if you have concurrent
inserts on the table. To avoid this, make sure your transactions obtain
an EXCLUSIVE lock on the table before attempting the insert. Failure to
do so won't risk data integrity, but may result in an automatic
transaction rollback due to deadlock if concurrent insert/update queries
are in progress.

Make sure you have a unique constraint in place. It'll help the planner
out, and will catch mistakes.

This approach is nasty in an environment where concurrent inserts are
common.

4. Rely on the unique constraint, and do your inserts one-per-statement
with something like:

INSERT INTO to_t (num)
SELECT 4 WHERE NOT EXISTS (SELECT 1 FROM to_t AS tt WHERE tt.num = 4)

5. Don't worry about the duplicates. Let them be inserted, and weed them
out later or use a view with a GROUP BY to pick distinct rows.

6. Do your inserts via a PL/PgSQL function that sets a savepoint before
each insert and rolls back to the savepoint if the unique constraint
generates a unique violation exception.

--
Craig Ringer

Re: INSERT only unique records

From

Sam Mason

Date:

12 July 2009, 08:36:32

On Fri, Jul 10, 2009 at 01:32:40PM -0700, Mark Felegyhazi wrote:
> I'm learning Postgresql and trying to inserts only new entries in a
> table and skips duplicates. I know this is a recurrent question, but
> maybe you could point out what I'm doing wrong in this specific case.

> To avoid duplicates, I had the following ideas:
>
> 1. put a unique constraint on num in to_t

Constraints are just there to let you know when you're doing something
that would break the expectations of other bits of your code.  They
don't know what to do when these constraints are broken, they just keep
the database in a consistent state so that other code has a chance to do
"the right thing".

> Could you point me to a solution?

The simplest would just be to rewrite your queries as something like:

  INSERT INTO to_t (num)
    SELECT DISTINCT num
    FROM from_t f LEFT JOIN to_t t ON f.num = t.num
    WHERE f.num > 2
      AND t.num IS NULL;

The SELECT DISTINCT part tells the database to only return distinct
values from the query.  The LEFT JOIN tells the database to filter out
anything that already exists in the "to_t" table.

--
  Sam  http://samason.me.uk/