Thread: INSERT only unique records
Hey guys, I'm learning Postgresql and trying to inserts only new entries in a table and skips duplicates. I know this is a recurrentquestion, but maybe you could point out what I'm doing wrong in this specific case. Initially, I have two tables: from_t and to_t (empty); mydb=> select * from from_t; num ----- 1 1 2 3 4 5 5 5 (8 rows) The basic inserts I'm trying to make is the following: mydb=> insert into to_t (select num from from_t where num<4); AND mydb=> insert into to_t (select num from from_t where num>2); To avoid duplicates, I had the following ideas: 1. put a unique constraint on num in to_t -> problem: the first violation breaks the subquery and the remaining records are never inserted - I don't know how to catchthe error in subqueries 2. create the following insert rule: create or replace rule to_t_insert as on insert to to_t where exists (select num from to_t where num=NEW.num) do insteadnothing; -> problem below - apparently, the rule checks for unique records, but the check is not iterated as new entries are inserted.3 is not inserted twice, but 1 and 5 are. mydb=> insert into to_t (select num from from_t where num<4); INSERT 0 4 mydb=> select * from to_t; num ----- 1 1 2 3 (4 rows) mydb=> insert into to_t (select num from from_t where num>2); INSERT 0 4 mydb=> select * from to_t; num ----- 1 1 2 3 4 5 5 5 (8 rows) Could you point me to a solution? Thanks, Mark
On Fri, 2009-07-10 at 13:32 -0700, Mark Felegyhazi wrote: > 1. put a unique constraint on num in to_t > -> problem: the first violation breaks the subquery and the remaining records are never inserted - I don't know how tocatch the error in subqueries > > 2. create the following insert rule: 3. Use a BEFORE INSERT OR UPDATE ... FOR EACH ROW trigger to test for a duplicate row and return NULL (making the query a no-op) if so. Your trigger will need to obtain a LOCK TABLE ... IN EXCLUSIVE MODE lock on the table to prevent concurrent inserts resulting in duplicates. Because the INSERT / UPDATE on the table will've already acquired a lesser lock, your trigger will be attempting a lock upgrade, which has a pretty strong chance of resulting in a deadlock if you have concurrent inserts on the table. To avoid this, make sure your transactions obtain an EXCLUSIVE lock on the table before attempting the insert. Failure to do so won't risk data integrity, but may result in an automatic transaction rollback due to deadlock if concurrent insert/update queries are in progress. Make sure you have a unique constraint in place. It'll help the planner out, and will catch mistakes. This approach is nasty in an environment where concurrent inserts are common. 4. Rely on the unique constraint, and do your inserts one-per-statement with something like: INSERT INTO to_t (num) SELECT 4 WHERE NOT EXISTS (SELECT 1 FROM to_t AS tt WHERE tt.num = 4) 5. Don't worry about the duplicates. Let them be inserted, and weed them out later or use a view with a GROUP BY to pick distinct rows. 6. Do your inserts via a PL/PgSQL function that sets a savepoint before each insert and rolls back to the savepoint if the unique constraint generates a unique violation exception. -- Craig Ringer
On Fri, Jul 10, 2009 at 01:32:40PM -0700, Mark Felegyhazi wrote: > I'm learning Postgresql and trying to inserts only new entries in a > table and skips duplicates. I know this is a recurrent question, but > maybe you could point out what I'm doing wrong in this specific case. > To avoid duplicates, I had the following ideas: > > 1. put a unique constraint on num in to_t Constraints are just there to let you know when you're doing something that would break the expectations of other bits of your code. They don't know what to do when these constraints are broken, they just keep the database in a consistent state so that other code has a chance to do "the right thing". > Could you point me to a solution? The simplest would just be to rewrite your queries as something like: INSERT INTO to_t (num) SELECT DISTINCT num FROM from_t f LEFT JOIN to_t t ON f.num = t.num WHERE f.num > 2 AND t.num IS NULL; The SELECT DISTINCT part tells the database to only return distinct values from the query. The LEFT JOIN tells the database to filter out anything that already exists in the "to_t" table. -- Sam http://samason.me.uk/