Thread: Inserting into table only if the row does not already exist.
Hey folks, I am inserting data into a table with a three-column primary key (the table is only three columns). The rows I'm trying to insert may very well be duplicates of ones already in the table, so I would like to have PostgreSQL handle the insert and possible error resulting from dup data. I can certainly do a SELECT first and then INSERT if it's not a duplicate, ala: * Do a SELECT against the three columns * If there are no rows returned, then do the INSERT But I thought this would be possible with the following: INSERT INTO table ( column1, column2, column3 ) SELECT column1, column2, column3 WHERE NOT EXISTS ( SELECT column1,column2, column3 FROM table WHERE column1 = $column1 AND column2 = $column2 AND column3 = $column3) .. which gave me 'ERROR: column1 does not exist'. Nuts. Is this possible to do in one statement? I checked with a DBA friend (he's an Oracle guy), Google, and the list archives, and either didn't find anything helpful or simply didn't try the correct search terms. Or should I be doing this sort of thing in two separate queries? Thanks for all the help you folks have given me, Benny -- "Even if a man chops off your hand with a sword, you still have two nice, sharp bones to stick in his eyes." -- .sig on Slashdot
"C. Bensend" <benny@bennyvision.com> writes: > INSERT INTO table ( column1, column2, column3 ) > SELECT column1, column2, column3 > WHERE NOT EXISTS ( > SELECT column1, column2, column3 FROM table WHERE > column1 = $column1 AND > column2 = $column2 AND > column3 = $column3 ) > > .. which gave me 'ERROR: column1 does not exist'. Nuts. Well you're not selecting from any table so "column1" isn't going to exist. You just have to put it in the select list as a constant. If you're feeling generous to the next programmer to read it you could put "AS column1" after each one, but the column name doesn't actually have to match the column you're inserting into. INSERT INTO table ( column1, column2, column3 ) ( SELECT $column1, $column2, $column3 WHERE NOT EXISTS ( SELECT1 FROM table WHERE column1 = $column1 AND column2 = $column2 AND column3 = $column3 ) ) Note that this is going to have some concurrency issues. I think it will be possible for a second query to execute before the first commits. In that case it won't see the record the first query inserted and try to insert again. You'll just get a primary key violation though which I guess you can just ignore. Which raises a question. Why not forgoe this complicated SQL and try to do the insert. If you get a primary key violation, well there's your answer... If you don't care about the failure just ignore it and move on. I would suggest checking specifically for a primary key violation and still stopping execution on unexpected errors though. If you're doing this inside a bigger transaction that's a bit more of a pain. Until 8.0 postgres can't handle ignoring an error on a query without aborting the entire transaction. But if you're in autocommit mode then you can just ignore the primary key violation and continue. Incidentally, if you're putting your parameters directly into your queries using $column1 then you've got a potential security problem. Unless you're quoting every variable everywhere religiously using postgres's quoting functions an attacker can sneak extra SQL into your queries. Potentially including whole new statements such as "DELETE FROM table"... -- greg
> You just have to put it in the select list as a constant. If you're > feeling > generous to the next programmer to read it you could put "AS column1" > after > each one, but the column name doesn't actually have to match the column > you're > inserting into. Sweet GOD, I hope no one ever has to read the stuff I'm working on! This is just a personal curiousity project, not anything for work or anything released publicly. I'd be far too embarrassed to ever release this, as I'm just fumbling along, learning. :) > Note that this is going to have some concurrency issues. I think it will > be > possible for a second query to execute before the first commits. In that > case > it won't see the record the first query inserted and try to insert again. > You'll just get a primary key violation though which I guess you can just > ignore. Concurrency shouldn't be an issue - this is a perl script running from cron. The only concurrency that will ever happen is if I'm a bonehead and I run the script manually right as cron kicks off another copy. > Which raises a question. Why not forgoe this complicated SQL and try to do > the > insert. If you get a primary key violation, well there's your answer... If > you > don't care about the failure just ignore it and move on. I would suggest > checking specifically for a primary key violation and still stopping > execution > on unexpected errors though. Well, I was hoping to minimize the amount of perl needed to get this one task done, but I think that's going to be the best way to do it. > Incidentally, if you're putting your parameters directly into your queries > using $column1 then you've got a potential security problem. Unless you're > quoting every variable everywhere religiously using postgres's quoting > functions an attacker can sneak extra SQL into your queries. Potentially > including whole new statements such as "DELETE FROM table"... Yes indeed, good catch. I'll clean that up immediately. Thanks, Greg! Benny -- "Even if a man chops off your hand with a sword, you still have two nice, sharp bones to stick in his eyes." -- .sig on Slashdot
* C. Bensend <benny@bennyvision.com> [2004-10-15 08:14:51 -0500]: > Concurrency shouldn't be an issue - this is a perl script running > from cron. The only concurrency that will ever happen is if I'm a > bonehead and I run the script manually right as cron kicks off > another copy. Even cron'd scripts can run amok. Lock files are great as a CYA measure even if you're relatively certain there will be no opportunity for one copy to step on the next. I once had a cron'd script mail upper management every hour I was on vacation. You can imagine how long it was until I got another vacation request approved. ;) -- Steven Klassen - Lead Programmer Command Prompt, Inc. - http://www.commandprompt.com/ PostgreSQL Replication & Support Services, (503) 667-4564
> Even cron'd scripts can run amok. Lock files are great as a CYA > measure even if you're relatively certain there will be no opportunity > for one copy to step on the next. Yes, you are absolutely correct, of course. :) I should be more specific - I'm working on a personal project, and I'm not going to worry about that right now. The risk of a cron gone wild is acceptable to me at this moment. Right now my priority is getting data flowing into the database instead of piling up on the host like it is. :) Thanks much! Benny -- "Even if a man chops off your hand with a sword, you still have two nice, sharp bones to stick in his eyes." -- .sig on Slashdot
"C. Bensend" <benny@bennyvision.com> writes: > The risk of a cron gone wild is acceptable to me at this moment. Gee, now I have images of late-night advertisements for bofh-porn video tapes of Cron Jobs Gone Wild(tm) dancing through my head... thanks. -- greg
> Gee, now I have images of late-night advertisements for bofh-porn video > tapes > of Cron Jobs Gone Wild(tm) dancing through my head... thanks. Wow. THERE'S something I didn't need before hitting the sack. ;) Benny -- "Even if a man chops off your hand with a sword, you still have two nice, sharp bones to stick in his eyes." -- .sig on Slashdot