Thread: ignore unique violation OR check row exists
I want to insert a bunch of records and not do anything if the record already exists. So the 2 options I considered are 1) check if row exists or insert and 2) ignore the unique violation on insert if row exists. Any opinions on whether it is faster to INSERT and then catch the UNIQUE VIOLATION exception and ignore it in plpgsql versus check if row exists and INSERT if it doesn't. I can't seem to ignore the UNIQUE VIOLATION exception via php, since it is a plpgsql command, so if I have to do the check and insert, alternatively i have a function that tries to insert and then ignores the violation. I was wondering if one way was better than the other. Thanks -- View this message in context: http://postgresql.1045698.n5.nabble.com/ignore-unique-violation-OR-check-row-exists-tp5117916p5117916.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
If exists is better, though insert line by line and check if exists may be very slow... The best would be if you can use copy command from csv to staging table (without constraints) and then Insert to live from stage where stage constraint column not exist in live... Kind Regards, Misa Sent from my Windows Phone From: rverghese Sent: 03/01/2012 21:55 To: pgsql-sql@postgresql.org Subject: [SQL] ignore unique violation OR check row exists I want to insert a bunch of records and not do anything if the record already exists. So the 2 options I considered are 1) check if row exists or insert and 2) ignore the unique violation on insert if row exists. Any opinions on whether it is faster to INSERT and then catch the UNIQUE VIOLATION exception and ignore it in plpgsql versus check if row exists and INSERT if it doesn't. I can't seem to ignore the UNIQUE VIOLATION exception via php, since it is a plpgsql command, so if I have to do the check and insert, alternatively i have a function that tries to insert and then ignores the violation. I was wondering if one way was better than the other. Thanks -- View this message in context: http://postgresql.1045698.n5.nabble.com/ignore-unique-violation-OR-check-row-exists-tp5117916p5117916.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com. -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
If exists is better, though insert line by line and check if exists may be very slow... The best would be if you can use copy command from csv to staging table (without constraints) and then Insert to live from stage where stage constraint column not exist in live... Kind Regards, Misa Sent from my Windows Phone From: rverghese Sent: 03/01/2012 21:55 To: pgsql-sql@postgresql.org Subject: [SQL] ignore unique violation OR check row exists I want to insert a bunch of records and not do anything if the record already exists. So the 2 options I considered are 1) check if row exists or insert and 2) ignore the unique violation on insert if row exists. Any opinions on whether it is faster to INSERT and then catch the UNIQUE VIOLATION exception and ignore it in plpgsql versus check if row exists and INSERT if it doesn't. I can't seem to ignore the UNIQUE VIOLATION exception via php, since it is a plpgsql command, so if I have to do the check and insert, alternatively i have a function that tries to insert and then ignores the violation. I was wondering if one way was better than the other. Thanks -- View this message in context: http://postgresql.1045698.n5.nabble.com/ignore-unique-violation-OR-check-row-exists-tp5117916p5117916.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com. -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
On 2012-01-03, Misa Simic <misa.simic@gmail.com> wrote: > If exists is better, though insert line by line and check if exists may > be very slow... > > The best would be if you can use copy command from csv to staging table > (without constraints) and then > > Insert to live from stage where stage constraint column not exist in > live... > Its a good idea to check that the staging table doesn't conflict with itself, before tryign to insert it. -- ⚂⚃ 100% natural
rverghese <riyav@hotmail.com> wrote: > I want to insert a bunch of records and not do anything if the record already > exists. So the 2 options I considered are 1) check if row exists or insert > and 2) ignore the unique violation on insert if row exists. > Any opinions on whether it is faster to INSERT and then catch the UNIQUE > VIOLATION exception and ignore it in plpgsql versus check if row exists and > INSERT if it doesn't. > I can't seem to ignore the UNIQUE VIOLATION exception via php, since it is a > plpgsql command, so if I have to do the check and insert, alternatively i > have a function that tries to insert and then ignores the violation. I was > wondering if one way was better than the other. > Thanks If i where you i would use an extra table. Insert all new records into this table (i called it src) and select all records from src into dest (the destination table) which are not in dest. See my example: test=# select * from dest ;i ---12345 (5 rows) Time: 0,148 ms test=*# select * from src ;i ---456789 (6 rows) Time: 0,200 ms test=*# insert into dest select src.* from src left join dest using (i) where dest.i is null; INSERT 0 4 Time: 0,464 ms test=*# select * from dest;i ---123456789 (9 rows) I think, that's the fastest way. Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknown) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
Andreas Kretschmer wrote: > rverghese <riyav@hotmail.com> wrote: > >> I want to insert a bunch of records and not do anything if the record already >> exists. So the 2 options I considered are 1) check if row exists or insert >> and 2) ignore the unique violation on insert if row exists. >> Any opinions on whether it is faster to INSERT and then catch the UNIQUE >> VIOLATION exception and ignore it in plpgsql versus check if row exists and >> INSERT if it doesn't. >> I can't seem to ignore the UNIQUE VIOLATION exception via php, since it is a >> plpgsql command, so if I have to do the check and insert, alternatively i >> have a function that tries to insert and then ignores the violation. I was >> wondering if one way was better than the other. >> Thanks Take it out of transaction. Why is there a transaction in the first place? If transaction is needed, ok, but take these inserts out and everything will work as it should. Ignoring UNIQUE VIOLATION or any other error defeats the very purpose of transaction. That's why you can't ignore it. Zdravko
On Wed, Jan 4, 2012 at 1:57 AM, Zdravko Balorda <zdravko.balorda@siix.com> wrote:
Andreas Kretschmer wrote:Take it out of transaction. Why is there a transaction in the first place?rverghese <riyav@hotmail.com> wrote:I want to insert a bunch of records and not do anything if the record already
exists. So the 2 options I considered are 1) check if row exists or insert
and 2) ignore the unique violation on insert if row exists. Any opinions on whether it is faster to INSERT and then catch the UNIQUE
VIOLATION exception and ignore it in plpgsql versus check if row exists and
INSERT if it doesn't. I can't seem to ignore the UNIQUE VIOLATION exception via php, since it is a
plpgsql command, so if I have to do the check and insert, alternatively i
have a function that tries to insert and then ignores the violation. I was
wondering if one way was better than the other.
Thanks
If transaction is needed, ok, but take these inserts out and everything will
work as it should. Ignoring UNIQUE VIOLATION or any other error defeats the very
purpose of transaction. That's why you can't ignore it.
Unfortunately, bulk inserts are much slower when they don't occur in a transaction. Try inserting 1 million rows with auto commit enabled vs 1 million rows in 1 transaction, or even 10 or 100 transactions. The difference is enormous. The bulk insert into an unconstrained table and then pulling just the new rows over into the destination table in a single transaction is definitely the most effective way to do this.
Well, idea is to make process faster as possible... And usualy staging table does not have any constrains so can't violates... When we want to import banch of data... Process when we taking row by row from source, validate it, if valid insert to some table could be very slow... Much faster is when we work with sets.. =E2=80=A2import all records to some table without constrains (staging table= ). And best would be if we can use COPY command instead of insert... =E2=80=A2inert into liveTable select all valid records from stagingTable Of course it is just in case when we want to import what is ok... In case all or nothing - import direct to liveTable works fine... Sent from my Windows Phone From: Jasen Betts Sent: 04/01/2012 10:02 To: pgsql-sql@postgresql.org Subject: Re: [SQL] ignore unique violation OR check row exists On 2012-01-03, Misa Simic <misa.simic@gmail.com> wrote: > If exists is better, though insert line by line and check if exists may > be very slow... > > The best would be if you can use copy command from csv to staging table > (without constraints) and then > > Insert to live from stage where stage constraint column not exist in > live... > Its a good idea to check that the staging table doesn't conflict with itself, before tryign to insert it. --=20 =E2=9A=82=E2=9A=83 100% natural --=20 Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Samuel Gendler wrote: > > > On Wed, Jan 4, 2012 at 1:57 AM, Zdravko Balorda > <zdravko.balorda@siix.com <mailto:zdravko.balorda@siix.com>> wrote: > Take it out of transaction. Why is there a transaction in the first > place? > If transaction is needed, ok, but take these inserts out and > everything will > work as it should. Ignoring UNIQUE VIOLATION or any other error > defeats the very > purpose of transaction. That's why you can't ignore it. > > > Unfortunately, bulk inserts are much slower when they don't occur in a > transaction. Try inserting 1 million rows with auto commit enabled vs 1 > million rows in 1 transaction, or even 10 or 100 transactions. The > difference is enormous. The bulk insert into an unconstrained table and > then pulling just the new rows over into the destination table in a > single transaction is definitely the most effective way to do this. > I do a lot of bulk inserts. What helps is dropping indexes before insert and recreating it after. Probably you need to better organize data to avoid having primary keys on a table with a lots of data. Zdravko
Thanks for the recommendations. Unfortunately I have to clean out the data before I insert, so I cannot do a bulk copy from a CSV, I will try the option of inserting into src table and then copying relevant data to dest table and see if that works faster for me. I suppose I could bulk insert and then clean out the data before I insert into destination table. I'll have to see how long that takes. -- View this message in context: http://postgresql.1045698.n5.nabble.com/ignore-unique-violation-OR-check-row-exists-tp5117916p5120317.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
Well, idea is to make process faster as possible... And usualy staging table does not have any constrains so can't violates... When we want to import banch of data... Process when we taking row by row from source, validate it, if valid insert to some table could be very slow... Much faster is when we work with sets.. •import all records to some table without constrains (staging table). And best would be if we can use COPY command instead of insert... •inert into liveTable select all valid records from stagingTable Of course it is just in case when we want to import what is ok... In case all or nothing - import direct to liveTable works fine... Sent from my Windows Phone From: Jasen Betts Sent: 04/01/2012 10:02 To: pgsql-sql@postgresql.org Subject: Re: [SQL] ignore unique violation OR check row exists On 2012-01-03, Misa Simic <misa.simic@gmail.com> wrote: > If exists is better, though insert line by line and check if exists may > be very slow... > > The best would be if you can use copy command from csv to staging table > (without constraints) and then > > Insert to live from stage where stage constraint column not exist in > live... > Its a good idea to check that the staging table doesn't conflict with itself, before tryign to insert it. -- ⚂⚃ 100% natural -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql