Thread: ignore unique violation OR check row exists

ignore unique violation OR check row exists

From

rverghese

Date:

03 January 2012, 16:54:49

I want to insert a bunch of records and not do anything if the record already
exists. So the 2 options I considered are 1) check if row exists or insert
and 2) ignore the unique violation on insert if row exists. 
Any opinions on whether it is faster to INSERT and then catch the UNIQUE
VIOLATION exception and ignore it in plpgsql  versus check if row exists and
INSERT if it doesn't. 
I can't seem to ignore the UNIQUE VIOLATION exception via php, since it is a
plpgsql command, so if I have to do the check and insert, alternatively i
have a function that tries to insert and then ignores the violation. I was
wondering if one way was better than the other.
Thanks

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/ignore-unique-violation-OR-check-row-exists-tp5117916p5117916.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.

Re: ignore unique violation OR check row exists

From

Misa Simic

Date:

03 January 2012, 17:54:11

If exists is better, though insert line by line and check if exists may
be very slow...

The best would be if you can use copy command from csv to staging table
(without constraints) and then

Insert to live from stage where stage constraint column not exist in
live...

Kind Regards,

Misa

Sent from my Windows Phone
From: rverghese
Sent: 03/01/2012 21:55
To: pgsql-sql@postgresql.org
Subject: [SQL] ignore unique violation OR check row exists
I want to insert a bunch of records and not do anything if the record already
exists. So the 2 options I considered are 1) check if row exists or insert
and 2) ignore the unique violation on insert if row exists.
Any opinions on whether it is faster to INSERT and then catch the UNIQUE
VIOLATION exception and ignore it in plpgsql  versus check if row exists and
INSERT if it doesn't.
I can't seem to ignore the UNIQUE VIOLATION exception via php, since it is a
plpgsql command, so if I have to do the check and insert, alternatively i
have a function that tries to insert and then ignores the violation. I was
wondering if one way was better than the other.
Thanks

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/ignore-unique-violation-OR-check-row-exists-tp5117916p5117916.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.

--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: ignore unique violation OR check row exists

From

Misa Simic

Date:

04 January 2012, 01:53:53

If exists is better, though insert line by line and check if exists may
be very slow...

The best would be if you can use copy command from csv to staging table
(without constraints) and then

Insert to live from stage where stage constraint column not exist in
live...

Kind Regards,

Misa

Sent from my Windows Phone
From: rverghese
Sent: 03/01/2012 21:55
To: pgsql-sql@postgresql.org
Subject: [SQL] ignore unique violation OR check row exists
I want to insert a bunch of records and not do anything if the record already
exists. So the 2 options I considered are 1) check if row exists or insert
and 2) ignore the unique violation on insert if row exists.
Any opinions on whether it is faster to INSERT and then catch the UNIQUE
VIOLATION exception and ignore it in plpgsql  versus check if row exists and
INSERT if it doesn't.
I can't seem to ignore the UNIQUE VIOLATION exception via php, since it is a
plpgsql command, so if I have to do the check and insert, alternatively i
have a function that tries to insert and then ignores the violation. I was
wondering if one way was better than the other.
Thanks

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/ignore-unique-violation-OR-check-row-exists-tp5117916p5117916.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: ignore unique violation OR check row exists

From

Jasen Betts

Date:

04 January 2012, 05:02:07

On 2012-01-03, Misa Simic <misa.simic@gmail.com> wrote:
> If exists is better, though insert line by line and check if exists may
> be very slow...
>
> The best would be if you can use copy command from csv to staging table
> (without constraints) and then
>
> Insert to live from stage where stage constraint column not exist in
> live...
>

Its a good idea to check that the staging table doesn't
conflict with itself, before tryign to insert it.

-- 
⚂⚃ 100% natural

Re: ignore unique violation OR check row exists

From

Andreas Kretschmer

Date:

04 January 2012, 05:08:58

rverghese <riyav@hotmail.com> wrote:

> I want to insert a bunch of records and not do anything if the record already
> exists. So the 2 options I considered are 1) check if row exists or insert
> and 2) ignore the unique violation on insert if row exists. 
> Any opinions on whether it is faster to INSERT and then catch the UNIQUE
> VIOLATION exception and ignore it in plpgsql  versus check if row exists and
> INSERT if it doesn't. 
> I can't seem to ignore the UNIQUE VIOLATION exception via php, since it is a
> plpgsql command, so if I have to do the check and insert, alternatively i
> have a function that tries to insert and then ignores the violation. I was
> wondering if one way was better than the other.
> Thanks

If i where you i would use an extra table. Insert all new records into
this table (i called it src) and select all records from src into dest
(the destination table) which are not in dest. See my example:

test=# select * from dest ;i
---12345
(5 rows)

Time: 0,148 ms
test=*# select * from src ;i
---456789
(6 rows)

Time: 0,200 ms
test=*# insert into dest select src.* from src left join dest using (i)
where dest.i is null;
INSERT 0 4
Time: 0,464 ms
test=*# select * from dest;i
---123456789
(9 rows)


I think, that's the fastest way.



Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°

Re: ignore unique violation OR check row exists

From

Zdravko Balorda

Date:

04 January 2012, 05:57:54

Andreas Kretschmer wrote:
> rverghese <riyav@hotmail.com> wrote:
> 
>> I want to insert a bunch of records and not do anything if the record already
>> exists. So the 2 options I considered are 1) check if row exists or insert
>> and 2) ignore the unique violation on insert if row exists. 
>> Any opinions on whether it is faster to INSERT and then catch the UNIQUE
>> VIOLATION exception and ignore it in plpgsql  versus check if row exists and
>> INSERT if it doesn't. 
>> I can't seem to ignore the UNIQUE VIOLATION exception via php, since it is a
>> plpgsql command, so if I have to do the check and insert, alternatively i
>> have a function that tries to insert and then ignores the violation. I was
>> wondering if one way was better than the other.
>> Thanks


Take it out of transaction. Why is there a transaction in the first place?
If transaction is needed, ok, but take these inserts out and everything will
work as it should. Ignoring UNIQUE VIOLATION or any other error defeats the very
purpose of transaction. That's why you can't ignore it.

Zdravko

Re: ignore unique violation OR check row exists

From

Samuel Gendler

Date:

04 January 2012, 06:10:11

On Wed, Jan 4, 2012 at 1:57 AM, Zdravko Balorda <zdravko.balorda@siix.com> wrote:

Andreas Kretschmer wrote:
rverghese <riyav@hotmail.com> wrote:

I want to insert a bunch of records and not do anything if the record already
exists. So the 2 options I considered are 1) check if row exists or insert
and 2) ignore the unique violation on insert if row exists. Any opinions on whether it is faster to INSERT and then catch the UNIQUE
VIOLATION exception and ignore it in plpgsql versus check if row exists and
INSERT if it doesn't. I can't seem to ignore the UNIQUE VIOLATION exception via php, since it is a
plpgsql command, so if I have to do the check and insert, alternatively i
have a function that tries to insert and then ignores the violation. I was
wondering if one way was better than the other.
Thanks

Take it out of transaction. Why is there a transaction in the first place?
If transaction is needed, ok, but take these inserts out and everything will
work as it should. Ignoring UNIQUE VIOLATION or any other error defeats the very
purpose of transaction. That's why you can't ignore it.

Unfortunately, bulk inserts are much slower when they don't occur in a transaction. Try inserting 1 million rows with auto commit enabled vs 1 million rows in 1 transaction, or even 10 or 100 transactions. The difference is enormous. The bulk insert into an unconstrained table and then pulling just the new rows over into the destination table in a single transaction is definitely the most effective way to do this.

Re: ignore unique violation OR check row exists

From

Misa Simic

Date:

04 January 2012, 06:23:48

Well, idea is to make process faster as possible... And usualy staging
table does not have any constrains so can't violates...

When we want to import banch of data... Process when we taking row by
row from source, validate it, if valid insert to some table could be
very slow...

Much faster is when we work with sets..

=E2=80=A2import all records to some table without constrains (staging table=
).
And best would be if we can use COPY command instead of insert...

=E2=80=A2inert into liveTable select all valid records from stagingTable

Of course it is just in case when we want to import what is ok... In
case all or nothing - import direct to liveTable works fine...

Sent from my Windows Phone
From: Jasen Betts
Sent: 04/01/2012 10:02
To: pgsql-sql@postgresql.org
Subject: Re: [SQL] ignore unique violation OR check row exists
On 2012-01-03, Misa Simic <misa.simic@gmail.com> wrote:
> If exists is better, though insert line by line and check if exists may
> be very slow...
>
> The best would be if you can use copy command from csv to staging table
> (without constraints) and then
>
> Insert to live from stage where stage constraint column not exist in
> live...
>

Its a good idea to check that the staging table doesn't
conflict with itself, before tryign to insert it.

--=20
=E2=9A=82=E2=9A=83 100% natural

--=20
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: ignore unique violation OR check row exists

From

Zdravko Balorda

Date:

04 January 2012, 08:34:21

Samuel Gendler wrote:
> 
> 
> On Wed, Jan 4, 2012 at 1:57 AM, Zdravko Balorda 
> <zdravko.balorda@siix.com <mailto:zdravko.balorda@siix.com>> wrote:
>     Take it out of transaction. Why is there a transaction in the first
>     place?
>     If transaction is needed, ok, but take these inserts out and
>     everything will
>     work as it should. Ignoring UNIQUE VIOLATION or any other error
>     defeats the very
>     purpose of transaction. That's why you can't ignore it.
> 
> 
> Unfortunately, bulk inserts are much slower when they don't occur in a 
> transaction.  Try inserting 1 million rows with auto commit enabled vs 1 
> million rows in 1 transaction, or even 10 or 100 transactions. The 
> difference is enormous.  The bulk insert into an unconstrained table and 
> then pulling just the new rows over into the destination table in a 
> single transaction is definitely the most effective way to do this.
> 

I do a lot of bulk inserts. What helps is dropping indexes before insert
and recreating it after. Probably you need to better organize data to
avoid having primary keys on a table with a lots of data.

Zdravko

Re: ignore unique violation OR check row exists

From

rverghese

Date:

04 January 2012, 13:34:53

Thanks for the recommendations. Unfortunately I have to clean out the data
before I insert, so I cannot do a bulk copy from a CSV, I will try the
option of inserting into src table and then copying relevant data to dest
table and see if that works faster for me. I suppose I could bulk insert and
then clean out the data before I insert into destination table. I'll have to
see how long that takes.


--
View this message in context:
http://postgresql.1045698.n5.nabble.com/ignore-unique-violation-OR-check-row-exists-tp5117916p5120317.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.

Re: ignore unique violation OR check row exists

From

Misa Simic

Date:

04 January 2012, 14:23:29

Well, idea is to make process faster as possible... And usualy staging
table does not have any constrains so can't violates...

When we want to import banch of data... Process when we taking row by
row from source, validate it, if valid insert to some table could be
very slow...

Much faster is when we work with sets..

•import all records to some table without constrains (staging table).
And best would be if we can use COPY command instead of insert...

•inert into liveTable select all valid records from stagingTable

Of course it is just in case when we want to import what is ok... In
case all or nothing - import direct to liveTable works fine...

Sent from my Windows Phone
From: Jasen Betts
Sent: 04/01/2012 10:02
To: pgsql-sql@postgresql.org
Subject: Re: [SQL] ignore unique violation OR check row exists
On 2012-01-03, Misa Simic <misa.simic@gmail.com> wrote:
> If exists is better, though insert line by line and check if exists may
> be very slow...
>
> The best would be if you can use copy command from csv to staging table
> (without constraints) and then
>
> Insert to live from stage where stage constraint column not exist in
> live...
>

Its a good idea to check that the staging table doesn't
conflict with itself, before tryign to insert it.

--
⚂⚃ 100% natural

--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql