Thread: Question on Insert / Update
Hi,
have just a general question...

I have a table of 10M records, with a unique key on 5 fields.
I need to update/insert 200k records in one go.

I could do a select to check for existence and then either insert or update.
Or simply insert, check the error code, and update if required.

The 2nd seems to be the logical choice, but will it actually be faster,
and moreover, is that the right way to do it?

Thanks
Alex
I guess the best solution is one which allows you to do it in batches,
as inserting is more efficient if you don't commit after each insert.

On Wed, 2005-11-09 at 15:45, Alex wrote:
> Hi,
> have just a general question...
>
> I have a table of 10M records, with a unique key on 5 fields.
> I need to update/insert 200k records in one go.
>
> I could do a select to check for existence and then either insert or update.
> Or simply insert, check the error code, and update if required.
> The 2nd seems to be the logical choice, but will it actually be faster,
> and moreover, is that the right way to do it?

Are you aware that you can't do the 2nd in the same transaction? The
error will invalidate the current transaction. So you either do each
insert in its own transaction, basically with autocommit on, or you
place a savepoint before each insert and roll back to it on error,
which is probably cheaper than a commit but still not free. So I guess
this is not good for batching.

We do all this kind of stuff here by first looking up a batch of rows,
then inserting the missing ones and updating the existing ones, also
in batch mode.

In any case, I too wonder if it can't be done better, because our
imports are significantly slower on Postgres than on other DBs
(intentionally no names mentioned), using the same code on machines
with similar hardware.

Cheers,
Csaba.
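The batch lookup-then-merge approach described above can be sketched in plain SQL. This is a hedged sketch, not the poster's actual code: the table names ("target", "staging") and columns (k1..k5 as the unique key, val as the payload) are hypothetical, and "staging" is assumed to already hold the 200k incoming rows.

```sql
-- Batch merge sketch: update rows that exist, then insert the rest.
-- Hypothetical schema: target(k1..k5 unique, val), staging(k1..k5, val).
BEGIN;

-- Update the rows that already exist in the target.
UPDATE target t
SET    val = s.val
FROM   staging s
WHERE  t.k1 = s.k1 AND t.k2 = s.k2 AND t.k3 = s.k3
  AND  t.k4 = s.k4 AND t.k5 = s.k5;

-- Insert the rows that are still missing.
INSERT INTO target (k1, k2, k3, k4, k5, val)
SELECT s.k1, s.k2, s.k3, s.k4, s.k5, s.val
FROM   staging s
WHERE  NOT EXISTS (
    SELECT 1 FROM target t
    WHERE  t.k1 = s.k1 AND t.k2 = s.k2 AND t.k3 = s.k3
      AND  t.k4 = s.k4 AND t.k5 = s.k5
);

COMMIT;
```

Note that without locking the table, a concurrent insert between the two statements can still trigger a unique-key violation; for a single bulk-load session this is usually acceptable.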
On 10.11.2005 at 1:45:46 +1100, Alex wrote:
> Hi,
> have just a general question...
>
> I have a table of 10M records, with a unique key on 5 fields.
> I need to update/insert 200k records in one go.
>
> I could do a select to check for existence and then either insert or
> update.
> Or simply insert, check the error code, and update if required.

Take a look at
http://developer.postgresql.org/docs/postgres/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING

HTH, Andreas
--
Andreas Kretschmer (contact: see header)
Heynitz: 035242/47212, D1: 0160/7141639
GnuPG-ID 0x3FFF606C http://wwwkeys.de.pgp.net
=== Schollglas Unternehmensgruppe ===
Quote from the link below:

"Tip: A block containing an EXCEPTION clause is significantly more
expensive to enter and exit than a block without one. Therefore, don't
use EXCEPTION without need."

I would think this places an automatic savepoint at the beginning of
the block. I doubt this would give the best performance for the given
problem...

Cheers,
Csaba.

On Wed, 2005-11-09 at 16:20, A. Kretschmer wrote:
> On 10.11.2005 at 1:45:46 +1100, Alex wrote:
> > I have a table of 10M records, with a unique key on 5 fields.
> > I need to update/insert 200k records in one go.
> >
> > I could do a select to check for existence and then either insert or
> > update.
> > Or simply insert, check the error code, and update if required.
>
> Take a look at
> http://developer.postgresql.org/docs/postgres/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING
>
> HTH, Andreas
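For reference, the error-trapping pattern at that link looks roughly like the following per-row upsert function. This is a sketch in the spirit of the documentation's merge example; the table and column names ("mytable", "id", "val") are hypothetical. Each entry into the EXCEPTION block takes the implicit savepoint mentioned above, which is exactly the per-row cost in question.

```sql
-- Per-row upsert using PL/pgSQL error trapping (sketch; names hypothetical).
CREATE FUNCTION upsert_row(key integer, data text) RETURNS void AS $$
BEGIN
    LOOP
        -- Try the update first.
        UPDATE mytable SET val = data WHERE id = key;
        IF found THEN
            RETURN;
        END IF;
        -- Row not there: try to insert; a concurrent insert may beat us.
        BEGIN
            INSERT INTO mytable (id, val) VALUES (key, data);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- Someone inserted the row concurrently; loop and retry the UPDATE.
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```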
On 11/9/05 9:45 AM, "Alex" <alex@meerkatsoft.com> wrote:
> Hi,
> have just a general question...
>
> I have a table of 10M records, with a unique key on 5 fields.
> I need to update/insert 200k records in one go.
>
> I could do a select to check for existence and then either insert or update.
> Or simply insert, check the error code, and update if required.
>
> The 2nd seems to be the logical choice, but will it actually be faster,
> and moreover, is that the right way to do it?

Probably the fastest and most robust way to go about this, if you have
the records in the form of a tab-delimited file, is to COPY or \copy
(in psql) them into a separate loader table, and then use SQL to
manipulate the records (check for duplicates, etc.) before final
insertion into the table.

Sean
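A minimal sketch of that loader-table workflow, assuming a tab-delimited file and hypothetical names ("loader", "target", the key/value columns, and the file path are all illustrative):

```sql
-- Stage the raw file in a throwaway loader table with the same shape.
CREATE TEMP TABLE loader (k1 integer, k2 integer, k3 integer,
                          k4 integer, k5 integer, val text);

-- In psql: \copy reads the file on the client; plain COPY reads it
-- server-side and needs superuser access to the server filesystem.
\copy loader FROM '/tmp/records.tsv'

-- Drop the rows whose key already exists, then bulk-insert the rest.
DELETE FROM loader l
USING  target t
WHERE  l.k1 = t.k1 AND l.k2 = t.k2 AND l.k3 = t.k3
  AND  l.k4 = t.k4 AND l.k5 = t.k5;

INSERT INTO target SELECT * FROM loader;
```

If existing rows also need updating rather than skipping, an UPDATE ... FROM loader step replaces the DELETE.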
Will give that a try, thanks.

I was actually interested in whether the 2nd approach is common
practice, or if there are reasons not to do it that way.

Alex

Sean Davis wrote:
> Probably the fastest and most robust way to go about this if you have the
> records in the form of a tab-delimited file is to COPY or \copy (in psql)
> them into a separate loader table and then use SQL to manipulate the records
> (check for duplicates, etc) for final insertion into the table.
>
> Sean
Hi,
could anyone tell me if it is necessary to run a VACUUM after
truncating a table, or is that done automatically?

Thanks
Alex
Alex <alex@meerkatsoft.com> writes:
> could anyone tell me if it is necessary to run a Vacuum after truncating
> a table or is that done automatically.

TRUNCATE does not require a vacuum; indeed there's nothing left *to*
vacuum.

regards, tom lane
On Thu, Nov 10, 2005 at 01:16:24PM +1100, Alex wrote:
> could anyone tell me if it is necessary to run a Vacuum after truncating
> a table or is that done automatically.

http://www.postgresql.org/docs/8.0/interactive/maintenance.html#VACUUM-FOR-SPACE-RECOVERY

"If you have a table whose contents are deleted on a periodic basis,
consider doing it with TRUNCATE rather than using DELETE followed by
VACUUM. TRUNCATE removes the entire content of the table immediately,
without requiring a subsequent VACUUM or VACUUM FULL to reclaim the
now-unused disk space."

--
Michael Fuhr
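The periodic-cleanup pattern from that docs quote, side by side (table name hypothetical):

```sql
-- Preferred: reclaims the space immediately, no VACUUM needed.
TRUNCATE TABLE log_table;

-- Instead of: leaves dead rows behind that VACUUM must then clean up.
DELETE FROM log_table;
VACUUM log_table;
```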