Thread: Bulk update of large database
I have two tables of roughly 200,000,000 records and want to update one
column in one of the tables according to values in the second table,
matching on a unique key. For example:

    update table1 set x = 1
    where exists
        (select * from table2 where table1.key = table2.key);

(or using an IN clause, or a straight join, but EXPLAIN tells me that
the latter is much slower). This does work but appends the updates
(until the next vacuum). For a 100GB database, this is too large of a
storage overhead. Is there another good way? I've searched the
newsgroups, docs and books without a clue . . .

Thanks much,

--Martin

===========================================================================
Martin Weinberg                                Phone: (413) 545-3821
Dept. of Physics and Astronomy                 FAX: (413) 545-2117/0648
530 Graduate Research Tower                    weinberg@astro.umass.edu
University of Massachusetts                    http://www.astro.umass.edu/~weinberg/
Amherst, MA 01003-4525
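(For reference, the IN-clause and straight-join formulations mentioned
above would look roughly like the following; a sketch assuming `key` is
the unique matching column, with table and column names as placeholders:)

    -- IN form
    update table1 set x = 1
    where table1.key in (select key from table2);

    -- straight-join form (PostgreSQL's UPDATE ... FROM extension)
    update table1 set x = 1
    from table2
    where table1.key = table2.key;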
Martin Weinberg <weinberg@osprey.astro.umass.edu> writes:
> This does work but appends the updates (until the next vacuum).
> For a 100GB database, this is too large of a storage overhead.
> Is there another good way?

There is no alternative; any sort of update operation will write a new
tuple value without first deleting the old. This must be so to preserve
transaction semantics: if an error occurs later on during the update
(e.g., violation of a unique-index constraint), the old tuple value must
still be there.

The only answer I can see is to update however many tuples you can
spare the space for, commit the transaction, vacuum, and repeat.

The need for repeated vacuums in this scenario is pretty annoying.
It'd be nice if we could recycle dead tuples without a full vacuum.
Offhand I don't see any way to do it without introducing performance
penalties elsewhere...

			regards, tom lane
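(The batch-commit-vacuum loop Tom describes might look roughly like the
following; a sketch assuming `key` is an integer that can be sliced into
ranges. The range bounds and batch size are illustrative, and VACUUM
must run outside the transaction block:)

    -- one batch: update a slice of the key space, then reclaim space
    begin;
    update table1 set x = 1
    from table2
    where table1.key = table2.key
      and table1.key >= 0 and table1.key < 10000000;
    commit;
    vacuum table1;
    -- repeat with the next key range until all rows are updated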