Thread: Bulk update of large database
I have two tables of roughly 200,000,000 records and want to update one
column in one of the tables according to values in the second table,
matching on a unique key. For example:

    update table1 set x = 1
    where exists
        (select * from table2 where table1.key = table2.key);

(or using an IN clause, or a straight join, but EXPLAIN tells me that
the latter is much slower). This does work but appends the updates
(until the next vacuum). For a 100GB database, this is too large of a
storage overhead. Is there another good way? I've searched the
newsgroups, docs and books without a clue . . .

Thanks much,

--Martin

===========================================================================
Martin Weinberg                                Phone: (413) 545-3821
Dept. of Physics and Astronomy                 FAX: (413) 545-2117/0648
530 Graduate Research Tower                    weinberg@astro.umass.edu
University of Massachusetts                    http://www.astro.umass.edu/~weinberg/
Amherst, MA 01003-4525
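(For reference, the IN-clause and straight-join formulations mentioned
above would look roughly like the following; a sketch assuming `key` is
the unique matching column, with table and column names as placeholders:)

    -- IN form
    update table1 set x = 1
    where table1.key in (select key from table2);

    -- straight-join form (PostgreSQL's UPDATE ... FROM extension)
    update table1 set x = 1
    from table2
    where table1.key = table2.key;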
Martin Weinberg <weinberg@osprey.astro.umass.edu> writes:
> This does work but appends the updates (until the next vacuum).
> For a 100GB database, this is too large of a storage overhead.
> Is there another good way?

There is no alternative; any sort of update operation will write a new
tuple value without first deleting the old. This must be so to preserve
transaction semantics: if an error occurs later on during the update
(e.g., violation of a unique-index constraint), the old tuple value must
still be there.

The only answer I can see is to update however many tuples you can
spare the space for, commit the transaction, vacuum, and repeat.

The need for repeated vacuums in this scenario is pretty annoying.
It'd be nice if we could recycle dead tuples without a full vacuum.
Offhand I don't see any way to do it without introducing performance
penalties elsewhere...

			regards, tom lane
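(The batch-commit-vacuum loop Tom describes might look roughly like the
following; a sketch assuming `key` is an integer that can be sliced into
ranges. The range bounds and batch size are illustrative, and VACUUM
must run outside the transaction block:)

    -- one batch: update a slice of the key space, then reclaim space
    begin;
    update table1 set x = 1
    from table2
    where table1.key = table2.key
      and table1.key >= 0 and table1.key < 10000000;
    commit;
    vacuum table1;
    -- repeat with the next key range until all rows are updated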