Re: Massive table (500M rows) update nightmare - Mailing list pgsql-performance

From: Kevin Grittner
Subject: Re: Massive table (500M rows) update nightmare
Date:
Msg-id: 4B45BE37020000250002DEE4@gw.wicourts.gov
In response to: Massive table (500M rows) update nightmare  ("Carlo Stonebanks" <stonec.register@sympatico.ca>)
Responses: Re: Massive table (500M rows) update nightmare  (Kevin Kempter <kevink@consistentstate.com>)
List: pgsql-performance
Ludwik Dylag <ldylag@gmail.com> wrote:
> I would suggest:
> 1. turn off autovacuum
> 1a. optionally tune the db for better performance for this kind
>     of operation (can't help here)
> 2. restart database
> 3. drop all indexes
> 4. update
> 5. vacuum full table
> 6. create indexes
> 7. turn on autovacuum
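
Just to make the quoted sequence concrete, steps 3 through 6 might
look something like this (table, column, and index names here are
hypothetical):

  -- 3. drop the indexes so the update doesn't have to maintain them
  DROP INDEX big_table_col1_idx;
  DROP INDEX big_table_col2_idx;

  -- 4. run the update in a single pass (hypothetical expression)
  UPDATE big_table SET new_col = old_col * 10;

  -- 5. reclaim the dead row versions left behind by the update
  VACUUM FULL big_table;

  -- 6. rebuild the indexes
  CREATE INDEX big_table_col1_idx ON big_table (col1);
  CREATE INDEX big_table_col2_idx ON big_table (col2);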

I've only ever attempted something like that with a few tens of
millions of rows.  I gave up on waiting for the VACUUM FULL step
after a few days.

If some scheduled down time is acceptable (with "some" being rather
hard to estimate accurately), the best bet would be to add the
column with a USING clause to fill in the value.  (I think that
would cause a table rewrite; if not, then add something to the
ALTER TABLE which would.)  My impression is that the OP would
rather stretch out the implementation than suffer down time, which
can certainly be a valid call.
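
For illustration only (hypothetical names), I'm thinking of
something along these lines, where the USING expression is
evaluated for every row while the table is rewritten:

  -- add the new column empty, then fill it during a forced rewrite
  ALTER TABLE big_table ADD COLUMN new_col integer;
  ALTER TABLE big_table
    ALTER COLUMN new_col TYPE integer
    USING (old_col * 10);  -- hypothetical expression; should rewrite
                           -- the whole table in one pass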

If that is the goal, then the real question is whether there's a way
to tune the incremental updates to speed that phase.  Carlo, what
version of PostgreSQL is this?  Can you show us the results of an
EXPLAIN ANALYZE for the run of one iteration of the UPDATE?
Information on the OS, hardware, PostgreSQL build configuration, and
the contents of postgresql.conf (excluding all comments) could help
us spot possible techniques to speed this up.
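
For example, something like this for a single batch (hypothetical
names and key range) would be a good start:

  EXPLAIN ANALYZE
  UPDATE big_table
     SET new_col = old_col * 10       -- hypothetical expression
   WHERE id BETWEEN 1 AND 50000;      -- one iteration's slice of keys

The non-default settings can also be pulled from a psql session
with something like:

  SELECT name, setting
    FROM pg_settings
   WHERE source <> 'default';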

-Kevin
