On Thursday 07 January 2010 09:57:59 Kevin Grittner wrote:
> Ludwik Dylag <ldylag@gmail.com> wrote:
> > I would suggest:
> > 1. turn off autovacuum
> > 1a. optionally tune the db for better performance for this kind of
> > operation (can't help with that here)
> > 2. restart database
> > 3. drop all indexes
> > 4. update
> > 5. vacuum full table
> > 6. create indexes
> > 7. turn on autovacuum
>
> I've only ever attempted something like that with a few tens of
> millions of rows. I gave up on waiting for the VACUUM FULL step
> after a few days.
>
> If some scheduled down time is acceptable (with "some" being kind of hard
> to estimate accurately) the best bet would be to add the column with
> the USING clause to fill in the value. (I think that would cause a
> table rewrite; if not, then add something to the ALTER TABLE which
> would.) My impression is that the OP would rather stretch out the
> implementation than to suffer down time, which can certainly be a
> valid call.
>
> If that is the goal, then the real question is whether there's a way
> to tune the incremental updates to speed that phase. Carlo, what
> version of PostgreSQL is this? Can you show us the results of an
> EXPLAIN ANALYZE for the run of one iteration of the UPDATE?
> Information on the OS, hardware, PostgreSQL build configuration, and
> the contents of postgresql.conf (excluding all comments) could help
> us spot possible techniques to speed this up.
>
> -Kevin
>
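For reference, the single-pass ALTER TABLE approach Kevin mentions would look
something like this (hypothetical table/column names, adjust to the real
schema):

```sql
-- Adding a column with a non-null DEFAULT forces a full table rewrite
-- in one pass (holding an exclusive lock for the duration):
ALTER TABLE tab ADD COLUMN new_col integer NOT NULL DEFAULT 0;

-- Or, if the value is derived from an existing column, changing that
-- column's type with a USING expression also rewrites the table:
ALTER TABLE tab ALTER COLUMN old_col TYPE bigint USING old_col::bigint;
```

Either way the whole rewrite happens inside one command, so it's down time,
but only one pass over the table.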
If you can come up with an effective method of tracking inserts, updates, and
deletes (such as a trigger that writes the PK to a separate table on any
insert, update, or delete against the table) you could do something like this:
1) create new table (no indexes) with the structure you want the table to have
at the end of the process (i.e. the post-altered state) [new_tab]
2) create the insert,update,delete triggers mentioned above on the existing
table [curr_tab] and write all the PK id's that change into a 3rd table
[changed_keys]
3) kick off a process that simply does a select from curr_tab into new_tab and
populates/back-fills the new column as part of the query
4) let it run as long as it takes
5) once it's complete do this:
Create all the indexes on new_tab
BEGIN;
LOCK TABLE curr_tab;
DELETE from new_tab
where pk_id in (select distinct pk_id from changed_keys);
INSERT into new_tab
select * from curr_tab
where curr_tab.pk_id in (select distinct pk_id from changed_keys);
ALTER TABLE curr_tab RENAME to old_tab;
ALTER TABLE new_tab RENAME to curr_tab;
COMMIT;
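For steps 2 and 3, the trigger and the back-fill might look something like
this sketch (all names hypothetical; assumes plpgsql is installed, pk_id is
the primary key, and compute_new_value() stands in for however the new
column's value is actually derived):

```sql
CREATE TABLE changed_keys (pk_id integer);

CREATE OR REPLACE FUNCTION log_changed_key() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO changed_keys VALUES (OLD.pk_id);
    ELSE
        INSERT INTO changed_keys VALUES (NEW.pk_id);
    END IF;
    RETURN NULL;  -- AFTER trigger: return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER curr_tab_log_changes
    AFTER INSERT OR UPDATE OR DELETE ON curr_tab
    FOR EACH ROW EXECUTE PROCEDURE log_changed_key();

-- Step 3's back-fill, computing the new column as part of the copy
-- (assumes new_tab has curr_tab's columns plus the new one at the end):
INSERT INTO new_tab
    SELECT t.*, compute_new_value(t.some_col) FROM curr_tab t;
```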
Also, you might want to consider partitioning this table in the process...
Once you're confident you no longer need the old table [old_tab], you can
drop it.
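If you do decide to partition the new table while you're at it, that currently
means inheritance-based partitions plus constraint exclusion, roughly like
this (the date column and ranges are made-up examples):

```sql
-- Child tables inherit new_tab's columns; the CHECK constraints let the
-- planner prove which partitions a query can't match:
CREATE TABLE new_tab_2009 (
    CHECK (created >= DATE '2009-01-01' AND created < DATE '2010-01-01')
) INHERITS (new_tab);

CREATE TABLE new_tab_2010 (
    CHECK (created >= DATE '2010-01-01' AND created < DATE '2011-01-01')
) INHERITS (new_tab);

-- Make sure constraint exclusion is enabled so the planner actually
-- skips non-matching partitions:
SET constraint_exclusion = on;
```

You'd route the back-fill inserts into the appropriate child tables rather
than the parent.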