Re: Massive table (500M rows) update nightmare - Mailing list pgsql-performance

From Carlo Stonebanks
Subject Re: Massive table (500M rows) update nightmare
Date
Msg-id hi577c$1lc2$1@news.hub.org
In response to Re: Massive table (500M rows) update nightmare  (Leo Mannhart <leo.mannhart@beecom.ch>)
List pgsql-performance
> If it is possible to lock this audit table exclusively (maybe during
> off-peak hours) I would look into
> - create new_audit_table as select col1, col2, col3 ... col9,
> 'new_col_value' from old_audit_table;
> - create all indexes
> - drop old_audit_table
> - rename new_audit_table to old_audit_table
>
> That is probably the fastest method you can do, even if you have to join
> the "new_col_value" from an extra helper table with the corresponding id.
> Remember, databases are born to join.
>

This has all been done before. The production team was crippled while they
waited for it, and the SECOND the table was available again they jumped on
it, even though that meant recreating only the bare minimum of the indexes.
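
For reference, a minimal sketch of that rewrite approach, using the
placeholder table and column names from the quoted suggestion (they stand in
for the real schema, not ours):

BEGIN;
-- Take an exclusive lock so nothing writes to the audit table mid-rewrite.
LOCK TABLE old_audit_table IN ACCESS EXCLUSIVE MODE;

-- Rewrite the whole table in one sequential pass instead of updating in place.
CREATE TABLE new_audit_table AS
SELECT col1, col2, col3, /* ... */ col9,
       'new_col_value'::text AS new_col
FROM old_audit_table;

-- Recreate whatever indexes the application cannot live without.
CREATE INDEX new_audit_table_col1_idx ON new_audit_table (col1);

DROP TABLE old_audit_table;
ALTER TABLE new_audit_table RENAME TO old_audit_table;
COMMIT;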

> You could also try to just update the whole table in one go; it is
> probably faster than you expect.

Possibly, but with such a large table you have no idea of the progress, and
you cannot interrupt it without rolling back everything. Worse, you have
applications stalling and users wanting to know what is going on: are the OS
and the DB/MVCC thrashing while it does internal maintenance? Have you
reached some sort of deadlock condition that you can't see because the
server status is not helpful with so many uncommitted pending updates?

And of course, there is the file bloat: under MVCC, updating every row
writes a new version of each one, so the table roughly doubles on disk until
it is vacuumed.
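
For comparison, the approach we keep falling back to is updating in keyed
batches with a commit per batch, so progress is visible and a cancel only
loses the current batch. A rough sketch as a psql script, assuming a bigint
primary key named id (the names are illustrative, not our actual schema):

-- Each batch is its own transaction, so an interrupt loses at most one
-- batch and you can watch progress batch by batch.
\set batch_size 100000
\set start 0

BEGIN;
UPDATE old_audit_table
   SET new_col = 'new_col_value'
 WHERE id >= :start
   AND id <  :start + :batch_size;
COMMIT;
-- Advance :start by :batch_size and repeat until max(id) is covered,
-- typically from a driving script that logs each batch as it commits.

Committing per batch also gives vacuum a chance to reclaim the dead row
versions as you go, which keeps the bloat bounded.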

