Re: Massive table (500M rows) update nightmare - Mailing list pgsql-performance

From Carlo Stonebanks
Subject Re: Massive table (500M rows) update nightmare
Date
Msg-id hi577c$1lc2$1@news.hub.org
In response to Re: Massive table (500M rows) update nightmare  (Leo Mannhart <leo.mannhart@beecom.ch>)
List pgsql-performance
> If it is possible to lock this audit table exclusively (maybe during
> off-peak hours) I would look into
> - create new_audit_table as select col1, col2, col3 ... col9,
> 'new_col_value' from old_audit_table;
> - create all indexes
> - drop old_audit_table
> - rename new_audit_table to old_audit_table
>
> That is probably the fastest method you can do, even if you have to join
> the "new_col_value" from an extra helper table with the corresponding id.
> Remember, databases are born to join.
>

This has all been done before. The production team was crippled while they
waited for it, and the SECOND the table was available again they jumped on
it, even though that meant recreating only the bare minimum of the indexes.
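
For reference, a minimal sketch of that rewrite approach, using the
placeholder table and column names from the quoted suggestion (they stand in
for the real schema, not ours):

BEGIN;
-- Take an exclusive lock so nothing writes to the audit table mid-rewrite.
LOCK TABLE old_audit_table IN ACCESS EXCLUSIVE MODE;

-- Rewrite the whole table in one sequential pass instead of updating in place.
CREATE TABLE new_audit_table AS
SELECT col1, col2, col3, /* ... */ col9,
       'new_col_value'::text AS new_col
FROM old_audit_table;

-- Recreate whatever indexes the application cannot live without.
CREATE INDEX new_audit_table_col1_idx ON new_audit_table (col1);

DROP TABLE old_audit_table;
ALTER TABLE new_audit_table RENAME TO old_audit_table;
COMMIT;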

> You could also try to just update the whole table in one go; it is
> probably faster than you expect.

Possibly, but with such a large table you have no idea of the progress, and
you cannot interrupt it without rolling back everything. Worse, you have
applications stalling and users wanting to know what is going on: are the OS
and the DB/MVCC thrashing while it does internal maintenance? Have you
reached some sort of deadlock condition that you can't see because the
server status is not helpful with so many uncommitted pending updates?

And of course, there is the file bloat: under MVCC, updating every row
writes a new version of each one, so the table roughly doubles on disk until
it is vacuumed.
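
For comparison, the approach we keep falling back to is updating in keyed
batches with a commit per batch, so progress is visible and a cancel only
loses the current batch. A rough sketch as a psql script, assuming a bigint
primary key named id (the names are illustrative, not our actual schema):

-- Each batch is its own transaction, so an interrupt loses at most one
-- batch and you can watch progress batch by batch.
\set batch_size 100000
\set start 0

BEGIN;
UPDATE old_audit_table
   SET new_col = 'new_col_value'
 WHERE id >= :start
   AND id <  :start + :batch_size;
COMMIT;
-- Advance :start by :batch_size and repeat until max(id) is covered,
-- typically from a driving script that logs each batch as it commits.

Committing per batch also gives vacuum a chance to reclaim the dead row
versions as you go, which keeps the bloat bounded.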

