Re: Support for REINDEX CONCURRENTLY - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Support for REINDEX CONCURRENTLY |
Date | |
Msg-id | 201210031010.52775.andres@2ndquadrant.com Whole thread Raw |
In response to | Support for REINDEX CONCURRENTLY (Michael Paquier <michael.paquier@gmail.com>) |
Responses |
Re: Support for REINDEX CONCURRENTLY
(Michael Paquier <michael.paquier@gmail.com>)
Re: Support for REINDEX CONCURRENTLY (Simon Riggs <simon@2ndQuadrant.com>) Re: Support for REINDEX CONCURRENTLY (Greg Stark <stark@mit.edu>) Re: Support for REINDEX CONCURRENTLY (Michael Paquier <michael.paquier@gmail.com>) |
List | pgsql-hackers |
Hi, On Wednesday, October 03, 2012 03:14:17 AM Michael Paquier wrote: > One of the outputs on the discussions about the integration of pg_reorg in > core > was that Postgres should provide some ways to do REINDEX, CLUSTER and ALTER > TABLE concurrently with low-level locks in a way similar to CREATE INDEX > CONCURRENTLY. > > The discussions done can be found on this thread: > http://archives.postgresql.org/pgsql-hackers/2012-09/msg00746.php > > Well, I spent some spare time working on the implementation of REINDEX > CONCURRENTLY. Very cool! > This basically allows to perform read and write operations on a table whose > index(es) are reindexed at the same time. Pretty useful for a production > environment. The caveats of this feature is that it is slower than normal > reindex, and impacts other backends with the extra CPU, memory and IO it > uses to process. The implementation is based on something on the same ideas > as pg_reorg and on an idea of Andres. > The following restrictions are applied. > - REINDEX [ DATABASE | SYSTEM ] cannot be run concurrently. I would like to support something like REINDEX USER TABLES; or similar at some point, but that very well can be a second phase. > - REINDEX CONCURRENTLY cannot run inside a transaction block. > - toast relations are reindexed non-concurrently when table reindex is done > and that this table has toast relations Why that restriction? > Here is a description of what happens when reorganizing an index > concurrently > (the beginning of the process is similar to CREATE INDEX CONCURRENTLY): > 1) creation of a new index based on the same columns and restrictions as > the index that is rebuilt (called here old index). This new index has as > name $OLDINDEX_cct. So only a suffix _cct is added. It is marked as > invalid and not ready. You probably should take a SHARE UPDATE EXCLUSIVE lock on the table at that point already, to prevent schema changes. > 8) Take a reference snapshot and validate the new indexes Hm. Unless you factor in corrupt indices, why should this be needed? > 14) Swap new and old indexes, consisting here in switching their names. I think switching based on their names is not going to work very well because indexes are referenced by oid at several places. Swapping pg_index.indexrelid or pg_class.relfilenode seems to be the better choice to me. We expect relfilenode changes for such commands, but not ::regclass oid changes. Such a behaviour would at least be complicated for pg_depend and pg_constraint. > The following process might be reducible, but I would like that to be > decided depending on the community feedback and experience on such > concurrent features. > For the time being I took an approach that looks slower, but secured to my > mind with multiple waits (perhaps sometimes unnecessary?) and > subtransactions. > If during the process an error occurs, the table will finish with either > the old or new index as invalid. In this case the user will be in charge to > drop the invalid index himself. > The concurrent index can be easily identified with its suffix *_cct. I am not really happy about relying on some arbitrary naming here. That still can result in conflicts and such. > This patch has required some refactorisation effort as I noticed that the > code of index for concurrent operations was not very generic. In order to do > that, I created some new functions in index.c called index_concurrent_* > which are used by CREATE INDEX and REINDEX in my patch. Some refactoring has > also been done regarding the> wait processes. > REINDEX TABLE and REINDEX INDEX follow the same code path > (ReindexConcurrentIndexes in indexcmds.c). The patch structure is relying a > maximum on the functions of index.c when creating, building and validating > concurrent index. I haven't looked at the patch yet, but I was pretty sure that you would need to do quite some refactoring to implement this and this looks like roughly the right direction... > Thanks, and looking forward to your feedback, I am very happy that youre taking this on! Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: