Re: REINDEX CONCURRENTLY 2.0 - Mailing list pgsql-hackers

From: Jim Nasby
Subject: Re: REINDEX CONCURRENTLY 2.0
Msg-id: 54501FB8.7000607@BlueTreble.com
In response to Re: REINDEX CONCURRENTLY 2.0  (Michael Paquier <michael.paquier@gmail.com>)
On 10/1/14, 2:00 AM, Michael Paquier wrote:
> On Wed, Aug 27, 2014 at 3:53 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
>
>     On Wed, Aug 27, 2014 at 3:41 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>     > Can you add it to the next CF? I'll try to look earlier, but can't
>     > promise anything.
>     >
>     > I very much would like this to get committed in some form or another.
>     Added it here to keep track of it:
>     https://commitfest.postgresql.org/action/patch_view?id=1563
>
> Attached is a fairly-refreshed patch that should be used as a base for the next commit fest. The following changes should be noticed:
> - Use of AccessExclusiveLock when swapping relfilenodes of an index and its concurrent entry instead of ShareUpdateExclusiveLock for safety. At the limit of my understanding, that's the consensus reached until now.
> - Cleanup of many comments and refresh of the documentation that was somewhat wrongly-formulated or shaped at some places
> - Addition of support for autocommit off in psql for REINDEX [ TABLE | INDEX ] CONCURRENTLY
> - Some more code cleanup.
> I haven't been through the tab completion support for psql but looking at tab-completion.c this seems a bit tricky with the stuff related to CREATE INDEX CONCURRENTLY already present. Nothing huge though.

Patch applies against current HEAD and builds, but I'm getting 37 failed tests (mostly parallel, but also misc and WITH; results attached). Is that expected?

+   <para>
+    In a concurrent index build, a new index whose storage will replace the one
+    to be rebuild is actually entered into the system catalogs in one
+    transaction, then two table scans occur in two more transactions. Once this
+    is performed, the old and fresh indexes are swapped by taking a lock
+    <literal>ACCESS EXCLUSIVE</>. Finally two additional transactions
+    are used to mark the concurrent index as not ready and then drop it.
+   </para>

The "mark the concurrent index" bit is rather confusing; it sounds like it's referring to the new index instead of the
old.Now that I've read the code I understand what's going on here between the concurrent index *entry* and the filenode
swap,but I don't think the docs make this sufficiently clear to users. 

How about something like this:

The following steps occur in a concurrent index build, each in a separate transaction. Note that if there are multiple indexes to be rebuilt, then each step loops through all the indexes we're rebuilding, using a separate transaction for each one (there's a rough sketch of that loop after the list).

1. A new "temporary" index definition is entered into the catalog. This definition is only used to build the new index, and will be removed at the completion of the process.
2. A first pass index build is done.
3. A second pass is performed to add tuples that were added while the first pass build was running.
4. pg_class.relfilenode for the existing index definition and the "temporary" definition are swapped. This means that the existing index definition now uses the index data that we stored during the build, and the "temporary" definition is using the old index data.
5. The "temporary" index definition is marked as dead.
6. The "temporary" index definition and its data (which is now the data for the old index) are dropped.
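
To make the transaction boundaries concrete, here's a rough sketch of the per-phase loop described above (my own illustration, not code from the patch; indexIds and do_phase_for_index are placeholder names):

    ListCell   *lc;

    foreach(lc, indexIds)
    {
        Oid         indOid = lfirst_oid(lc);

        /* one transaction per index, per phase */
        StartTransactionCommand();
        PushActiveSnapshot(GetTransactionSnapshot());

        do_phase_for_index(indOid);     /* e.g. the first-pass build of this index */

        PopActiveSnapshot();
        CommitTransactionCommand();
    }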


+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+ * on. This is called during concurrent index processing. The heap relation
+ * on which is based the index needs to be closed by the caller.

Last bit presumably should be "on which the index is based".


+    /* Build the list of column names, necessary for index_create */
Instead of all this work, wouldn't it be easier to create a version of index_create/ConstructTupleDescriptor that will use the IndexInfo for the old index? ISTM index_concurrent_create() is doing a heck of a lot of work to marshal data into one form just to have it get marshaled yet again. Worst case, if we do have to play this game, there should be a stand-alone function to get the columns/expressions for an existing index; you're duplicating a lot of code from pg_get_indexdef_worker().
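
For what it's worth, something like this is the shape I have in mind (untested sketch; oldIndexOid is a placeholder, and the actual index_create() call is elided since its argument list is long):

    Relation    indexRel = index_open(oldIndexOid, ShareUpdateExclusiveLock);
    IndexInfo  *indexInfo = BuildIndexInfo(indexRel);   /* existing helper in catalog/index.c */

    /* ... hand indexInfo (plus copied reloptions, collations, etc.) to index_create() ... */

    index_close(indexRel, NoLock);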

index_concurrent_swap(): Perhaps it'd be better to create index_concurrent_swap_setup() and index_concurrent_swap_cleanup() and refactor the duplicated code out... the actual function would then become:
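
(Roughly this shape; my sketch, untested. The argument list is a placeholder and the helper bodies are where the duplicated code would live.)

    static void
    index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
    {
        /* lock both indexes and do the setup both code paths currently duplicate */
        index_concurrent_swap_setup(newIndexOid, oldIndexOid);

        /* the relfilenode swap itself stays here */

        /* catalog updates, cache invalidations, releasing what setup grabbed */
        index_concurrent_swap_cleanup(newIndexOid, oldIndexOid);
    }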


ReindexRelationConcurrently()

+ * Process REINDEX CONCURRENTLY for given relation Oid. The relation can be
+ * either an index or a table. If a table is specified, each step of REINDEX
+ * CONCURRENTLY is done in parallel with all the table's indexes as well as
+ * its dependent toast indexes.
This comment is a bit misleading; we're not actually doing anything in parallel, right? AFAICT index_concurrent_build is going to block while each index is built the first time.

+     * relkind is an index, this index itself will be rebuilt. The locks taken
+     * parent relations and involved indexes are kept until this transaction
+     * is committed to protect against schema changes that might occur until
+     * the session lock is taken on each relation.

This comment is a bit unclear to me... at minimum I think it should be "* on parent relations" instead of "* parent relations", but I think it needs to elaborate on why/when we're also taking session level locks.

I also wordsmithed this comment a bit...
     * Here begins the process of concurrently rebuilding the index entries.
     * We first need to create an index that is based on the same data as
     * the former index, except that it will only be registered in the
     * catalogs and built later. It is possible to perform all of these
     * operations on all the indexes of a parent relation at the same time,
     * including the indexes of its toast relation.

and this one...
     * During this phase the concurrent indexes catch up with any new tuples that
     * were created during the previous phase.
     *
     * We once again wait until no transaction can have the table open with
     * the index marked as read-only for updates. Each index validation is done
     * in a separate transaction to minimize how long we hold an open transaction.

+     * a different valid status to avoid an implosion in the number of indexes
+     * a parent relation could have if this operation step fails multiple times
+     * in a row due to a reason or another.

I'd change that to "explosion in the number of indexes a parent relation could have if this operation fails."

Phases 4, 5 and 6 are rather confusing if you don't understand that each "concurrent index" entry is meant to be thrown away. I think the Phase 4 comment should elaborate on that.

The comment in check_exclusion_constraint() is good; shouldn't the related comment on this line in index_create() mention that check_exclusion_constraint() needs to be changed if we ever support concurrent builds of exclusion indexes?
    if (concurrent && is_exclusion && !is_reindex)
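
Maybe something as simple as this right above that test (wording is mine, not from the patch):

    /*
     * If we ever support concurrent builds of exclusion-constraint
     * indexes, check_exclusion_constraint() will need to be taught
     * about them as well.
     */
    if (concurrent && is_exclusion && !is_reindex)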


--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

