Thread: REINDEX CONCURRENTLY 2.0
Hi all,

Please find attached updated patches for the support of REINDEX CONCURRENTLY, renamed 2.0 for the occasion:
- 20131114_1_index_drop_comments.patch, a patch that updates a couple of comments in index_drop. It has not been committed yet, but it should be IMO...
- 20131114_2_WaitForOldsnapshots_refactor.patch, a refactoring patch providing a single API that can be used to wait for old snapshots
- 20131114_3_reindex_concurrently.patch, providing the core feature. Patch 3 needs to have patch 2 applied first. Regression tests, isolation tests and documentation are included with the patch.

This is the continuation of the previous thread that finished here: http://www.postgresql.org/message-id/CAB7nPqS+WYN021oQHd9GPe_5dSVcVXMvEBW_E2AV9OOEwggMHw@mail.gmail.com

This patch has been added to this commit fest.

Regards,
--
Michael
Attachment
On 11/14/13, 9:40 PM, Michael Paquier wrote: > Hi all, > > Please find attached updated patches for the support of REINDEX > CONCURRENTLY, renamed 2.0 for the occasion: > - 20131114_1_index_drop_comments.patch, patch that updates some > comments in index_drop. This updates only a couple of comments in > index_drop but has not been committed yet. It should be IMO... > - 20131114_2_WaitForOldsnapshots_refactor.patch, a refactoring patch > providing a single API that can be used to wait for old snapshots > - 20131114_3_reindex_concurrently.patch, providing the core feature. > Patch 3 needs to have patch 2 applied first. Regression tests, > isolation tests and documentation are included with the patch. The third patch needs to be rebased, because of a conflict in index.c.
Hi, On 2013-11-15 11:40:17 +0900, Michael Paquier wrote: > - 20131114_3_reindex_concurrently.patch, providing the core feature. > Patch 3 needs to have patch 2 applied first. Regression tests, > isolation tests and documentation are included with the patch. Have you addressed my concurrency concerns from the last version? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Nov 16, 2013 at 5:09 AM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2013-11-15 11:40:17 +0900, Michael Paquier wrote: >> - 20131114_3_reindex_concurrently.patch, providing the core feature. >> Patch 3 needs to have patch 2 applied first. Regression tests, >> isolation tests and documentation are included with the patch. > > Have you addressed my concurrency concerns from the last version? I have added documentation in the patch with a better explanation about why those choices of implementation are made. Thanks, -- Michael
Attachment
On 2013-11-18 19:52:29 +0900, Michael Paquier wrote:
> On Sat, Nov 16, 2013 at 5:09 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2013-11-15 11:40:17 +0900, Michael Paquier wrote:
> >> - 20131114_3_reindex_concurrently.patch, providing the core feature.
> >> Patch 3 needs to have patch 2 applied first. Regression tests,
> >> isolation tests and documentation are included with the patch.
> >
> > Have you addressed my concurrency concerns from the last version?
> I have added documentation in the patch with a better explanation
> about why those choices of implementation are made.

The dropping still isn't safe: After phase 4 we are in the state:
old index: valid, live, !isdead
new index: !valid, live, !isdead

Then you do an index_concurrent_set_dead() from that state on in phase 5. There you do WaitForLockers(locktag, AccessExclusiveLock) before index_set_state_flags(INDEX_DROP_SET_DEAD). That's not sufficient. Consider what happens with the following sequence:

1) WaitForLockers(locktag, AccessExclusiveLock) -> GetLockConflicts() => virtualxact 1 -> VirtualXactLock(1)
2) virtualxact 2 starts, opens the *old* index since it's currently the only valid one.
3) virtualxact 1 finishes
4) index_concurrent_set_dead() does index_set_state_flags(DROP_SET_DEAD)
5) another transaction (vxid 3) starts inserting data into the relation, updates only the new index, the old index is dead
6) vxid 2 inserts data, updates only the old index. Since it had the index already open the cache invalidations won't be processed.

Now the indexes are out of sync: there are entries only in the old index and entries only in the new index. Not good.

I hate to repeat myself, but you really need to follow the current protocol for concurrently dropping indexes. Which involves *first* marking the index as invalid so it won't be used for querying anymore, then waiting for everyone possibly still seeing that entry to finish, and only *after* that marking the index as dead.
You cannot argue away correctness concerns with potential deadlocks. c.f. http://www.postgresql.org/message-id/20130926103400.GA2471420@alap2.anarazel.de

I am also still unconvinced that the logic in index_concurrent_swap() is correct. It very much needs to explain why no backend can see values that are inconsistent. E.g. what prevents a backend thinking the old and new indexes have the same relfilenode? MVCC snapshots don't seem to protect you against that. I am not sure there's a problem, but there certainly need to be more comments explaining why there are none. Something like the following might be possible:

Backend 1: start reindex concurrently, till phase 4
Backend 2: ExecOpenIndices() -> RelationGetIndexList (that list is consistent due to mvcc snapshots!)
Backend 2: -> index_open(old_index) (old relfilenode)
Backend 1: index_concurrent_swap() -> CommitTransaction() -> ProcArrayEndTransaction() (changes visible to others henceforth!)
Backend 2: -> index_open(new_index) => no cache invalidations yet, gets the old relfilenode
Backend 2: ExecInsertIndexTuples() => updates the same relation twice, corrupt!
Backend 1: still in CommitTransaction() -> AtEOXact_Inval() sends out invalidations

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
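The interleaving Andres describes can be made concrete with a toy sequential model (illustrative Python only; the class and method names are invented, not PostgreSQL APIs): a backend that opened the old index before it was marked dead keeps updating it from its stale cache, while a later backend updates only the new index, so the two silently diverge.

```python
class Index:
    def __init__(self, name):
        self.name = name
        self.dead = False
        self.entries = []

class Backend:
    """Caches its index list at open time; it never reprocesses
    invalidations while the indexes stay open (the hazard above)."""
    def __init__(self, indexes):
        self.open_indexes = list(indexes)

    def insert(self, value):
        for ix in self.open_indexes:
            ix.entries.append(value)

old, new = Index("old"), Index("new")

# 2) vxid 2 starts and opens the *old* index (the only valid one at the time)
vxid2 = Backend([old])
# 4) the old index is marked dead -- but vxid 2 never sees the invalidation
old.dead = True
# 5) vxid 3 starts afterwards and sees only the live new index
vxid3 = Backend([new])
vxid3.insert("row-a")
# 6) vxid 2 inserts using its stale cached list: only the dead old index
vxid2.insert("row-b")

# The two indexes have diverged: each holds a row the other lacks.
print(old.entries, new.entries)  # -> ['row-b'] ['row-a']
```

Marking the index invalid first and waiting before DROP_SET_DEAD, as the existing concurrent-drop protocol does, closes exactly this window.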
Michael Paquier escribió: > Hi all, > > Please find attached updated patches for the support of REINDEX > CONCURRENTLY, renamed 2.0 for the occasion: > - 20131114_1_index_drop_comments.patch, patch that updates some > comments in index_drop. This updates only a couple of comments in > index_drop but has not been committed yet. It should be IMO... Pushed this one, thanks. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Sorry for the lateness of this... On 11/14/13, 8:40 PM, Michael Paquier wrote: > + /* > + * Phase 4 of REINDEX CONCURRENTLY > + * > + * Now that the concurrent indexes have been validated could be used, > + * we need to swap each concurrent index with its corresponding old index. > + * Note that the concurrent index used for swaping is not marked as valid > + * because we need to keep the former index and the concurrent index with > + * a different valid status to avoid an implosion in the number of indexes > + * a parent relation could have if this operation fails multiple times in > + * a row due to a reason or another. Note that we already know thanks to > + * validation step that > + */ > + Was there supposed to be more to that comment? In the loop right below it... + /* Swap the indexes and mark the indexes that have the old data as invalid */ + forboth(lc, indexIds, lc2, concurrentIndexIds) ... + CacheInvalidateRelcacheByRelid(relOid); Do we actually need to invalidate the cache on each case? Is it because we're grabbing a new transaction each time through? -- Jim C. Nasby, Data Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Hi, Thanks for your comments. On Fri, Jan 10, 2014 at 9:59 AM, Jim Nasby <jim@nasby.net> wrote: > Sorry for the lateness of this... > > On 11/14/13, 8:40 PM, Michael Paquier wrote: >> >> + /* >> + * Phase 4 of REINDEX CONCURRENTLY >> + * >> + * Now that the concurrent indexes have been validated could be >> used, >> + * we need to swap each concurrent index with its corresponding >> old index. >> + * Note that the concurrent index used for swaping is not marked >> as valid >> + * because we need to keep the former index and the concurrent >> index with >> + * a different valid status to avoid an implosion in the number of >> indexes >> + * a parent relation could have if this operation fails multiple >> times in >> + * a row due to a reason or another. Note that we already know >> thanks to >> + * validation step that >> + */ >> + > > > Was there supposed to be more to that comment? Not really, it seems that this chunk came out after writing multiple successive versions of this patch. > In the loop right below it... > > + /* Swap the indexes and mark the indexes that have the old data as > invalid */ > + forboth(lc, indexIds, lc2, concurrentIndexIds) > ... > + CacheInvalidateRelcacheByRelid(relOid); > > Do we actually need to invalidate the cache on each case? Is it because > we're grabbing a new transaction each time through? This is to force a refresh of the cached plans that have been using the old index before transaction of step 4 began. I have realigned this patch with latest head (d2458e3)... In case someone is interested at some point... Regards, -- Michael
Attachment
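Michael's point about forcing a refresh of cached plans can be sketched with a toy plan cache (a hedged illustration only; PostgreSQL's real plancache is driven by shared invalidation messages, and CacheInvalidateRelcacheByRelid is modeled here as a bare counter bump): without the invalidation after the swap, the plan built against the old index keeps being reused.

```python
class PlanCache:
    """Cached plans stay usable until an invalidation arrives
    for the underlying relation."""
    def __init__(self):
        self.plans = {}        # relid -> (plan, generation it was built at)
        self.generation = {}   # relid -> invalidation counter

    def invalidate(self, relid):
        # stand-in for CacheInvalidateRelcacheByRelid()
        self.generation[relid] = self.generation.get(relid, 0) + 1

    def get_plan(self, relid, planner):
        gen = self.generation.get(relid, 0)
        cached = self.plans.get(relid)
        if cached is None or cached[1] != gen:
            cached = (planner(), gen)          # replan after invalidation
            self.plans[relid] = cached
        return cached[0]

cache = PlanCache()
rel = 16384                                    # made-up table OID
p1 = cache.get_plan(rel, lambda: "scan via old index")
p2 = cache.get_plan(rel, lambda: "scan via new index")  # reused: no inval yet
cache.invalidate(rel)                          # sent after the relfilenode swap
p3 = cache.get_plan(rel, lambda: "scan via new index")
print(p1, "|", p2, "|", p3)
```

Note that p2 still returns the stale plan even though the planner would now pick the new index; only the explicit invalidation forces the replan.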
On Tue, Jan 21, 2014 at 10:12 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > I have realigned this patch with latest head (d2458e3)... In case > someone is interested at some point... Attached is a patch for REINDEX CONCURRENTLY rebased on HEAD (d7938a4), as some people are showing interest in it by reading recent discussions. Patch compiles and passes regression as well as isolation tests.. -- Michael
Attachment
On 2014-08-27 11:00:56 +0900, Michael Paquier wrote: > On Tue, Jan 21, 2014 at 10:12 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: > > I have realigned this patch with latest head (d2458e3)... In case > > someone is interested at some point... > Attached is a patch for REINDEX CONCURRENTLY rebased on HEAD > (d7938a4), as some people are showing interest in it by reading recent > discussions. Patch compiles and passes regression as well as isolation > tests.. Can you add it to the next CF? I'll try to look earlier, but can't promise anything. I very much would like this to get committed in some form or another. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Aug 27, 2014 at 3:41 PM, Andres Freund <andres@2ndquadrant.com> wrote: > Can you add it to the next CF? I'll try to look earlier, but can't > promise anything. > > I very much would like this to get committed in some form or another. Added it here to keep track of it: https://commitfest.postgresql.org/action/patch_view?id=1563 Regards, -- Michael
On Wed, Aug 27, 2014 at 3:53 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Wed, Aug 27, 2014 at 3:41 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > Can you add it to the next CF? I'll try to look earlier, but can't
> > promise anything.
> >
> > I very much would like this to get committed in some form or another.
> Added it here to keep track of it:
> https://commitfest.postgresql.org/action/patch_view?id=1563
Attached is a fairly-refreshed patch that should be used as a base for the next commit fest. The following changes should be noticed:
- Use of AccessExclusiveLock when swapping relfilenodes of an index and its concurrent entry instead of ShareUpdateExclusiveLock for safety. At the limit of my understanding, that's the consensus reached until now.
- Cleanup of many comments and refresh of the documentation that was somewhat wrongly-formulated or shaped at some places
- Addition of support for autocommit off in psql for REINDEX [ TABLE | INDEX ] CONCURRENTLY
- Some more code cleanup..
I haven't been through the tab completion support for psql but looking at tab-complete.c this seems a bit tricky with the stuff related to CREATE INDEX CONCURRENTLY already present. Nothing huge though.
Regards,
Michael
Attachment
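The lock-level change described above (AccessExclusiveLock instead of ShareUpdateExclusiveLock for the swap) can be checked against the standard lock conflict table; the snippet below is a partial transcription of the matrix from the PostgreSQL documentation. The point: ShareUpdateExclusiveLock does not conflict with the AccessShareLock that plain readers hold on an index, so it cannot keep them out while the relfilenodes change underneath; AccessExclusiveLock can.

```python
# Partial lock-conflict table, per the PostgreSQL documentation.
CONFLICTS = {
    "AccessShareLock": {"AccessExclusiveLock"},
    "ShareUpdateExclusiveLock": {
        "ShareUpdateExclusiveLock", "ShareLock", "ShareRowExclusiveLock",
        "ExclusiveLock", "AccessExclusiveLock",
    },
    "AccessExclusiveLock": {
        "AccessShareLock", "RowShareLock", "RowExclusiveLock",
        "ShareUpdateExclusiveLock", "ShareLock", "ShareRowExclusiveLock",
        "ExclusiveLock", "AccessExclusiveLock",
    },
}

def conflicts(a, b):
    """Symmetric conflict check between two lock modes."""
    return b in CONFLICTS.get(a, set()) or a in CONFLICTS.get(b, set())

# A plain SELECT takes AccessShareLock on the indexes it uses.
# ShareUpdateExclusiveLock would NOT keep such a reader out during the swap:
print(conflicts("AccessShareLock", "ShareUpdateExclusiveLock"))  # False
# AccessExclusiveLock does:
print(conflicts("AccessShareLock", "AccessExclusiveLock"))       # True
```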
On 10/1/14, 2:00 AM, Michael Paquier wrote:
> On Wed, Aug 27, 2014 at 3:53 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> > On Wed, Aug 27, 2014 at 3:41 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > Can you add it to the next CF? I'll try to look earlier, but can't
> > promise anything.
> >
> > I very much would like this to get committed in some form or another.
> Added it here to keep track of it:
> https://commitfest.postgresql.org/action/patch_view?id=1563
>
> Attached is a fairly-refreshed patch that should be used as a base for the next commit fest. The following changes should be noticed:
> - Use of AccessExclusiveLock when swapping relfilenodes of an index and its concurrent entry instead of ShareUpdateExclusiveLock for safety. At the limit of my understanding, that's the consensus reached until now.
> - Cleanup of many comments and refresh of the documentation that was somewhat wrongly-formulated or shaped at some places
> - Addition of support for autocommit off in psql for REINDEX [ TABLE | INDEX ] CONCURRENTLY
> - Some more code cleanup..
> I haven't been through the tab completion support for psql but looking at tab-complete.c this seems a bit tricky with the stuff related to CREATE INDEX CONCURRENTLY already present. Nothing huge though.

Patch applies against current HEAD and builds, but I'm getting 37 failed tests (mostly parallel, but also misc and WITH; results attached). Is that expected?

+ <para>
+ In a concurrent index build, a new index whose storage will replace the one
+ to be rebuild is actually entered into the system catalogs in one
+ transaction, then two table scans occur in two more transactions. Once this
+ is performed, the old and fresh indexes are swapped by taking a lock
+ <literal>ACCESS EXCLUSIVE</>. Finally two additional transactions
+ are used to mark the concurrent index as not ready and then drop it.
+ </para>

The "mark the concurrent index" bit is rather confusing; it sounds like it's referring to the new index instead of the old. Now that I've read the code I understand what's going on here between the concurrent index *entry* and the filenode swap, but I don't think the docs make this sufficiently clear to users.

How about something like this:

The following steps occur in a concurrent index build, each in a separate transaction. Note that if there are multiple indexes to be rebuilt then each step loops through all the indexes we're rebuilding, using a separate transaction for each one.

1. A new "temporary" index definition is entered into the catalog. This definition is only used to build the new index, and will be removed at the completion of the process.
2. A first pass index build is done.
3. A second pass is performed to add tuples that were added while the first pass build was running.
4. pg_class.relfilenode for the existing index definition and the "temporary" definition are swapped. This means that the existing index definition now uses the index data that we stored during the build, and the "temporary" definition is using the old index data.
5. The "temporary" index definition is marked as dead.
6. The "temporary" index definition and its data (which is now the data for the old index) are dropped.

+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+ * on. This is called during concurrent index processing. The heap relation
+ * on which is based the index needs to be closed by the caller.

Last bit presumably should be "on which the index is based".

+ /* Build the list of column names, necessary for index_create */

Instead of all this work wouldn't it be easier to create a version of index_create/ConstructTupleDescriptor that will use the IndexInfo for the old index?
ISTM index_concurrent_create() is doing a heck of a lot of work to marshal data into one form just to have it get marshaled yet again. Worst case, if we do have to play this game, there should be a stand-alone function to get the columns/expressions for an existing index; you're duplicating a lot of code from pg_get_indexdef_worker().

index_concurrent_swap(): Perhaps it'd be better to create index_concurrent_swap_setup() and index_concurrent_swap_cleanup() and refactor the duplicated code out... the actual function would then become:

ReindexRelationConcurrently()

+ * Process REINDEX CONCURRENTLY for given relation Oid. The relation can be
+ * either an index or a table. If a table is specified, each step of REINDEX
+ * CONCURRENTLY is done in parallel with all the table's indexes as well as
+ * its dependent toast indexes.

This comment is a bit misleading; we're not actually doing anything in parallel, right? AFAICT index_concurrent_build is going to block while each index is built the first time.

+ * relkind is an index, this index itself will be rebuilt. The locks taken
+ * parent relations and involved indexes are kept until this transaction
+ * is committed to protect against schema changes that might occur until
+ * the session lock is taken on each relation.

This comment is a bit unclear to me... at minimum I think it should be "* on parent relations" instead of "* parent relations", but I think it needs to elaborate on why/when we're also taking session level locks.

I also wordsmithed this comment a bit...

* Here begins the process for concurrently rebuilding the index entries.
* We need first to create an index which is based on the same data
* as the former index except that it will be only registered in catalogs
* and will be built later. It is possible to perform all the operations
* on all the indexes at the same time for a parent relation including
* indexes for its toast relation.

and this one...
* During this phase the concurrent indexes catch up with any new tuples that
* were created during the previous phase.
*
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates. Each index validation is done
* in a separate transaction to minimize how long we hold an open transaction.

+ * a different valid status to avoid an implosion in the number of indexes
+ * a parent relation could have if this operation step fails multiple times
+ * in a row due to a reason or another.

I'd change that to "explosion in the number of indexes a parent relation could have if this operation fails."

Phase 4, 5 and 6 are rather confusing if you don't understand that each "concurrent index" entry is meant to be thrown away. I think the Phase 4 comment should elaborate on that.

The comment in check_exclusion_constraint() is good; shouldn't the related comment on this line in index_create() mention that check_exclusion_constraint() needs to be changed if we ever support concurrent builds of exclusion indexes?

if (concurrent && is_exclusion && !is_reindex)

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Attachment
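Jim's six steps can be restated as a small state walkthrough (a sketch only; the dictionary fields loosely mirror pg_class/pg_index, and the `idx_cct` name plus all OID/relfilenode values are made up). The key property is that the user-visible index keeps its OID, so dependencies on it survive, while only its relfilenode changes.

```python
# Toy catalog: only the fields that matter here.
catalog = {
    "idx": {"oid": 16390, "relfilenode": 100, "live": True},
}

# 1. Enter the "temporary" index definition into the catalog.
catalog["idx_cct"] = {"oid": 16391, "relfilenode": 200, "live": True}

# 2./3. First- and second-pass builds fill relfilenode 200 (not modeled),
# then the freshly built data is validated.

# 4. Swap the relfilenodes: "idx" keeps its OID but now points at the new data.
catalog["idx"]["relfilenode"], catalog["idx_cct"]["relfilenode"] = (
    catalog["idx_cct"]["relfilenode"], catalog["idx"]["relfilenode"])

# 5. Mark the temporary definition (now carrying the old data) as dead.
catalog["idx_cct"]["live"] = False

# 6. Drop the temporary definition together with the old index data.
dropped = catalog.pop("idx_cct")

print(catalog["idx"], dropped["relfilenode"])
```

This is also why REINDEX CONCURRENTLY is preferable to a manual CREATE INDEX CONCURRENTLY plus DROP INDEX dance: the swap preserves the original index's identity.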
Thanks for your input, Jim!
On Wed, Oct 29, 2014 at 7:59 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> Patch applies against current HEAD and builds, but I'm getting 37 failed
> tests (mostly parallel, but also misc and WITH; results attached). Is that
> expected?
This is caused by the recent commit 7b1c2a0 (that I actually participated in reviewing :p) because of a missing inclusion of ruleutils.h in index.c.
> The "mark the concurrent index" bit is rather confusing; it sounds like it's
> referring to the new index instead of the old. Now that I've read the code I
> understand what's going on here between the concurrent index *entry* and the
> filenode swap, but I don't think the docs make this sufficiently clear to
> users.
>
> How about something like this:
>
> The following steps occur in a concurrent index build, each in a separate
> transaction. Note that if there are multiple indexes to be rebuilt then each
> step loops through all the indexes we're rebuilding, using a separate
> transaction for each one.
>
> 1. [blah]
Definitely a good idea! I took your text and made it more precise, listing the actions done for each step, the pg_index flags switched, using <orderedlist> to make the list of steps described in a separate paragraph more exhaustive for the user. At the same time I reworked the docs removing a part that was somewhat duplicated about dealing with the constraints having invalid index entries and how to drop them.
> + * index_concurrent_create
> + *
> + * Create an index based on the given one that will be used for concurrent
> + * operations. The index is inserted into catalogs and needs to be built
> later
> + * on. This is called during concurrent index processing. The heap relation
> + * on which is based the index needs to be closed by the caller.
>
> Last bit presumably should be "on which the index is based".
What about "Create a concurrent index based on the definition of the one provided by caller"?
> + /* Build the list of column names, necessary for index_create */
> Instead of all this work wouldn't it be easier to create a version of
> index_create/ConstructTupleDescriptor that will use the IndexInfo for the
> old index? ISTM index_concurrent_create() is doing a heck of a lot of work
> to marshal data into one form just to have it get marshaled yet again. Worst
> case, if we do have to play this game, there should be a stand-alone
> function to get the columns/expressions for an existing index; you're
> duplicating a lot of code from pg_get_indexdef_worker().
Yes, this definitely sucks and the approach creating a function to get all the column names is not productive as well. Then let's define an additional argument in index_create to pass a potential TupleDesc instead of this whole wart. I noticed as well that we need to actually reset attcacheoff to be able to use a fresh version of the tuple descriptor of the old index. I added a small API for this purpose in tupdesc.h called ResetTupleDescCache. Would it make sense instead to extend CreateTupleDescCopyConstr or CreateTupleDescCopy with a boolean flag?
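For readers following along, the gist of the ResetTupleDescCache helper proposed here can be sketched in Python (a stand-in for the C function; the dictionaries mimic pg_attribute rows, and -1 is pg_attribute's "offset not computed yet" marker): attcacheoff memoizes a column's byte offset within a tuple, and a copied descriptor must forget those memoized values so they are recomputed for the new index.

```python
def reset_tupledesc_cache(attrs):
    """Forget memoized attribute offsets on a copied tuple descriptor."""
    for att in attrs:
        att["attcacheoff"] = -1   # -1 means "not computed yet"
    return attrs

attrs = [
    {"attname": "a", "attcacheoff": 0},   # offset 0 had been cached
    {"attname": "b", "attcacheoff": 4},
]
reset_tupledesc_cache(attrs)
print([att["attcacheoff"] for att in attrs])  # -> [-1, -1]
```

Folding this into CreateTupleDescCopy via a flag, as Michael asks, would be the same operation performed during the copy rather than after it.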
> index_concurrent_swap(): Perhaps it'd be better to create
> index_concurrent_swap_setup() and index_concurrent_swap_cleanup() and
> refactor the duplicated code out... the actual function would then become:
This sentence is not finished :) IMO, index_concurrent_swap looks good as is, taking as arguments the index and its concurrent entry, and swapping their relfilenode after taking AccessExclusiveLock that will be held until the end of this transaction.
> ReindexRelationConcurrently()
>
> + * Process REINDEX CONCURRENTLY for given relation Oid. The relation can be
> + * either an index or a table. If a table is specified, each step of
> REINDEX
> + * CONCURRENTLY is done in parallel with all the table's indexes as well as
> + * its dependent toast indexes.
> This comment is a bit misleading; we're not actually doing anything in
> parallel, right? AFAICT index_concurrent_build is going to block while each
> index is built the first time.
Yes, parallel may be misleading. What is meant here is that each step of the process is done one by one on all the valid indexes a table may have.
> + * relkind is an index, this index itself will be rebuilt. The locks
> taken
> + * parent relations and involved indexes are kept until this
> transaction
> + * is committed to protect against schema changes that might occur
> until
> + * the session lock is taken on each relation.
>
> This comment is a bit unclear to me... at minimum I think it should be "* on
> parent relations" instead of "* parent relations", but I think it needs to
> elaborate on why/when we're also taking session level locks.
Hum, done as follows:
@@ -896,9 +896,11 @@ ReindexRelationConcurrently(Oid relationOid)
* If the relkind of given relation Oid is a table, all its valid indexes
* will be rebuilt, including its associated toast table indexes. If
* relkind is an index, this index itself will be rebuilt. The locks taken
- * parent relations and involved indexes are kept until this transaction
+ * on parent relations and involved indexes are kept until this transaction
* is committed to protect against schema changes that might occur until
- * the session lock is taken on each relation.
+ * the session lock is taken on each relation, session lock used to
+ * similarly protect from any schema change that could happen within the
+ * multiple transactions that are used during this process.
*/
> I also wordsmithed this comment a bit...
> * Here begins the process for concurrently rebuilding the index
> and this one...
> * During this phase the concurrent indexes catch up with any new
Slight differences indeed. Thanks and included.
> I'd change that to "explosion in the number of indexes a parent relation
> could have if this operation fails."
Well, implosion was more... I don't recall my state of mind when writing that. So changed the way you recommend.
> Phase 4, 5 and 6 are rather confusing if you don't understand that each
> "concurrent index" entry is meant to be thrown away. I think the Phase 4
> comment should elaborate on that.
OK, done.
> The comment in check_exclusion_constraint() is good; shouldn't the related
> comment on this line in index_create() mention that
> check_exclusion_constraint() needs to be changed if we ever support
> concurrent builds of exclusion indexes?
>
> if (concurrent && is_exclusion && !is_reindex)
OK, what about that then:
/*
- * This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * This case is currently only supported during a concurrent index
+ * rebuild, but there is no way to ask for it in the grammar otherwise
+ * anyway. If support for exclusion constraints is added in the future,
+ * the check similar to this one in check_exclusion_constraint should as
+ * well be changed accordingly.
Updated patch is attached.
Thanks again.
Regards,
--
Michael
Attachment
On 10/30/14, 3:19 AM, Michael Paquier wrote:
> Thanks for your input, Jim!
>
> On Wed, Oct 29, 2014 at 7:59 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> > Patch applies against current HEAD and builds, but I'm getting 37 failed
> > tests (mostly parallel, but also misc and WITH; results attached). Is that
> > expected?
> This is caused by the recent commit 7b1c2a0 (that I actually participated in reviewing :p) because of a missing inclusion of ruleutils.h in index.c.
> > The "mark the concurrent index" bit is rather confusing; it sounds like it's
> > referring to the new index instead of the old. Now that I've read the code I
> > understand what's going on here between the concurrent index *entry* and the
> > filenode swap, but I don't think the docs make this sufficiently clear to
> > users.
> >
> > How about something like this:
> >
> > The following steps occur in a concurrent index build, each in a separate
> > transaction. Note that if there are multiple indexes to be rebuilt then each
> > step loops through all the indexes we're rebuilding, using a separate
> > transaction for each one.
> >
> > 1. [blah]
> Definitely a good idea! I took your text and made it more precise, listing the actions done for each step, the pg_index flags switched, using <orderedlist> to make the list of steps described in a separate paragraph more exhaustive for the user. At the same time I reworked the docs removing a part that was somewhat duplicated about dealing with the constraints having invalid index entries and how to drop them.

Awesome! Just a few items here:

+ Then a second pass is performed to add tuples that were added while
+ the first pass build was running. One the validation of the index

s/One the/Once the/

> > + * index_concurrent_create
> > + *
> > + * Create an index based on the given one that will be used for concurrent
> > + * operations. The index is inserted into catalogs and needs to be built later
> > + * on.
> > This is called during concurrent index processing. The heap relation
> > + * on which is based the index needs to be closed by the caller.
> >
> > Last bit presumably should be "on which the index is based".
> What about "Create a concurrent index based on the definition of the one provided by caller"?

That's good too, but my comment was on the last sentence, not the first.

> > + /* Build the list of column names, necessary for index_create */
> > Instead of all this work wouldn't it be easier to create a version of
> > index_create/ConstructTupleDescriptor that will use the IndexInfo for the
> > old index? ISTM index_concurrent_create() is doing a heck of a lot of work
> > to marshal data into one form just to have it get marshaled yet again. Worst
> > case, if we do have to play this game, there should be a stand-alone
> > function to get the columns/expressions for an existing index; you're
> > duplicating a lot of code from pg_get_indexdef_worker().
> Yes, this definitely sucks and the approach creating a function to get all the column names is not productive as well. Then let's define an additional argument in index_create to pass a potential TupleDesc instead of this whole wart. I noticed as well that we need to actually reset attcacheoff to be able to use a fresh version of the tuple descriptor of the old index. I added a small API for this purpose in tupdesc.h called ResetTupleDescCache. Would it make sense instead to extend CreateTupleDescCopyConstr or CreateTupleDescCopy with a boolean flag?

Perhaps there'd be other places that would want to reset the stats, so I lean slightly that direction.

The comment at the beginning of index_create needs to be modified to mention tupdesc and especially that setting tupdesc over-rides indexColNames.

> > index_concurrent_swap(): Perhaps it'd be better to create
> > index_concurrent_swap_setup() and index_concurrent_swap_cleanup() and
> > refactor the duplicated code out...
the actual function would then become: > This sentence is not finished :) IMO, index_concurrent_swap looks good as is, taking as arguments the index and its concurrent entry, and swapping their relfilenode after taking AccessExclusiveLock that will be held until the end of this transaction. Fair enough. > > ReindexRelationConcurrently() > > > > + * Process REINDEX CONCURRENTLY for given relation Oid. The relation can be > > + * either an index or a table. If a table is specified, each step of > > REINDEX > > + * CONCURRENTLY is done in parallel with all the table's indexes as well as > > + * its dependent toast indexes. > > This comment is a bit misleading; we're not actually doing anything in > > parallel, right? AFAICT index_concurrent_build is going to block while each > > index is built the first time. > Yes, parallel may be misleading. What is meant here is that each step of the process is done one by one on all the valid indexes a table may have. New comment looks good. > > + * relkind is an index, this index itself will be rebuilt. The locks > > taken > > + * parent relations and involved indexes are kept until this > > transaction > > + * is committed to protect against schema changes that might occur > > until > > + * the session lock is taken on each relation. > > > > This comment is a bit unclear to me... at minimum I think it should be "* on > > parent relations" instead of "* parent relations", but I think it needs to > > elaborate on why/when we're also taking session level locks. > Hum, done as follows: > @@ -896,9 +896,11 @@ ReindexRelationConcurrently(Oid relationOid) > * If the relkind of given relation Oid is a table, all its valid indexes > * will be rebuilt, including its associated toast table indexes. If > * relkind is an index, this index itself will be rebuilt.
The locks taken > - * parent relations and involved indexes are kept until this transaction > + * on parent relations and involved indexes are kept until this transaction > * is committed to protect against schema changes that might occur until > - * the session lock is taken on each relation. > + * the session lock is taken on each relation, session lock used to > + * similarly protect from any schema change that could happen within the > + * multiple transactions that are used during this process. > */ Cool. > OK, what about that then: > /* > - * This case is currently not supported, but there's no way to ask for it > - * in the grammar anyway, so it can't happen. > + * This case is currently only supported during a concurrent index > + * rebuild, but there is no way to ask for it in the grammar otherwise > + * anyway. If support for exclusion constraints is added in the future, > + * the check similar to this one in check_exclusion_constraint should as > + * well be changed accordingly. > > Updated patch is attached. Works for me. Keep in mind I'm not super familiar with the guts of index creation, so it'd be good for someone else to look at that bit (especially index_concurrent_create and ReindexRelationConcurrently). -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
On 10/30/14, 3:19 AM, Michael Paquier wrote: > On Wed, Oct 29, 2014 at 7:59 AM, Jim Nasby <Jim.Nasby@bluetreble.com <mailto:Jim.Nasby@bluetreble.com>> wrote: > > Patch applies against current HEAD and builds, but I'm getting 37 failed > > tests (mostly parallel, but also misc and WITH; results attached). Is that > > expected? > This is caused by the recent commit 7b1c2a0 (that I actually participated in reviewing :p) because of a missing inclusion of ruleutils.h in index.c. Sorry, forgot to mention patch now passes make check cleanly. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
On Thu, Oct 30, 2014 at 5:19 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > Updated patch is attached.
Please find attached an updated patch with the following things changed:
- Addition of tab completion in psql for all new commands
- Addition of a call to WaitForLockers in index_concurrent_swap to ensure that there are no running transactions on the parent table running before exclusive locks are taken on the index and its concurrent entry. Previous patch versions created deadlocks because of that, issue spotted by the isolation tests integrated in the patch.
- Isolation tests for reindex concurrently are re-enabled by default.
Regards,
-- Michael
Attachment
On 10/1/14 3:00 AM, Michael Paquier wrote: > - Use of AccessExclusiveLock when swapping relfilenodes of an index and > its concurrent entry instead of ShareUpdateExclusiveLock for safety. At > the limit of my understanding, that's the consensus reached until now. I'm very curious about this point. I looked through all the previous discussions, and the only time I saw this mentioned was at the very beginning when it was said that we could review the patch while ignoring this issue and fix it later with MVCC catalog access. Then it got very technical, but it was never explicitly concluded whether it was possible to fix this or not. Also, in the thread "Concurrently option for reindexdb" you pointed out that requiring an exclusive lock isn't really concurrent and proposed an option like --minimum-locks. I will point out again that we specifically invented DROP INDEX CONCURRENTLY because holding an exclusive lock even briefly isn't good enough. If REINDEX cannot work without an exclusive lock, we should invent some other qualifier, like WITH FEWER LOCKS. It's still useful, but we shouldn't give people the idea that they can throw away their custom CREATE INDEX CONCURRENTLY + DROP INDEX CONCURRENTLY scripts.
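For context, the hand-rolled CREATE INDEX CONCURRENTLY + DROP INDEX CONCURRENTLY scripts mentioned above usually boil down to something like the following sketch (table, column, and index names are illustrative; note this only works for plain indexes, not for indexes backing constraints such as primary keys, which is precisely the gap REINDEX CONCURRENTLY is meant to close):

```sql
-- Build a replacement for my_index without blocking writes.
CREATE INDEX CONCURRENTLY my_index_new ON my_table (my_col);
-- Drop the old index without blocking writes.
DROP INDEX CONCURRENTLY my_index;
-- Take back the original name (the rename briefly locks the index).
ALTER INDEX my_index_new RENAME TO my_index;
```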
On Wed, Nov 5, 2014 at 8:49 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Thu, Oct 30, 2014 at 5:19 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Updated patch is attached.
Please find attached an updated patch with the following things changed:
- Addition of tab completion in psql for all new commands
- Addition of a call to WaitForLockers in index_concurrent_swap to
ensure that there are no running transactions on the parent table
running before exclusive locks are taken on the index and its
concurrent entry. Previous patch versions created deadlocks because of
that, issue spotted by the isolation tests integrated in the patch.
- Isolation tests for reindex concurrently are re-enabled by default.
Regards,
It looks like this needs another rebase, I get failures on index.c, toasting.c, indexcmds.c, and index.h
Thanks,
Jeff
On Tue, Nov 11, 2014 at 3:24 AM, Jeff Janes <jeff.janes@gmail.com> wrote: > > On Wed, Nov 5, 2014 at 8:49 PM, Michael Paquier <michael.paquier@gmail.com> wrote: >> >> On Thu, Oct 30, 2014 at 5:19 PM, Michael Paquier >> <michael.paquier@gmail.com> wrote: >> > Updated patch is attached. >> Please find attached an updated patch with the following things changed: >> - Addition of tab completion in psql for all new commands >> - Addition of a call to WaitForLockers in index_concurrent_swap to >> ensure that there are no running transactions on the parent table >> running before exclusive locks are taken on the index and its >> concurrent entry. Previous patch versions created deadlocks because of >> that, issue spotted by the isolation tests integrated in the patch. >> - Isolation tests for reindex concurrently are re-enabled by default. >> Regards, > > > It looks like this needs another rebase, I get failures on index.c, toasting.c, indexcmds.c, and index.h Indeed. There are some conflicts created by the recent modification of index_create. Here is a rebased patch. -- Michael
Attachment
On Thu, Nov 6, 2014 at 9:50 AM, Peter Eisentraut <peter_e@gmx.net> wrote: > If REINDEX cannot work without an exclusive lock, we should invent some > other qualifier, like WITH FEWER LOCKS. What he said. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Nov 12, 2014 at 4:10 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Nov 6, 2014 at 9:50 AM, Peter Eisentraut <peter_e@gmx.net> wrote: >> If REINDEX cannot work without an exclusive lock, we should invent some >> other qualifier, like WITH FEWER LOCKS. > > What he said. But more to the point .... why, precisely, can't this work without an AccessExclusiveLock? And can't we fix that instead of settling for something clearly inferior? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2014-11-12 16:11:58 -0500, Robert Haas wrote: > On Wed, Nov 12, 2014 at 4:10 PM, Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Nov 6, 2014 at 9:50 AM, Peter Eisentraut <peter_e@gmx.net> wrote: > >> If REINDEX cannot work without an exclusive lock, we should invent some > >> other qualifier, like WITH FEWER LOCKS. > > > > What he said. I'm unconvinced. A *short* exclusive lock (just to update two pg_class rows), still gives most of the benefits of CONCURRENTLY. Also, I do think we can get rid of that period in the not too far away future. > But more to the point .... why, precisely, can't this work without an > AccessExclusiveLock? And can't we fix that instead of settling for > something clearly inferior? It's nontrivial to fix, but I think we can fix it at some point. I just think we should get the *major* part of the feature before investing lots of time making it even better. There are *very* frequent questions about having this. And people do really dangerous stuff (like manually updating pg_class.relfilenode and such) to cope. The problem is that it's very hard to avoid the wrong index's relfilenode being used when swapping the relfilenodes between two indexes. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Nov 12, 2014 at 4:39 PM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2014-11-12 16:11:58 -0500, Robert Haas wrote: >> On Wed, Nov 12, 2014 at 4:10 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> > On Thu, Nov 6, 2014 at 9:50 AM, Peter Eisentraut <peter_e@gmx.net> wrote: >> >> If REINDEX cannot work without an exclusive lock, we should invent some >> >> other qualifier, like WITH FEWER LOCKS. >> > >> > What he said. > > I'm unconvinced. A *short* exclusive lock (just to update two pg_class > rows), still gives most of the benefits of CONCURRENTLY. I am pretty doubtful about that. It's still going to require you to wait for all transactions to drain out of the table while new ones are blocked from entering. Which sucks. Unless all of your transactions are very short, but that's not necessarily typical. > The problem is that it's very hard to avoid the wrong index's > relfilenode being used when swapping the relfilenodes between two > indexes. How about storing both the old and new relfilenodes in the same pg_class entry?
1. Take a snapshot.
2. Index all the tuples in that snapshot.
3. Publish the new relfilenode to an additional pg_class column, relnewfilenode or similar.
4. Wait until everyone can see step #3.
5. Rescan the table and add any missing tuples to the index.
6. Set some flag in pg_class to mark the relnewfilenode as active and relfilenode as not to be used for queries.
7. Wait until everyone can see step #6.
8. Set some flag in pg_class to mark relfilenode as not even to be opened.
9. Wait until everyone can see step #8.
10. Drop old relfilenode.
11. Clean up by setting relfilenode = relnewfilenode, relnewfilenode = 0.
This is basically CREATE INDEX CONCURRENTLY (without the first step where we out-wait people who might create now-invalid HOT chains, because that can't arise with a REINDEX of an existing index) plus DROP INDEX CONCURRENTLY. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2014-11-12 18:23:38 -0500, Robert Haas wrote: > On Wed, Nov 12, 2014 at 4:39 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > On 2014-11-12 16:11:58 -0500, Robert Haas wrote: > >> On Wed, Nov 12, 2014 at 4:10 PM, Robert Haas <robertmhaas@gmail.com> wrote: > >> > On Thu, Nov 6, 2014 at 9:50 AM, Peter Eisentraut <peter_e@gmx.net> wrote: > >> >> If REINDEX cannot work without an exclusive lock, we should invent some > >> >> other qualifier, like WITH FEWER LOCKS. > >> > > >> > What he said. > > > > I'm unconvinced. A *short* exclusive lock (just to update two pg_class > > rows), still gives most of the benefits of CONCURRENTLY. > > I am pretty doubtful about that. It's still going to require you to > wait for all transactions to drain out of the table while new ones are > blocked from entering. Which sucks. Unless all of your transactions > are very short, but that's not necessarily typical. Yes, it sucks. But it beats not being able to reindex a relation with a primary key (referenced by a fkey) without waiting several hours by a couple magnitudes. And that's the current situation. > > The problem is that it's very hard to avoid the wrong index's > > relfilenode being used when swapping the relfilenodes between two > > indexes. > > How about storing both the old and new relfilenodes in the same pg_class entry? That's quite a cool idea [think a bit] But I think it won't work realistically. We have a *lot* of infrastructure that refers to indexes using its primary key. I don't think we want to touch all those places to also disambiguate on some other factor. All the relevant APIs are either just passing around oids or relcache entries. There's also the problem that we'd really need two different pg_index rows to make things work. Alternatively we can duplicate the three relevant columns (indisvalid, indisready, indislive) in there for the different filenodes. But that's not entirely pretty. > 1. Take a snapshot. > 2. Index all the tuples in that snapshot. > 3.
Publish the new relfilenode to an additional pg_class column, > relnewfilenode or similar. > 4. Wait until everyone can see step #3. Here all backends need to update both indexes, right? And all the indexing infrastructure can't deal with that without having separate oids & relcache entries. > 5. Rescan the table and add any missing tuples to the index. > 6. Set some flag in pg_class to mark the relnewfilenode as active and > relfilenode as not to be used for queries. > 7. Wait until everyone can see step #6. > 8. Set some flag in pg_class to mark relfilenode as not even to be opened. > 9. Wait until everyone can see step #8. > 10. Drop old relfilenode. > 11. Clean up by setting relfilenode = relnewfilenode, relfilenode = 0. Even that one isn't trivial - how do you deal with the fact that somebody looking at updating newrelfilenode might, in the midst of processing, see newrelfilenode = 0? I've earlier come up with a couple possible solutions, but I unfortunately found holes in all of them. And if I can find holes in them, there surely are more :(. I don't recall what the problem with just swapping the names was - but I'm pretty sure there was one... Hm. The index relation oids are referred to by constraints and dependencies. That's somewhat solvable. But I think there was something else as well... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund wrote: > On 2014-11-12 18:23:38 -0500, Robert Haas wrote: > > > The problem is that it's very hard to avoid the wrong index's > > > relfilenode being used when swapping the relfilenodes between two > > > indexes. > > > > How about storing both the old and new relfilenodes in the same pg_class entry? > > That's quite a cool idea > > [think a bit] > > But I think it won't work realistically. We have a *lot* of > infrastructure that refers to indexes using its primary key. Hmm, can we make the relmapper do this job instead of having another pg_class column? Essentially the same sketch Robert proposed, but instead we would initially set relfilenode=0 and have all onlookers use the relmapper to obtain the correct relfilenode; switching to the new relfilenode can be done atomically, and un-relmap the index once the process is complete. The difference from what Robert proposes is that the transient state is known to cause failures for anyone not prepared to deal with it, so it should be easy to spot what places need adjustment. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Nov 13, 2014 at 9:31 AM, Andres Freund <andres@2ndquadrant.com> wrote: > I don't recall what the problem with just swapping the names was - but > I'm pretty sure there was one... Hm. The index relation oids are > referred to by constraints and dependencies. That's somewhat > solvable. But I think there was something else as well... The reason given 2 years ago for not using relname was the fact that the oid of the index changes, and to it be refered by some pg_depend entries: http://www.postgresql.org/message-id/20121208133730.GA6422@awork2.anarazel.de http://www.postgresql.org/message-id/12742.1354977643@sss.pgh.pa.us Regards, -- Michael
On Thu, Nov 13, 2014 at 10:26 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Thu, Nov 13, 2014 at 9:31 AM, Andres Freund <andres@2ndquadrant.com> wrote: >> I don't recall what the problem with just swapping the names was - but >> I'm pretty sure there was one... Hm. The index relation oids are >> referred to by constraints and dependencies. That's somewhat >> solvable. But I think there was something else as well... > The reason given 2 years ago for not using relname was the fast that > the oid of the index changes, and to it be refered by some pg_depend > entries: Feel free to correct: "and that it could be referred". -- Michael
On Wed, Nov 12, 2014 at 7:31 PM, Andres Freund <andres@2ndquadrant.com> wrote: > But I think it won't work realistically. We have a *lot* of > infrastructure that refers to indexes using its primary key. I don't > think we want to touch all those places to also disambiguate on some > other factor. All the relevant APIs are either just passing around oids > or relcache entries. I'm not quite following this. The whole point is to AVOID having two indexes. You have one index which may at times have two sets of physical storage. > There's also the problem that we'd really need two different pg_index > rows to make things work. Alternatively we can duplicate the three > relevant columns (indisvalid, indisready, indislive) in there for the > different filenodes. But that's not entirely pretty. I think what you would probably end up with is a single "char" or int2 column that defines the state of the index. Certain states would be valid only when relnewfilenode != 0. >> 1. Take a snapshot. >> 2. Index all the tuples in that snapshot. >> 3. Publish the new relfilenode to an additional pg_class column, >> relnewfilenode or similar. >> 4. Wait until everyone can see step #3. > > Here all backends need to update both indexes, right? Yes. > And all the > indexing infrastructure can't deal with that without having separate > oids & relcache entries. Why not? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 11/12/14 7:31 PM, Andres Freund wrote: > Yes, it sucks. But it beats not being able to reindex a relation with a > primary key (referenced by a fkey) without waiting several hours by a > couple magnitudes. And that's the current situation. That's fine, but we have, for better or worse, defined CONCURRENTLY := does not take exclusive locks. Use a different adverb for an in-between facility.
On November 13, 2014 10:23:41 PM CET, Peter Eisentraut <peter_e@gmx.net> wrote: >On 11/12/14 7:31 PM, Andres Freund wrote: >> Yes, it sucks. But it beats not being able to reindex a relation with >a >> primary key (referenced by a fkey) without waiting several hours by a >> couple magnitudes. And that's the current situation. > >That's fine, but we have, for better or worse, defined CONCURRENTLY := >does not take exclusive locks. Use a different adverb for an >in-between >facility. I think that's not actually a service to our users. They'll have to adapt their scripts and knowledge when we get around to the more concurrent version. What exactly CONCURRENTLY means is already not strictly defined and differs between the actions. I'll note that DROP INDEX CONCURRENTLY actually already internally acquires an AEL lock. Although it's a bit harder to see the consequences of that. -- Please excuse brevity and formatting - I am writing this on my mobile phone. Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 11/13/14, 3:50 PM, Andres Freund wrote: > On November 13, 2014 10:23:41 PM CET, Peter Eisentraut <peter_e@gmx.net> wrote: >> On 11/12/14 7:31 PM, Andres Freund wrote: >>> Yes, it sucks. But it beats not being able to reindex a relation with >> a >>> primary key (referenced by a fkey) without waiting several hours by a >>> couple magnitudes. And that's the current situation. >> >> That's fine, but we have, for better or worse, defined CONCURRENTLY := >> does not take exclusive locks. Use a different adverb for an >> in-between >> facility. > > I think that's not actually a service to our users. They'll have to adapt their scripts and knowledge when we get around to the more concurrent version. What exactly CONCURRENTLY means is already not strictly defined and differs between the actions. It also means that if we ever found a way to get rid of the exclusive lock we'd then have an inconsistency anyway. Or we'd also create REINDEX CONCURRENT at that time, and then have 2 command syntaxes to support. > I'll note that DROP INDEX CONCURRENTLY actually already internally acquires an AEL lock. Although it's a bit harder to see the consequences of that. Having been responsible for a site where downtime was a 6 figure dollar amount per hour, I've spent a LOT of time worrying about lock problems. The really big issue here isn't grabbing an exclusive lock; it's grabbing one at some random time when no one is there to actively monitor what's happening. (If you can't handle *any* exclusive locks, that also means you can never do an ALTER TABLE ADD COLUMN either.) With that in mind, would it be possible to set this up so that the time-consuming process of building the new index file happens first, and then (optionally) some sort of DBA action is required to actually do the relfilenode swap? I realize that's not the most elegant solution, but it's WAY better than this feature not hitting 9.5 and people having to hand-code a solution.
Possible syntax:
REINDEX CONCURRENTLY -- Does what current patch does
REINDEX CONCURRENT BUILD -- Builds new files
REINDEX CONCURRENT SWAP -- Swaps new files in
This suffers from the syntax problems I mentioned above, but at least this way it's all limited to one command, and it probably allows a lot more people to use it. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
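To make the operational flow of the BUILD/SWAP split above concrete, usage might look like this (hypothetical syntax from the proposal above, not implemented; the index name is illustrative):

```sql
-- Phase 1: long-running, takes only weak locks, safe to run unattended.
REINDEX CONCURRENT BUILD my_index;
-- Phase 2: run later, while a DBA is actively watching; takes a brief
-- exclusive lock to swap the newly built files in.
REINDEX CONCURRENT SWAP my_index;
```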
On 2014-11-14 02:04:00 -0600, Jim Nasby wrote: > On 11/13/14, 3:50 PM, Andres Freund wrote: > Having been responsible for a site where downtime was a 6 figure > dollar amount per hour, I've spent a LOT of time worrying about lock > problems. The really big issue here isn't grabbing an exclusive lock; > it's grabbing one at some random time when no one is there to actively > monitor what's happening. (If you can't handle *any* exclusive locks, > that also means you can never do an ALTER TABLE ADD COLUMN either.) > With that in mind, would it be possible to set this up so that the > time-consuming process of building the new index file happens first, > and then (optionally) some sort of DBA action is required to actually > do the relfilenode swap? I realize that's not the most elegant > solution, but it's WAY better than this feature not hitting 9.5 and > people having to hand-code a solution. I don't think having a multi-step version of the feature and it not making it into 9.5 are synonymous. And I really don't want to make it even more complex before we have the basic version in. I think a split like your: > Possible syntax: > REINDEX CONCURRENTLY -- Does what current patch does > REINDEX CONCURRENT BUILD -- Builds new files > REINDEX CONCURRENT SWAP -- Swaps new files in could make sense, but it's really an additional feature on top. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2014-11-13 11:41:18 -0500, Robert Haas wrote: > On Wed, Nov 12, 2014 at 7:31 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > But I think it won't work realistically. We have a *lot* of > > infrastructure that refers to indexes using its primary key. I don't > > think we want to touch all those places to also disambiguate on some > > other factor. All the relevant APIs are either just passing around oids > > or relcache entries. > > I'm not quite following this. The whole point is to AVOID having two > indexes. You have one index which may at times have two sets of > physical storage. Right. But how are we going to refer to these different relfilenodes? All the indexing infrastructure just uses oids and/or Relation pointers to refer to the index. How would you hand down the knowledge of which of the relfilenodes is supposed to be used in some callchain? There are ugly solutions like having a flag like 'bool rd_useotherfilenode' inside struct RelationData, but even ignoring the ugliness I don't think that'd work well - what if some function called inside the index code again starts an index lookup? I think I might just not be getting your idea here? > > And all the > > indexing infrastructure can't deal with that without having separate > > oids & relcache entries. Hopefully explained above? Greetings, Andres Freund
On Fri, Nov 14, 2014 at 11:47 AM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2014-11-13 11:41:18 -0500, Robert Haas wrote: >> On Wed, Nov 12, 2014 at 7:31 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> > But I think it won't work realistically. We have a *lot* of >> > infrastructure that refers to indexes using its primary key. I don't >> > think we want to touch all those places to also disambiguate on some >> > other factor. All the relevant APIs are either just passing around oids >> > or relcache entries. >> >> I'm not quite following this. The whole point is to AVOID having two >> indexes. You have one index which may at times have two sets of >> physical storage. > > Right. But how are we going to refer to these different relfilenodes? > All the indexing infrastructure just uses oids and/or Relation pointers > to refer to the index. How would you hand down the knowledge of which of > the relfilenodes is supposed to be used in some callchain? If you've got a Relation, you don't need someone to tell you which physical storage to use; you can figure that out for yourself by looking at the Relation. If you've got an OID, you're probably going to go conjure up a Relation, and then you can do the same thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Nov 13, 2014 at 10:25 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Andres Freund wrote: >> On 2014-11-12 18:23:38 -0500, Robert Haas wrote: > >> > > The problem is that it's very hard to avoid the wrong index's >> > > relfilenode being used when swapping the relfilenodes between two >> > > indexes. >> > >> > How about storing both the old and new relfilenodes in the same pg_class entry? >> >> That's quite a cool idea >> >> [think a bit] >> >> But I think it won't work realistically. We have a *lot* of >> infrastructure that refers to indexes using its primary key. > > Hmm, can we make the relmapper do this job instead of having another > pg_class column? Essentially the same sketch Robert proposed, but instead > we would initially set relfilenode=0 and have all onlookers use the > relmapper to obtain the correct relfilenode; switching to the new > relfilenode can be done atomically, and un-relmap the index once the > process is complete. > The difference from what Robert proposes is that the transient state is > known to cause failures for anyone not prepared to deal with it, so it > should be easy to spot what places need adjustment. How would the failure handling actually work? Would we need some extra process to remove the extra relfilenodes? Note that in the current patch the temporary concurrent entry is kept as INVALID all the time, giving the user a path to remove it with DROP INDEX even in the case of invalid toast indexes in catalog pg_toast. Note that I am on the side of using the exclusive lock when swapping relfilenodes for now in any case; that's what pg_repack does by renaming the indexes, and people use it. -- Michael
13.11.2014, 23:50, Andres Freund wrote: > On November 13, 2014 10:23:41 PM CET, Peter Eisentraut <peter_e@gmx.net> wrote: >> On 11/12/14 7:31 PM, Andres Freund wrote: >>> Yes, it sucks. But it beats not being able to reindex a relation with >> a >>> primary key (referenced by a fkey) without waiting several hours by a >>> couple magnitudes. And that's the current situation. >> >> That's fine, but we have, for better or worse, defined CONCURRENTLY := >> does not take exclusive locks. Use a different adverb for an >> in-between >> facility. > > I think that's not actually a service to our users. They'll have to adapt their scripts and knowledge when we get around to the more concurrent version. What exactly CONCURRENTLY means is already not strictly defined and differs between the actions. > > I'll note that DROP INDEX CONCURRENTLY actually already internally acquires an AEL lock. Although it's a bit harder to see the consequences of that. If the short-lived lock is the only blocker for this feature at the moment could we just require an additional qualifier for CONCURRENTLY (FORCE?) until the lock can be removed, something like:
tmp=# REINDEX INDEX CONCURRENTLY tmp_pkey;
ERROR: REINDEX INDEX CONCURRENTLY is not fully concurrent; use REINDEX INDEX CONCURRENTLY FORCE to perform reindex with a short-lived lock.
tmp=# REINDEX INDEX CONCURRENTLY FORCE tmp_pkey;
REINDEX
It's not optimal, but currently there's no way to reindex a primary key anywhere close to concurrently and a short lock would be a huge improvement over the current situation. / Oskari
On Tue, Dec 23, 2014 at 5:54 PM, Oskari Saarenmaa <os@ohmu.fi> wrote: > > If the short-lived lock is the only blocker for this feature at the > moment could we just require an additional qualifier for CONCURRENTLY > (FORCE?) until the lock can be removed, something like: > =# [blah] FWIW, I'd just keep only CONCURRENTLY with no fancy additional keywords even if we cheat on it, as long as it is made clear in the documentation that an exclusive lock is taken for a very short time, much shorter than what a normal REINDEX would take, btw. > It's not optimal, but currently there's no way to reindex a primary key > anywhere close to concurrently and a short lock would be a huge > improvement over the current situation. Yep. -- Michael
On 2014-11-12 16:11:58 -0500, Robert Haas wrote: > On Wed, Nov 12, 2014 at 4:10 PM, Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Nov 6, 2014 at 9:50 AM, Peter Eisentraut <peter_e@gmx.net> wrote: > >> If REINDEX cannot work without an exclusive lock, we should invent some > >> other qualifier, like WITH FEWER LOCKS. > > > > What he said. > > But more to the point .... why, precisely, can't this work without an > AccessExclusiveLock? And can't we fix that instead of settling for > something clearly inferior? So, here's an alternative approach of how to get rid of the AEL locks. They're required because we want to switch the relfilenodes around. I've pretty much no confidence in any of the schemes anybody has come up with to avoid that. So, let's not switch relfilenodes around. I think we should instead just use the new index, repoint the dependencies onto the new oid, and then afterwards, when dropping, rename the new index onto the old one. That means the oid of the index will change and some less than pretty grovelling around dependencies, but it still seems preferable to what we're discussing here otherwise. Does anybody see a fundamental problem with that approach? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Feb 2, 2015 at 9:10 AM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2014-11-12 16:11:58 -0500, Robert Haas wrote: >> On Wed, Nov 12, 2014 at 4:10 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> > On Thu, Nov 6, 2014 at 9:50 AM, Peter Eisentraut <peter_e@gmx.net> wrote: >> >> If REINDEX cannot work without an exclusive lock, we should invent some >> >> other qualifier, like WITH FEWER LOCKS. >> > >> > What he said. >> >> But more to the point .... why, precisely, can't this work without an >> AccessExclusiveLock? And can't we fix that instead of setting for >> something clearly inferior? > > So, here's an alternative approach of how to get rid of the AEL > locks. They're required because we want to switch the relfilenodes > around. I've pretty much no confidence in any of the schemes anybody has > come up to avoid that. > > So, let's not switch relfilenodes around. > > I think if we should instead just use the new index, repoint the > dependencies onto the new oid, and then afterwards, when dropping, > rename the new index one onto the old one. That means the oid of the > index will change and some less than pretty grovelling around > dependencies, but it still seems preferrable to what we're discussing > here otherwise. > > Does anybody see a fundamental problem with that approach? I'm not sure whether that will work out, but it seems worth a try. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 02/02/2015 03:10 PM, Andres Freund wrote: > I think if we should instead just use the new index, repoint the > dependencies onto the new oid, and then afterwards, when dropping, > rename the new index one onto the old one. That means the oid of the > index will change and some less than pretty grovelling around > dependencies, but it still seems preferrable to what we're discussing > here otherwise.

I think that sounds like a good plan. The oid change does not seem like too big a deal to me, especially since that is what users get now too. Do you still think this is the right way to solve this?

I have attached my work in progress patch, which implements this and is very heavily based on Michael's previous work. There are some things left to do, but I think I should have a patch ready for the next commitfest if people still like this type of solution.

I also changed index_set_state_flags() to be transactional, since I wanted the old index to become invalid at exactly the same time as the new one becomes valid. From reviewing the code that seems like a safe change.

A couple of bike shedding questions:

- Is the syntax "REINDEX <type> CONCURRENTLY <object>" ok?
- What should we do with REINDEX DATABASE CONCURRENTLY and the system catalog? I do not think we can reindex the system catalog concurrently safely, so what should REINDEX DATABASE do with the catalog indexes? Skip them, reindex them while taking locks, or just error out?
- What level of information should be output in VERBOSE mode?

What remains to be implemented:

- Support for exclusion constraints
- Look more into how I handle constraints (currently the temporary index too seems to have the PRIMARY KEY flag)
- Support for the VERBOSE flag
- More testing to catch bugs

Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
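On the syntax question, the grammar being proposed would accept forms like the following (the table and index names here are hypothetical; the SCHEMA form with VERBOSE appears later in this thread):

```sql
REINDEX INDEX CONCURRENTLY my_index;
REINDEX TABLE CONCURRENTLY my_table;
REINDEX (VERBOSE) SCHEMA CONCURRENTLY public;
```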
On Sun, Feb 12, 2017 at 6:44 AM, Andreas Karlsson <andreas@proxel.se> wrote: > On 02/02/2015 03:10 PM, Andres Freund wrote: >> I think if we should instead just use the new index, repoint the >> dependencies onto the new oid, and then afterwards, when dropping, >> rename the new index one onto the old one. That means the oid of the >> index will change and some less than pretty grovelling around >> dependencies, but it still seems preferrable to what we're discussing >> here otherwise. > > I think that sounds like a good plan. The oid change does not seem like a > too big deal to me, especially since that is what users will get now too. Do > you still think this is the right way to solve this? That hurts mainly system indexes. Perhaps users with broken system indexes are not going to care about concurrency anyway. Thinking now about it I don't see how that would not work, but I did not think deeply about this problem lately. > I have attached my work in progress patch which implements and is very > heavily based on Michael's previous work. There are some things left to do > but I think I should have a patch ready for the next commitfest if people > still like this type of solution. Cool to see a rebase of this patch. It's been a long time... > I also changed index_set_state_flags() to be transactional since I wanted > the old index to become invalid at exactly the same time as the new becomes > valid. From reviewing the code that seems like a safe change. > > A couple of bike shedding questions: > > - Is the syntax "REINDEX <type> CONCUURENTLY <object>" ok? Yeah, that's fine. At least that's what has been concluded in previous threads. > - What should we do with REINDEX DATABASE CONCURRENTLY and the system > catalog? I so not think we can reindex the system catalog concurrently > safely, so what should REINDEX DATABASE do with the catalog indexes? Skip > them, reindex them while taking locks, or just error out? 
System indexes cannot have their OIDs changed as they are used in syscache lookups. So just logging a warning looks fine to me, and it is the price to pay to avoid taking an exclusive lock even for a short amount of time. > - What level of information should be output in VERBOSE mode? Er, something like that as well, no? DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. > What remains to be implemented: > - Support for exclusion constraints > - Look more into how I handle constraints (currently the temporary index too > seems to have the PRIMARY KEY flag) > - Support for the VERBOSE flag > - More testing to catch bugs This is a crasher: create table aa (a int primary key); reindex (verbose) schema concurrently public ; For invalid indexes sometimes snapshots are still active (after issuing the previous crash for example): =# reindex (verbose) table concurrently aa; WARNING: XX002: cannot reindex concurrently invalid index "public.aa_pkey_cct", skipping LOCATION: ReindexRelationConcurrently, indexcmds.c:2119 WARNING: 01000: snapshot 0x7fde12003038 still active -- Michael
On 02/13/2017 06:31 AM, Michael Paquier wrote: >> - What should we do with REINDEX DATABASE CONCURRENTLY and the system >> catalog? I so not think we can reindex the system catalog concurrently >> safely, so what should REINDEX DATABASE do with the catalog indexes? Skip >> them, reindex them while taking locks, or just error out? > > System indexes cannot have their OIDs changed as they are used in > syscache lookups. So just logging a warning looks fine to me, and the > price to pay to avoid taking an exclusive lock even for a short amount > of time. Good idea, I think I will add one line of warning if it finds any system index in the schema. >> - What level of information should be output in VERBOSE mode? > > Er, something like that as well, no? > DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. REINDEX (VERBOSE) currently prints one such line per index, which does not really work for REINDEX (VERBOSE) CONCURRENTLY since it handles all indexes on a relation at the same time. It is not immediately obvious how this should work. Maybe one such detail line per table? > This is a crasher: > create table aa (a int primary key); > reindex (verbose) schema concurrently public ; > > For invalid indexes sometimes snapshots are still active (after > issuing the previous crash for example): > =# reindex (verbose) table concurrently aa; > WARNING: XX002: cannot reindex concurrently invalid index > "public.aa_pkey_cct", skipping > LOCATION: ReindexRelationConcurrently, indexcmds.c:2119 > WARNING: 01000: snapshot 0x7fde12003038 still active Thanks for testing the patch! The crash was caused by things being allocated in the wrong memory context when reindexing multiple tables and therefore freed on the first intermediate commit. I have created a new memory context to handle this, in which I only allocate the lists which need to survive between transactions. Hm, when writing the above I just realized why ReindexTable/ReindexIndex did not suffer from the same bug. 
It is because the first transaction there allocates in the PortalHeapMemory context, which survives commit. I really need to look at whether there is a clean way to handle memory contexts in my patch. I also found the "snapshot still active" bug; it seems to have been caused by REINDEX TABLE CONCURRENTLY leaving an open snapshot which cannot be popped by PortalRunUtility(). Thanks again! Andreas
On Tue, Feb 14, 2017 at 11:32 AM, Andreas Karlsson <andreas@proxel.se> wrote: > On 02/13/2017 06:31 AM, Michael Paquier wrote: >> Er, something like that as well, no? >> DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. > > REINDEX (VERBOSE) currently prints one such line per index, which does not > really work for REINDEX (VERBOSE) CONCURRENTLY since it handles all indexes > on a relation at the same time. It is not immediately obvious how this > should work. Maybe one such detail line per table? Hard to recall the details after all this time, given that a relation is reindexed by processing all the indexes once at each step. Hm... What if ReindexRelationConcurrently() actually were refactored in such a way that it processes all the steps for each index individually? This way you can monitor the time it takes to completely build each index, including its . This operation would consume more transactions, but in the event of a failure the amount of things to clean up is greatly reduced, particularly for relations with many indexes. This would also let VERBOSE print one line per index rebuilt. -- Michael
On Tue, Feb 14, 2017 at 12:56 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > This way you can monitor the time it takes to build > completely each index, including its . You can ignore the terms "including its" here. -- Michael
On 02/14/2017 04:56 AM, Michael Paquier wrote: > On Tue, Feb 14, 2017 at 11:32 AM, Andreas Karlsson <andreas@proxel.se> wrote: >> On 02/13/2017 06:31 AM, Michael Paquier wrote: >>> Er, something like that as well, no? >>> DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. >> >> REINDEX (VERBOSE) currently prints one such line per index, which does not >> really work for REINDEX (VERBOSE) CONCURRENTLY since it handles all indexes >> on a relation at the same time. It is not immediately obvious how this >> should work. Maybe one such detail line per table? > > Hard to recall this thing in details with the time and the fact that a > relation is reindexed by processing all the indexes once at each step. > Hm... What if ReindexRelationConcurrently() actually is refactored in > such a way that it processes all the steps for each index > individually? This way you can monitor the time it takes to build > completely each index, including its . This operation would consume > more transactions but in the event of a failure the amount of things > to clean up is really reduced particularly for relations with many > indexes. This would as well reduce VERBOSE to print one line per index > rebuilt. I am actually thinking about going the opposite direction (by reducing the number of times we call WaitForLockers), because it is not just about consuming transaction IDs, we also do not want to wait too many times for transactions to commit. I am leaning towards only calling WaitForLockersMultiple three times per table. 1. Between building and validating the new indexes. 2. Between setting the old indexes to invalid and setting them to dead 3. Between setting the old indexes to dead and dropping them Right now my patch loops over the indexes in step 2 and 3 and waits for lockers once per index. This seems rather wasteful. 
I have thought about that the code might be cleaner if we just looped over all indexes (and as a bonus the VERBOSE output would be more obvious), but I do not think it is worth waiting for lockers all those extra times. Andreas
On 02/17/2017 01:53 PM, Andreas Karlsson wrote: > I am actually thinking about going the opposite direction (by reducing > the number of times we call WaitForLockers), because it is not just > about consuming transaction IDs, we also do not want to wait too many > times for transactions to commit. I am leaning towards only calling > WaitForLockersMultiple three times per table. > > 1. Between building and validating the new indexes. > 2. Between setting the old indexes to invalid and setting them to dead > 3. Between setting the old indexes to dead and dropping them > > Right now my patch loops over the indexes in step 2 and 3 and waits for > lockers once per index. This seems rather wasteful. > > I have thought about that the code might be cleaner if we just looped > over all indexes (and as a bonus the VERBOSE output would be more > obvious), but I do not think it is worth waiting for lockers all those > extra times. Thinking about this makes me wonder why you decided to use a transaction per index in many of the steps rather than a transaction per step. Most steps should be quick. The only steps where I think it makes sense to have a transaction per table are: 1) When building indexes, to avoid long running transactions. 2) When validating the new indexes, also to avoid long running transactions. But when swapping the indexes or when dropping the old indexes I do not see any reason not to just use one transaction per step, since we do not even have to wait for any locks (other than WaitForLockers, which we just want to call once anyway since all indexes relate to the same table). Andreas
On Fri, Feb 17, 2017 at 10:43 PM, Andreas Karlsson <andreas@proxel.se> wrote: > Thinking about this makes me wonder about why you decided to use a > transaction per index in many of the steps rather than a transaction per > step. Most steps should be quick. The only steps I think the makes sense to > have a transaction per table are. I don't recall all the details to be honest :) > 1) When building indexes to avoid long running transactions. > 2) When validating the new indexes, also to avoid long running transactions. > > But when swapping the indexes or when dropping the old indexes I do not see > any reason to not just use one transaction per step since we do not even > have to wait for any locks (other than WaitForLockers which we just want to > call once anyway since all indexes relate to the same table). Perhaps, this really needs a careful lookup. By the way, as this patch is showing up for the first time in this development cycle, would it be allowed in the last commit fest? That's not a patch in the easy category, far from that, but it does not present a new concept. -- Michael
On Fri, Feb 17, 2017 at 11:05:31PM +0900, Michael Paquier wrote: > On Fri, Feb 17, 2017 at 10:43 PM, Andreas Karlsson <andreas@proxel.se> wrote: > > Thinking about this makes me wonder about why you decided to use a > > transaction per index in many of the steps rather than a transaction per > > step. Most steps should be quick. The only steps I think the makes sense to > > have a transaction per table are. > > I don't recall all the details to be honest :) > > > 1) When building indexes to avoid long running transactions. > > 2) When validating the new indexes, also to avoid long running transactions. > > > > But when swapping the indexes or when dropping the old indexes I do not see > > any reason to not just use one transaction per step since we do not even > > have to wait for any locks (other than WaitForLockers which we just want to > > call once anyway since all indexes relate to the same table). > > Perhaps, this really needs a careful lookup. > > By the way, as this patch is showing up for the first time in this > development cycle, would it be allowed in the last commit fest? That's > not a patch in the easy category, far from that, but it does not > present a new concept. FYI, the thread started on 2013-11-15. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Tue, Feb 28, 2017 at 5:29 AM, Bruce Momjian <bruce@momjian.us> wrote: > On Fri, Feb 17, 2017 at 11:05:31PM +0900, Michael Paquier wrote: >> On Fri, Feb 17, 2017 at 10:43 PM, Andreas Karlsson <andreas@proxel.se> wrote: >> > Thinking about this makes me wonder about why you decided to use a >> > transaction per index in many of the steps rather than a transaction per >> > step. Most steps should be quick. The only steps I think the makes sense to >> > have a transaction per table are. >> >> I don't recall all the details to be honest :) >> >> > 1) When building indexes to avoid long running transactions. >> > 2) When validating the new indexes, also to avoid long running transactions. >> > >> > But when swapping the indexes or when dropping the old indexes I do not see >> > any reason to not just use one transaction per step since we do not even >> > have to wait for any locks (other than WaitForLockers which we just want to >> > call once anyway since all indexes relate to the same table). >> >> Perhaps, this really needs a careful lookup. >> >> By the way, as this patch is showing up for the first time in this >> development cycle, would it be allowed in the last commit fest? That's >> not a patch in the easy category, far from that, but it does not >> present a new concept. > > FYI, the thread started on 2013-11-15. I don't object to the addition of this patch to the next CF, as it presents no new concept. Still, because it is a complicated patch, I was wondering if people are fine with including it in this last CF. -- Michael
Michael Paquier <michael.paquier@gmail.com> writes: > I don't object to the addition of this patch in next CF as this > presents no new concept. Still per the complications this patch and > because it is a complicated patch I was wondering if people are fine > to include it in this last CF. The March CF is already looking pretty daunting. We can try to include this but I won't be too surprised if it gets punted to a future CF. regards, tom lane
On Mon, Feb 27, 2017 at 05:31:21PM -0500, Tom Lane wrote: > Michael Paquier <michael.paquier@gmail.com> writes: > > I don't object to the addition of this patch in next CF as this > > presents no new concept. Still per the complications this patch and > > because it is a complicated patch I was wondering if people are fine > > to include it in this last CF. > > The March CF is already looking pretty daunting. We can try to include > this but I won't be too surprised if it gets punted to a future CF. Yeah, that was my reaction too. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Hi, Here is a third take on this feature, heavily based on Michael Paquier's 2.0 patch. This time the patch does not attempt to preserve the index oids, but instead creates new indexes and moves all dependencies from the old indexes to the new ones before dropping the old ones. The only downside I can see to this approach is that we will no longer be able to reindex catalog tables concurrently, but in return it should be easier to confirm that this approach can be made to work. This patch relies on the fact that we can change the indisvalid flag of indexes transactionally, and as far as I can tell this is the case now that we have MVCC for the catalog updates. The code does some extra intermediate commits when building the indexes to avoid long running transactions.

How REINDEX CONCURRENTLY operates, for each table:

1. Create new indexes without populating them, and lock the tables and indexes for the session.
2. After waiting for all running transactions, populate each index in a separate transaction and set them to ready.
3. After waiting again for all running transactions, validate each index in a separate transaction (but without setting them to valid just yet).
4. Swap all dependencies over from each old index to the new index, rename the old and the new indexes (from <name> to <name>_ccold and from <name>_new to <name>), and set the isprimary and isexclusion flags. Here we also mark the new indexes as valid and the old indexes as invalid.
5. After waiting for all running transactions, change each old index from invalid to dead.
6. After waiting for all running transactions, drop each old index.
7. Drop all session locks.

Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
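For a plain, non-constraint index, the flow described above is roughly the concurrent swap one can already spell out by hand today; a sketch with hypothetical names (`ind` on table `t`, column `col`) — with the crucial difference that the patch swaps dependencies atomically and preserves constraints, which this manual version does not:

```sql
-- Steps 1-3: build and validate a replacement without blocking writes.
CREATE INDEX CONCURRENTLY ind_new ON t (col);

-- Steps 5-6: retire the old index without blocking writes.
DROP INDEX CONCURRENTLY ind;

-- Step 4 (name swap only): take over the old name.
ALTER INDEX ind_new RENAME TO ind;
```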
On 2/28/17 11:21 AM, Andreas Karlsson wrote: > The only downside I can see to this approach is that we no logner will > able to reindex catalog tables concurrently, but in return it should be > easier to confirm that this approach can be made work. Another downside is any stored regclass fields will become invalid. Admittedly that's a pretty unusual use case, but it'd be nice if there was at least a way to let users fix things during the rename phase (perhaps via an event trigger). -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
On Wed, Mar 1, 2017 at 2:21 AM, Andreas Karlsson <andreas@proxel.se> wrote: > For each table: > > 1. Create new indexes without populating them, and lock the tables and > indexes for the session. + /* + * Copy contraint flags for old index. This is safe because the old index + * guaranteed uniquness. + */ + newIndexForm->indisprimary = oldIndexForm->indisprimary; + oldIndexForm->indisprimary = false; + newIndexForm->indisexclusion = oldIndexForm->indisexclusion; + oldIndexForm->indisexclusion = false; [...] + deleteDependencyRecordsForClass(RelationRelationId, newIndexOid, + RelationRelationId, DEPENDENCY_AUTO); + deleteDependencyRecordsForClass(RelationRelationId, oldIndexOid, + ConstraintRelationId, DEPENDENCY_INTERNAL); + + // TODO: pg_depend for old index? There is a lot of mumbo-jumbo in the patch to create the exact same index definition as the original one being reindexed, and that's a huge maintenance burden for the future. You can blame me for that in the current patch. I am wondering if it would not just be better to generate a CREATE INDEX query string and then use the SPI to create the index, and also do the following extensions at SQL level: - Add a sort of WITH NO DATA clause where the index is created, so the index is created empty, and is marked invalid and not ready. - Extend pg_get_indexdef_string() with an optional parameter to enforce the index name to something else, most likely it should be extended with the WITH NO DATA/INVALID clause, which should just be a storage parameter by the way. By doing something like that what the REINDEX CONCURRENTLY code path should just be careful about is that it chooses an index name that avoids any conflicts. -- Michael
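For context on the SPI idea: pg_get_indexdef() already produces a complete CREATE INDEX statement for an existing index, so generating the query string mostly means extending that output. The name-override and WITH NO DATA/INVALID clauses proposed above do not exist yet; only the plain form does:

```sql
SELECT pg_get_indexdef('aa_pkey'::regclass);
-- returns something like:
--   CREATE UNIQUE INDEX aa_pkey ON public.aa USING btree (a)
```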
On 2017-03-01 19:25:23 -0600, Jim Nasby wrote: > On 2/28/17 11:21 AM, Andreas Karlsson wrote: > > The only downside I can see to this approach is that we no logner will > > able to reindex catalog tables concurrently, but in return it should be > > easier to confirm that this approach can be made work. > > Another downside is any stored regclass fields will become invalid. > Admittedly that's a pretty unusual use case, but it'd be nice if there was > at least a way to let users fix things during the rename phase (perhaps via > an event trigger). I'm fairly confident that we don't want to invoke event triggers inside the CIC code... I'm also fairly confident that between index oids stored somewhere being invalidated - what'd be a realistic use case of that - and not having reindex concurrently, just about everyone will choose the former. Regards, Andres
On 03/02/2017 02:25 AM, Jim Nasby wrote: > On 2/28/17 11:21 AM, Andreas Karlsson wrote: >> The only downside I can see to this approach is that we no logner will >> able to reindex catalog tables concurrently, but in return it should be >> easier to confirm that this approach can be made work. > > Another downside is any stored regclass fields will become invalid. > Admittedly that's a pretty unusual use case, but it'd be nice if there > was at least a way to let users fix things during the rename phase > (perhaps via an event trigger). Good point, but I agree with Andres here. Having REINDEX CONCURRENTLY issue event triggers seems strange to me. While it does create and drop indexes as part of its implementation, it is actually just an index maintenance job. Andreas
On Thu, Mar 2, 2017 at 11:48 AM, Andres Freund <andres@anarazel.de> wrote: > On 2017-03-01 19:25:23 -0600, Jim Nasby wrote: >> On 2/28/17 11:21 AM, Andreas Karlsson wrote: >> > The only downside I can see to this approach is that we no logner will >> > able to reindex catalog tables concurrently, but in return it should be >> > easier to confirm that this approach can be made work. >> >> Another downside is any stored regclass fields will become invalid. >> Admittedly that's a pretty unusual use case, but it'd be nice if there was >> at least a way to let users fix things during the rename phase (perhaps via >> an event trigger). > > I'm fairly confident that we don't want to invoke event triggers inside > the CIC code... I'm also fairly confident that between index oids > stored somewhere being invalidated - what'd be a realistic use case of > that - and not having reindex concurrently, just about everyone will > choose the former. Maybe. But it looks to me like this patch is going to have considerably more than its share of user-visible warts, and I'm not very excited about that. I feel like what we ought to be doing is keeping the index OID the same and changing the relfilenode to point to a newly-created one, and I attribute our failure to make that design work thus far to insufficiently aggressive hacking. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On March 4, 2017 1:16:56 AM PST, Robert Haas <robertmhaas@gmail.com> wrote: >Maybe. But it looks to me like this patch is going to have >considerably more than its share of user-visible warts, and I'm not >very excited about that. I feel like what we ought to be doing is >keeping the index OID the same and changing the relfilenode to point >to a newly-created one, and I attribute our failure to make that >design work thus far to insufficiently aggressive hacking. We literally spent years and a lot of emails waiting for that to happen. Users now hack up solutions like this in userspace, which is obviously a bad solution. I agree that it'd be nicer not to have this, but not having the feature at all is a lot worse than this wart. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Sat, Mar 4, 2017 at 12:34 PM, Andres Freund <andres@anarazel.de> wrote: > I agree that'd it be nicer not to have this, but not having the feature at all is a lot worse than this wart. I, again, give that a firm "maybe". If the warts end up annoying 1% of the users who try to use this feature, then you're right. If they end up making a substantial percentage of people who try to use this feature give up on it, then we've added a bunch of complexity and future code maintenance for little real gain. I'm not ruling out the possibility that you're 100% correct, but I'm not nearly as convinced of that as you are. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Mar 4, 2017 at 9:34 AM, Andres Freund <andres@anarazel.de> wrote: > On March 4, 2017 1:16:56 AM PST, Robert Haas <robertmhaas@gmail.com> wrote: >>Maybe. But it looks to me like this patch is going to have >>considerably more than its share of user-visible warts, and I'm not >>very excited about that. I feel like what we ought to be doing is >>keeping the index OID the same and changing the relfilenode to point >>to a newly-created one, and I attribute our failure to make that >>design work thus far to insufficiently aggressive hacking. > > We literally spent years and a lot of emails waiting for that to happen. Users now hack up solutions like this in userspace,obviously a bad solution. > > I agree that'd it be nicer not to have this, but not having the feature at all is a lot worse than this wart. IMHO, REINDEX itself is implemented in a way that is conceptually pure, and yet quite user hostile. I tend to tell colleagues that ask about REINDEX something along the lines of: Just assume that REINDEX is going to block out even SELECT statements referencing the underlying table. It might not be that bad for you in practice, but the details are arcane such that it might as well be that simple most of the time. Even if you have time to listen to me explain it all, which you clearly don't, you're still probably not going to be able to apply what you've learned in a way that helps you. -- Peter Geoghegan
On 03/05/2017 07:56 PM, Robert Haas wrote: > On Sat, Mar 4, 2017 at 12:34 PM, Andres Freund <andres@anarazel.de> wrote: >> I agree that'd it be nicer not to have this, but not having the feature at all is a lot worse than this wart. > > I, again, give that a firm "maybe". If the warts end up annoying 1% > of the users who try to use this feature, then you're right. If they > end up making a substantial percentage of people who try to use this > feature give up on it, then we've added a bunch of complexity and > future code maintenance for little real gain. I'm not ruling out the > possibility that you're 100% correct, but I'm not nearly as convinced > of that as you are. I agree that these warts are annoying, but I also think the limitations can be explained pretty easily to users (e.g. by explaining in the manual how REINDEX CONCURRENTLY creates a new index to replace the old one), and I think that is the important question when deciding whether a useful feature with warts should be merged or not. Does it make things substantially harder for the average user to grok? And I would argue that this feature is useful for quite many, based on my experience running a semi-large database. Index bloat happens, and without REINDEX CONCURRENTLY it can be really annoying to solve, especially for primary keys. Certainly more people have problems with index bloat than the number of people who store index oids in their database. Andreas
On Sun, Mar 5, 2017 at 7:13 PM, Andreas Karlsson <andreas@proxel.se> wrote: > And I would argue that his feature is useful for quite many, based on my > experience running a semi-large database. Index bloat happens and without > REINDEX CONCURRENTLY it can be really annoying to solve, especially for > primary keys. Certainly more people have problems with index bloat than the > number of people who store index oids in their database. Yeah, but that's not the only wart, I think. For example, I believe (haven't looked at this patch series in a while) that the patch takes a lock and later escalates the lock level. If so, that could lead to doing a lot of work to build the index and then getting killed by the deadlock detector. Also, if by any chance you think (or use any software that thinks) that OIDs for system objects are a stable identifier, this will be the first case where that ceases to be true. If the system is shut down or crashes or the session is killed, you'll be left with stray objects with names that you've never typed into the system. I'm sure you're going to say "don't worry, none of that is any big deal" and maybe you're right. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2017-03-07 21:48:23 -0500, Robert Haas wrote: > On Sun, Mar 5, 2017 at 7:13 PM, Andreas Karlsson <andreas@proxel.se> wrote: > > And I would argue that his feature is useful for quite many, based on my > > experience running a semi-large database. Index bloat happens and without > > REINDEX CONCURRENTLY it can be really annoying to solve, especially for > > primary keys. Certainly more people have problems with index bloat than the > > number of people who store index oids in their database. > > Yeah, but that's not the only wart, I think. I don't really see any other warts that don't correspond to CREATE/DROP INDEX CONCURRENTLY. > For example, I believe (haven't looked at this patch series in a > while) that the patch takes a lock and later escalates the lock level. It shouldn't* - that was required precisely because we had to switch the relfilenodes when the oid stayed the same. Otherwise in-progress index lookups could end up using the wrong relfilenodes and/or switch in the middle of a lookup. * excepting the exclusive lock DROP INDEX CONCURRENTLY style dropping uses after marking the index as dead - but that shouldn't be much of a concern? > Also, if by any chance you think (or use any software that thinks) > that OIDs for system objects are a stable identifier, this will be the > first case where that ceases to be true. Can you come up with a halfway realistic scenario why an index oid, not a table, constraint, sequence oid, would be relied upon? > If the system is shut down or crashes or the session is killed, you'll > be left with stray objects with names that you've never typed into the > system. Given how relatively few complaints we have about CIC's possibility of ending up with invalid indexes - not that there are none - and its widespread usage, I'm not too concerned about this. Greetings, Andres Freund
On 03/08/2017 03:48 AM, Robert Haas wrote: > On Sun, Mar 5, 2017 at 7:13 PM, Andreas Karlsson <andreas@proxel.se> wrote: >> And I would argue that his feature is useful for quite many, based on my >> experience running a semi-large database. Index bloat happens and without >> REINDEX CONCURRENTLY it can be really annoying to solve, especially for >> primary keys. Certainly more people have problems with index bloat than the >> number of people who store index oids in their database. > > Yeah, but that's not the only wart, I think. The only two potential issues I see with the patch are: 1) That the index oid changes visibly to external users. 2) That the code for moving the dependencies will need to be updated when adding new things which refer to an index oid. Given how useful I find REINDEX CONCURRENTLY I think these warts are worth it, since the impact is quite limited. I am of course biased, since if I did not believe this I would not pursue this solution in the first place. > For example, I believe > (haven't looked at this patch series in a while) that the patch takes > a lock and later escalates the lock level. If so, that could lead to > doing a lot of work to build the index and then getting killed by the > deadlock detector. This version of the patch no longer does that. For my use case escalating the lock would make this patch much less interesting. The highest lock level taken is the same one as the initial one (SHARE UPDATE EXCLUSIVE). The current patch does, on a high level (very simplified), this: 1. CREATE INDEX CONCURRENTLY ind_new; 2. Atomically move all dependencies from ind to ind_new, rename ind to ind_old, and rename ind_new to ind; 3. DROP INDEX CONCURRENTLY ind_old. The actual implementation is a bit more complicated in reality, but no part escalates the lock level over what would be required by the steps for creating and dropping indexes concurrently. > Also, if by any chance you think (or use any > software that thinks) that OIDs for system objects are a stable > identifier, this will be the first case where that ceases to be true. > If the system is shut down or crashes or the session is killed, you'll > be left with stray objects with names that you've never typed into the > system. I'm sure you're going to say "don't worry, none of that is > any big deal" and maybe you're right. Hm, I cannot think of any real life scenario where this will be an issue based on my personal experience with PostgreSQL, but if you can think of one please provide it. I will try to ponder some more on this myself. Andreas
On 3/8/17 9:34 AM, Andreas Karlsson wrote: >> Also, if by any chance you think (or use any >> software that thinks) that OIDs for system objects are a stable >> identifier, this will be the first case where that ceases to be true. >> If the system is shut down or crashes or the session is killed, you'll >> be left with stray objects with names that you've never typed into the >> system. I'm sure you're going to say "don't worry, none of that is >> any big deal" and maybe you're right. > > Hm, I cannot think of any real life scenario where this will be an issue > based on my personal experience with PostgreSQL, but if you can think of > one please provide it. I will try to ponder some more on this myself. The case I currently have is to allow tracking database objects similar to (but not the same as) how we track the objects that belong to an extension[1]. That currently depends on event triggers to keep names updated if they're changed, as well as making use of the reg* types. If an event trigger fired as part of the index rename (essentially treating it like an ALTER INDEX) then I should be able to work around that. The ultimate reason for doing this is to provide something similar to extensions (create a bunch of database objects that are all bound together), but also similar to classes in OO languages (so you can have multiple instances).[2] Admittedly, this is pretty off the beaten path and I certainly wouldn't hold up the patch because of it. I am hoping that it'd be fairly easy to fire an event trigger as if someone had just renamed the index. 1: https://github.com/decibel/object_reference 2: https://github.com/decibel/pg_classy -- Jim Nasby, Chief Data Architect, OpenSCG http://OpenSCG.com
On Wed, Mar 8, 2017 at 4:12 PM, Andres Freund <andres@anarazel.de> wrote: > Can you come up with an halfway realistic scenario why an index oid, not > a table, constraint, sequence oid, would be relied upon? Is there an implication for SIREAD locks? Predicate locks on index pages include the index OID in the tag. -- Thomas Munro http://www.enterprisedb.com
On Fri, Mar 10, 2017 at 9:36 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Wed, Mar 8, 2017 at 4:12 PM, Andres Freund <andres@anarazel.de> wrote: >> Can you come up with an halfway realistic scenario why an index oid, not >> a table, constraint, sequence oid, would be relied upon? > > Is there an implication for SIREAD locks? Predicate locks on index > pages include the index OID in the tag. Ah, yes, but that is covered by a call to TransferPredicateLocksToHeapRelation() in index_concurrent_set_dead(). -- Thomas Munro http://www.enterprisedb.com
On 03/02/2017 03:10 AM, Michael Paquier wrote: > On Wed, Mar 1, 2017 at 2:21 AM, Andreas Karlsson <andreas@proxel.se> wrote: > + /* > + * Copy contraint flags for old index. This is safe because the old index > + * guaranteed uniquness. > + */ > + newIndexForm->indisprimary = oldIndexForm->indisprimary; > + oldIndexForm->indisprimary = false; > + newIndexForm->indisexclusion = oldIndexForm->indisexclusion; > + oldIndexForm->indisexclusion = false; > [...] > + deleteDependencyRecordsForClass(RelationRelationId, newIndexOid, > + RelationRelationId, DEPENDENCY_AUTO); > + deleteDependencyRecordsForClass(RelationRelationId, oldIndexOid, > + ConstraintRelationId, > DEPENDENCY_INTERNAL); > + > + // TODO: pg_depend for old index? Spotted one of my TODO comments there so I have attached a patch where I have cleaned up that function. I also fixed the code to properly support triggers. > There is a lot of mumbo-jumbo in the patch to create the exact same > index definition as the original one being reindexed, and that's a > huge maintenance burden for the future. You can blame me for that in > the current patch. I am wondering if it would not just be better to > generate a CREATE INDEX query string and then use the SPI to create > the index, and also do the following extensions at SQL level: > - Add a sort of WITH NO DATA clause where the index is created, so the > index is created empty, and is marked invalid and not ready. > - Extend pg_get_indexdef_string() with an optional parameter to > enforce the index name to something else, most likely it should be > extended with the WITH NO DATA/INVALID clause, which should just be a > storage parameter by the way. > By doing something like that what the REINDEX CONCURRENTLY code path > should just be careful about is that it chooses an index name that > avoids any conflicts. 
Hm, I am not sure how much that would help since a lot of the mumbo-jumbo is by necessity in the step where we move the constraints over from the old index to the new. Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 03/13/2017 03:11 AM, Andreas Karlsson wrote: > I also fixed the the code to properly support triggers. And by "support triggers" I actually meant fixing the support for moving the foreign keys to the new index. Andreas
Hi, I had a look at this. On Mon, Mar 13, 2017 at 03:11:50AM +0100, Andreas Karlsson wrote: > Spotted one of my TODO comments there so I have attached a patch where I > have cleaned up that function. I also fixed the the code to properly support > triggers. The patch applies with quite a few offsets on top of current (2fd8685) master; I have not verified that those are all ok. Regression tests pass, as do the included isolation tests. I hope that Michael will post a full review as he worked on the code extensively, but here are some code comments, mostly on the comments (note that I'm not a native speaker, so I might be wrong on some of them as well): > diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml > index 3908ade37b..3449c0af73 100644 > --- a/doc/src/sgml/ref/reindex.sgml > +++ b/doc/src/sgml/ref/reindex.sgml > @@ -68,9 +68,12 @@ REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } <replacea > An index build with the <literal>CONCURRENTLY</> option failed, leaving > an <quote>invalid</> index. Such indexes are useless but it can be > convenient to use <command>REINDEX</> to rebuild them. Note that > - <command>REINDEX</> will not perform a concurrent build. To build the > - index without interfering with production you should drop the index and > - reissue the <command>CREATE INDEX CONCURRENTLY</> command. > + <command>REINDEX</> will perform a concurrent build if <literal> > + CONCURRENTLY</> is specified. To build the index without interfering > + with production you should drop the index and reissue either the > + <command>CREATE INDEX CONCURRENTLY</> or <command>REINDEX CONCURRENTLY</> > + command. Indexes of toast relations can be rebuilt with <command>REINDEX > + CONCURRENTLY</>. I think the "To build the index[...]" part should be rephrased, the current diff makes it sound like you should drop the index first even if you reindex concurrently. 
What about "Note that <command>REINDEX</> will only perform a concurrent build if <literal> CONCURRENTLY</> is specified"? Anyway, this part is only about reindexing invalid indexes, so mentioning that REINDEX does not build concurrently, or the create-index-concurrently-then-rename workaround, only for this case is a bit weird; but this is a pre-existing condition. > diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c > index 8d42a347ea..c40ac0b154 100644 > --- a/src/backend/catalog/index.c > +++ b/src/backend/catalog/index.c > /* > + * index_concurrent_create_copy > + * > + * Create a concurrent index based on the definition of the one provided by > + * caller that will be used for concurrent operations. The index is inserted > + * into catalogs and needs to be built later on. This is called during > + * concurrent reindex processing. The heap relation on which is based the index > + * needs to be closed by the caller. > + */ That should be "The heap relation on which the index is based ..." I think. > + /* > + * We have to re-build the IndexInfo struct, since it was lost in > + * commit of transaction where this concurrent index was created > + * at the catalog level. > + */ > + indexInfo = BuildIndexInfo(indexRelation); Looks like indexInfo starts with lowercase, but the comment above has upper case `IndexInfo'. > +/* > + * index_concurrent_swap > + * > + * Swap name, dependencies and constraints of the old index over to the new > + * index, while marking the old index as invalid and the new as valid. > + */ The `while' looks slightly odd to me, ISTM this is just another operation this function performs, whereas "while" makes it sound like the marking happens concurrently; so maybe ". Also, mark the old index as invalid[...]"? > +void > +index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid, const char *oldName) > +{ [...] > + /* > + * Copy contraint flags for old index. This is safe because the old index > + * guaranteed uniquness. 
> + */ "uniqueness". > + /* Mark old index as valid and new is invalid as index_set_state_flags */ "new as invalid". Also, this comment style is different to this one: > + /* > + * Move contstraints and triggers over to the new index > + */ I guess the latter could be changed to a one-line comment as the former, but maybe there is a deeper sense (locality of comment?) in this. > + /* make a modifiable copy */ I think comments should start capitalized? > +/* > + * index_concurrent_drop > + * > + * Drop a single index concurrently as the last step of an index concurrent > + * process. Deletion is done through performDeletion or dependencies of the > + * index would not get dropped. At this point all the indexes are already > + * considered as invalid and dead so they can be dropped without using any > + * concurrent options as it is sure that they will not interact with other > + * server sessions. > + */ I'd write "as it is certain" instead of "as it is sure", but I can't explain why. Maybe persons are sure, but situations are certain? > @@ -1483,41 +1939,13 @@ index_drop(Oid indexId, bool concurrent) > * Note: the reason we use actual lock acquisition here, rather than > * just checking the ProcArray and sleeping, is that deadlock is > * possible if one of the transactions in question is blocked trying > - * to acquire an exclusive lock on our table. The lock code will > + * to acquire an exclusive lock on our table. The lock code will Gratuitous whitespace change, seeing that other comments added in this patch have the extra whitespace after full stops as well. 
> diff --git a/src/backend/catalog/pg_depend.c b/src/backend/catalog/pg_depend.c > index d0ee851215..9dce6420df 100644 > --- a/src/backend/catalog/pg_depend.c > +++ b/src/backend/catalog/pg_depend.c > @@ -377,6 +377,94 @@ changeDependencyFor(Oid classId, Oid objectId, > } > > /* > + * Adjust all dependency records to point to a different object of the same type > + * > + * refClassId/oldRefObjectId specify the old referenced object. > + * newRefObjectId is the new referenced object (must be of class refClassId). > + * > + * Returns the number of records updated. > + */ > +long > +changeDependencyForAll(Oid refClassId, Oid oldRefObjectId, > + Oid newRefObjectId) This one is mostly a copy-paste of changeDependencyFor(), did you consider refactoring that into handling the All case as well? > @@ -722,3 +810,58 @@ get_index_constraint(Oid indexId) > > return constraintId; > } > + > +/* > + * get_index_ref_constraints > + * Given the OID of an index, return the OID of all foreign key > + * constraints which reference the index. > + */ > +List * > +get_index_ref_constraints(Oid indexId) Same for this one, but there's two similar functions (get_constraint_index() and get_index_constraint()) already so I guess it's fine? > diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c > index 72bb06c760..7a51c25d98 100644 > --- a/src/backend/commands/indexcmds.c > +++ b/src/backend/commands/indexcmds.c > @@ -283,6 +285,87 @@ CheckIndexCompatible(Oid oldId, > return ret; > } > > + > +/* > + * WaitForOlderSnapshots > + * > + * Wait for transactions that might have older snapshot than the given xmin "an older snapshot" maybe? > + * limit, because it might not contain tuples deleted just before it has > + * been taken. Obtain a list of VXIDs of such transactions, and wait for them > + * individually. > + * > + * We can exclude any running transactions that have xmin > the xmin given; > + * their oldest snapshot must be newer than our xmin limit. 
> + * We can also exclude any transactions that have xmin = zero, since they Probably an empty line between the two paragraphs is in order, or just keep it one paragraph. > + * evidently have no live snapshot at all (and any one they might be in > + * process of taking is certainly newer than ours). Please rephrase what's in the paranthesis, I am not quite sure what it means, maybe "(and any they are currently taking is..."? > + * We can also exclude autovacuum processes and processes running manual > + * lazy VACUUMs, because they won't be fazed by missing index entries > + * either. (Manual ANALYZEs, however, can't be excluded because they > + * might be within transactions that are going to do arbitrary operations > + * later.) Weird punctuation, I think the full stop before the paranthesis should be put after it, the full stop at the end of the paranthesis be dropped and the beginning of the parenthesis should not be capitalized. Oh hrm, I now see that the patch just moves the comments, so maybe don't bother. > @@ -1739,7 +1735,7 @@ ChooseIndexColumnNames(List *indexElems) > * Recreate a specific index. > */ > Oid > -ReindexIndex(RangeVar *indexRelation, int options) > +ReindexIndex(RangeVar *indexRelation, int options, bool concurrent) > { > Oid indOid; > Oid heapOid = InvalidOid; > @@ -1751,8 +1747,9 @@ ReindexIndex(RangeVar *indexRelation, int options) > * obtain lock on table first, to avoid deadlock hazard. The lock level > * used here must match the index lock obtained in reindex_index(). > */ > - indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock, > - false, false, > + indOid = RangeVarGetRelidExtended(indexRelation, > + concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock, > + concurrent, concurrent, I find the way the bool is passed for the third and fourth argument a bit weird, but ok. 
I would still suggest to explain in the comment above why the two other arguments to RangeVarGetRelidExtended() (`missing_ok' and `nowait') are dependent on concurrent reindexing; it's not super-obvious to me. > > /* The lock level used here should match reindex_relation(). */ > - heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false, > + heapOid = RangeVarGetRelidExtended(relation, > + concurrent ? ShareUpdateExclusiveLock : ShareLock, > + concurrent, concurrent, > RangeVarCallbackOwnsTable, NULL); Same here. > + if (concurrent) > + result = ReindexRelationConcurrently(heapOid, options); > + else > + result = reindex_relation(heapOid, > + REINDEX_REL_PROCESS_TOAST | > + REINDEX_REL_CHECK_CONSTRAINTS, > + options); This somewhat begs the question why those two cases are so different (one is implemented in src/backend/catalog/index.c, the other in src/backend/commands/indexcmds.c, and their naming scheme is different). I guess that's ok, but it might also be a hint that ReindexRelationConcurrently() is implemented at the wrong level. > @@ -2011,3 +2040,597 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind, > > MemoryContextDelete(private_context); > } > + > + > +/* > + * ReindexRelationConcurrently > + * > + * Process REINDEX CONCURRENTLY for given relation Oid. The relation can be > + * either an index or a table. If a table is specified, each phase is processed > + * one by done for each table's indexes as well as its dependent toast indexes "one by one". > +static bool > +ReindexRelationConcurrently(Oid relationOid, int options) > +{ [...] > + /* > + * Extract the list of indexes that are going to be rebuilt based on the > + * list of relation Oids given by caller. For each element in given list, > + * If the relkind of given relation Oid is a table, all its valid indexes capitalization, "If" is in the middle of a sentence. > + * will be rebuilt, including its associated toast table indexes. 
If > + * relkind is an index, this index itself will be rebuilt. Maybe mention here that system catalogs and shared relations cannot be reindexed concurrently? > + /* Create concurrent index based on given index */ > + concurrentOid = index_concurrent_create_copy(indexParentRel, > + indOid, > + concurrentName); AIUI, this creates/copies some meta-data for the concurrent index, but does not yet create the index itself, right? If so, the comment is somewhat misleading. > + /* > + * Now open the relation of concurrent index, a lock is also needed on > + * it > + */ Multi-line comments should end with a full-stop I think? > + /* > + * Phase 3 of REINDEX CONCURRENTLY [...] > + /* > + * This concurrent index is now valid as they contain all the tuples > + * necessary. However, it might not have taken into account deleted tuples "as they contain" should be "as it contains" I guess, since the rest of the comment is talking about a singular index. > + /* > + * Phase 4 of REINDEX CONCURRENTLY > + * > + * Now that the concurrent indexes have been validated, it is necessary > + * to swap each concurrent index with its corresponding old index. > + * > + * We mark the new indexes as valid and the old indexes dead at the same > + * time to make sure we get only get constraint violations from the "we only get" > + /* > + * Phase 5 of REINDEX CONCURRENTLY > + * > + * The indexes hold now a fresh relfilenode of their respective concurrent I'd write "now hold" instead of "hold now". > + * entries indexes. It is time to mark the now-useless concurrent entries > + * as not ready so as they can be safely discarded from write operations > + * that may occur on them. So the "concurrent entries" is the original index, as that one should be now-useless? If so, that's a bit confusing terminology to me and it was called "old index" in the previous phases. 
> + /* > + * Phase 6 of REINDEX CONCURRENTLY > + * > + * Drop the concurrent indexes, with actually the same code path as Again, I'd have written "Drop the old indexes". Also, "with actually the same" sounds a bit awkward, maybe "actually using the same" would be better. > + /* > + * Last thing to do is to release the session-level lock on the parent table > + * and the indexes of table. "and on the indexes of the table"? Or what exactly is meant with the last bit? > diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c > index 20b5273405..c76dacc44a 100644 > --- a/src/backend/tcop/utility.c > +++ b/src/backend/tcop/utility.c > @@ -773,16 +773,20 @@ standard_ProcessUtility(PlannedStmt *pstmt, > - ReindexIndex(stmt->relation, stmt->options); > + ReindexIndex(stmt->relation, stmt->options, stmt->concurrent); > - ReindexTable(stmt->relation, stmt->options); > + ReindexTable(stmt->relation, stmt->options, stmt->concurrent); > - ReindexMultipleTables(stmt->name, stmt->kind, stmt->options); > + ReindexMultipleTables(stmt->name, stmt->kind, stmt->options, stmt->concurrent); Those lines are now in excess of 80 chars. Cheers, Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
On Mon, Mar 13, 2017 at 11:11 AM, Andreas Karlsson <andreas@proxel.se> wrote: > On 03/02/2017 03:10 AM, Michael Paquier wrote: >> There is a lot of mumbo-jumbo in the patch to create the exact same >> index definition as the original one being reindexed, and that's a >> huge maintenance burden for the future. You can blame me for that in >> the current patch. I am wondering if it would not just be better to >> generate a CREATE INDEX query string and then use the SPI to create >> the index, and also do the following extensions at SQL level: >> - Add a sort of WITH NO DATA clause where the index is created, so the >> index is created empty, and is marked invalid and not ready. >> - Extend pg_get_indexdef_string() with an optional parameter to >> enforce the index name to something else, most likely it should be >> extended with the WITH NO DATA/INVALID clause, which should just be a >> storage parameter by the way. >> By doing something like that what the REINDEX CONCURRENTLY code path >> should just be careful about is that it chooses an index name that >> avoids any conflicts. > > > Hm, I am not sure how much that would help since a lot of the mumb-jumbo is > by necessity in the step where we move the constraints over from the old > index to the new. Well, the idea is really to get rid of that as there are already facilities of this kind for CREATE TABLE LIKE in the parser and ALTER TABLE when rewriting a relation. It is not really attractive to have a 3rd method in the backend code to do the same kind of things, for a method that is even harder to maintain than the other two. -- Michael
On Thu, Mar 30, 2017 at 5:13 AM, Michael Banck <michael.banck@credativ.de> wrote: > On Mon, Mar 13, 2017 at 03:11:50AM +0100, Andreas Karlsson wrote: >> Spotted one of my TODO comments there so I have attached a patch where I >> have cleaned up that function. I also fixed the the code to properly support >> triggers. > > I hope that Michael will post a full review as he worked on the code > extensively, but here are some some code comments, mostly on the > comments (note that I'm not a native speaker, so I might be wrong on > some of them as well): Thanks, Michael. I have done a pass on it. > [review comments] Here are more comments: + <para> + <command>REINDEX SYSTEM</command> does not support + <command>CONCURRENTLY</command>. + </para> It would be nice to mention that REINDEX SCHEMA pg_catalog is not supported, or just state that concurrent reindexing of any system catalog index is not supported. When running REINDEX SCHEMA CONCURRENTLY public on the regression database I am bumping into a bunch of these warnings: WARNING: 01000: snapshot 0x7fa5e6000040 still active LOCATION: AtEOXact_Snapshot, snapmgr.c:1123 WARNING: 01000: snapshot 0x7fa5e6000040 still active LOCATION: AtEOXact_Snapshot, snapmgr.c:1123 + * Reset attcacheoff for a TupleDesc + */ +void +ResetTupleDescCache(TupleDesc tupdesc) +{ + int i; + + for (i = 0; i < tupdesc->natts; i++) + tupdesc->attrs[i]->attcacheoff = -1; +} I think that it would be better to merge that with TupleDescInitEntry to be sure that the initialization of a TupleDesc's attribute goes through only one code path. + /* + * Copy contraint flags for old index. This is safe because the old index + * guaranteed uniquness. + */ s/uniquness/uniqueness/ and s/contraint/constraint/. + /* + * Move contstraints and triggers over to the new index + */ s/contstraints/constraints/. 
-REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> +REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> I am taking the war path with such a sentence... But what about adding CONCURRENTLY to the list of options in parentheses instead? With this patch, we are reaching the 9th boolean argument for create_index(). Any opinions about refactoring that into a set of bitwise flags? Fairly unrelated to this patch. + /* + * Move all dependencies on the old index to the new + */ Sentence unfinished. It has been years since I looked at this code (I wrote the majority of it), but here is what I would explore if I were to work on that for the next release cycle: - Explore the use of SQL-level interfaces to mark an index as inactive at creation. - Remove the work done in changeDependencyForAll, and replace it by something similar to what tablecmds.c does. There is, I think, some room for refactoring here if that's not done with CREATE TABLE LIKE. This requires the same work of creating, renaming and dropping the old triggers and constraints. - Do a per-index rebuild and not a per-relation rebuild for concurrent indexing. Doing a per-relation reindex has the disadvantage that many objects need to be created at the same time, and in the case of REINDEX CONCURRENTLY the time of the operation is not what matters, it is how intrusive the operation is. Relations with many indexes would also result in many object locks taken at each step. The first and second points require a bit of thought for sure, but in the long term that would pay in maintenance if we don't reinvent the wheel, or at least try not to. -- Michael
Thanks for the feedback. I will look at it when I get the time. On 03/31/2017 08:27 AM, Michael Paquier wrote: > - Do a per-index rebuild and not a per-relation rebuild for concurrent > indexing. Doing a per-relation reindex has the disadvantage that many > objects need to be created at the same time, and in the case of > REINDEX CONCURRENTLY time of the operation is not what matters, it is > how intrusive the operation is. Relations with many indexes would also > result in much object locks taken at each step. I am personally worried about the amount of time spent waiting for long running transactions if you reindex per index rather than per relation. Because while you wait on long running transactions for one index, nothing prevents new long transactions from starting, which we will have to wait for while reindexing the next index. If your database has many long running transactions more time will be spent waiting than working. Is the number of locks really a big deal compared to other costs involved here? REINDEX does a lot of expensive things like starting transactions, taking snapshots, scanning large tables, building a new index, etc. The trade-off I see is between temporary disk usage and time spent waiting for transactions, and doing the REINDEX per relation allows for flexibility since people can still explicitly reindex per index if they want to. Andreas
On Fri, Mar 31, 2017 at 5:12 PM, Andreas Karlsson <andreas@proxel.se> wrote: > Thanks for the feedback. I will look at it when I get the time. > > On 03/31/2017 08:27 AM, Michael Paquier wrote: >> >> - Do a per-index rebuild and not a per-relation rebuild for concurrent >> indexing. Doing a per-relation reindex has the disadvantage that many >> objects need to be created at the same time, and in the case of >> REINDEX CONCURRENTLY time of the operation is not what matters, it is >> how intrusive the operation is. Relations with many indexes would also >> result in much object locks taken at each step. > > > I am personally worried about the amount time spent waiting for long running > transactions if you reindex per index rather than per relation. Because when > you for one index wait on long running transactions nothing prevents new > long transaction from starting, which we will have to wait for while > reindexing the next index. If your database has many long running > transactions more time will be spent waiting than the time spent working. Yup, I am not saying that one approach or the other is bad; both are worth considering. It's a trade-off between waiting and potential manual cleanup in the event of a failure. > and doing the REINDEX per relation allows for flexibility > since people can still explicitly reindex per index of they want to. You have a point here. I am marking this patch as returned with feedback; this won't get in PG10. If I am freed from the SCRAM-related open items I'll try to give another shot at implementing this feature before the first CF of PG11. -- Michael
On 04/03/2017 07:57 AM, Michael Paquier wrote: > On Fri, Mar 31, 2017 at 5:12 PM, Andreas Karlsson <andreas@proxel.se> wrote: >> On 03/31/2017 08:27 AM, Michael Paquier wrote: >>> - Do a per-index rebuild and not a per-relation rebuild for concurrent >>> indexing. Doing a per-relation reindex has the disadvantage that many >>> objects need to be created at the same time, and in the case of >>> REINDEX CONCURRENTLY time of the operation is not what matters, it is >>> how intrusive the operation is. Relations with many indexes would also >>> result in much object locks taken at each step. >> >> I am personally worried about the amount time spent waiting for long running >> transactions if you reindex per index rather than per relation. Because when >> you for one index wait on long running transactions nothing prevents new >> long transaction from starting, which we will have to wait for while >> reindexing the next index. If your database has many long running >> transactions more time will be spent waiting than the time spent working. > > Yup, I am not saying that one approach or the other are bad, both are > worth considering. That's a deal between waiting and manual potential > cleanup in the event of a failure. Agreed, and which is worse probably depends heavily on your schema and workload. > I am marking this patch as returned with feedback, this won't get in > PG10. If I am freed from the SCRAM-related open items I'll try to give > another shot at implementing this feature before the first CF of PG11. Thanks! I also think I will have time to work on this before the first CF. Andreas
I have attached a new, rebased version of the patch with most of Banck's and some of your feedback incorporated. Thanks for the good feedback! On 03/31/2017 08:27 AM, Michael Paquier wrote: > When running REINDEX SCHEMA CONCURRENTLY public on the regression > database I am bumping into a bunch of these warnings: > WARNING: 01000: snapshot 0x7fa5e6000040 still active > LOCATION: AtEOXact_Snapshot, snapmgr.c:1123 > WARNING: 01000: snapshot 0x7fa5e6000040 still active > LOCATION: AtEOXact_Snapshot, snapmgr.c:1123 I failed to reproduce this. Do you have a reproducible test case? > + * Reset attcacheoff for a TupleDesc > + */ > +void > +ResetTupleDescCache(TupleDesc tupdesc) > +{ > + int i; > + > + for (i = 0; i < tupdesc->natts; i++) > + tupdesc->attrs[i]->attcacheoff = -1; > +} > I think that it would be better to merge that with TupleDescInitEntry > to be sure that the initialization of a TupleDesc's attribute goes > through only one code path. Sorry, but I am not sure I understand your suggestion. I do not like the ResetTupleDescCache function, so all suggestions are welcome. > -REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM > } <replaceable class="PARAMETER">name</replaceable> > +REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM > } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> > I am taking the war path with such a sentence... But what about adding > CONCURRENTLY to the list of options in parenthesis instead? I have thought some about this myself and I do not care strongly either way. > - Explore the use of SQL-level interfaces to mark an index as inactive > at creation. > - Remove work done in changeDependencyForAll, and replace it by > something similar to what tablecmds.c does. There is I think here some > place for refactoring if that's not with CREATE TABLE LIKE. This > requires to the same work of creation, renaming and drop of the old > triggers and constraints.
I am no fan of the current code duplication and how fragile it is, but I think these cases are sufficiently different to prevent meaningful code reuse. But it could just be me who is unfamiliar with that part of the code. > - Do a per-index rebuild and not a per-relation rebuild for concurrent > indexing. Doing a per-relation reindex has the disadvantage that many > objects need to be created at the same time, and in the case of > REINDEX CONCURRENTLY time of the operation is not what matters, it is > how intrusive the operation is. Relations with many indexes would also > result in much object locks taken at each step. I am still leaning towards my current trade-off, since waiting for all queries to stop using an index can take a lot of time, and if you only have to do that once per table it would be a huge benefit under some workloads; you can still reindex each index separately if you need to. Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Here is a rebased version of the patch. Andreas
Attachment
On Wed, Nov 1, 2017 at 1:20 PM, Andreas Karlsson <andreas@proxel.se> wrote: > Here is a rebased version of the patch. The patch does not apply, and needs a rebase. I am moving it to next CF with waiting on author as status. -- Michael
Andreas Karlsson wrote: > Here is a rebased version of the patch. Is anybody working on rebasing this patch? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Michael Paquier wrote: > Well, the idea is really to get rid of that as there are already > facilities of this kind for CREATE TABLE LIKE in the parser and ALTER > TABLE when rewriting a relation. It is not really attractive to have a > 3rd method in the backend code to do the same kind of things, for a > method that is even harder to maintain than the other two. I dislike the backend code that uses SPI and manufactures nodes to re-create indexes. IMO we should get rid of it. Let's not call it "facilities", but rather "grotty hacks". I think before suggesting to add even more code to perpetuate that idea, we should think about going in the other direction. I have not tried to write the code, but it should be possible to have an intermediate function called by ProcessUtility* which transforms the IndexStmt into an internal representation, then calls DefineIndex. This way, all this code that wants to create indexes for backend-internal reasons can create the internal representation directly then call DefineIndex, instead of the horrible hacks they use today creating parse nodes by hand. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Dec 21, 2017 at 11:46 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Michael Paquier wrote: >> Well, the idea is really to get rid of that as there are already >> facilities of this kind for CREATE TABLE LIKE in the parser and ALTER >> TABLE when rewriting a relation. It is not really attractive to have a >> 3rd method in the backend code to do the same kind of things, for a >> method that is even harder to maintain than the other two. > > I dislike the backend code that uses SPI and manufacturing node to > re-creates indexes. IMO we should get rid of it. Let's not call it > "facilities", but rather "grotty hacks". Aha. You are making my day here ;) > I think before suggesting to add even more code to perpetuate that idea, > we should think about going in the other direction. I have not tried to > write the code, but it should be possible to have an intermediate > function called by ProcessUtility* which transforms the IndexStmt into > an internal representation, then calls DefineIndex. This way, all this > code that wants to create indexes for backend-internal reasons can > create the internal representation directly then call DefineIndex, > instead of the horrible hacks they use today creating parse nodes by > hand. Yeah, that would be likely possible. I am not volunteering for that in the short term though.. -- Michael
On 21 December 2017 at 11:31, Michael Paquier <michael.paquier@gmail.com> wrote:
On Thu, Dec 21, 2017 at 11:46 AM, Alvaro Herrera
<alvherre@alvh.no-ip.org> wrote:
> Michael Paquier wrote:
>> Well, the idea is really to get rid of that as there are already
>> facilities of this kind for CREATE TABLE LIKE in the parser and ALTER
>> TABLE when rewriting a relation. It is not really attractive to have a
>> 3rd method in the backend code to do the same kind of things, for a
>> method that is even harder to maintain than the other two.
>
> I dislike the backend code that uses SPI and manufacturing node to
> re-creates indexes. IMO we should get rid of it. Let's not call it
> "facilities", but rather "grotty hacks".
Aha. You are making my day here ;)
> I think before suggesting to add even more code to perpetuate that idea,
> we should think about going in the other direction. I have not tried to
> write the code, but it should be possible to have an intermediate
> function called by ProcessUtility* which transforms the IndexStmt into
> an internal representation, then calls DefineIndex. This way, all this
> code that wants to create indexes for backend-internal reasons can
> create the internal representation directly then call DefineIndex,
> instead of the horrible hacks they use today creating parse nodes by
> hand.
Yeah, that would be likely possible. I am not volunteering for that in
the short term though..
It sounds like that'd make some of ALTER TABLE a bit less ... upsetting ... too.
Craig, Michael, all, * Craig Ringer (craig@2ndquadrant.com) wrote: > On 21 December 2017 at 11:31, Michael Paquier <michael.paquier@gmail.com> > wrote: > > > On Thu, Dec 21, 2017 at 11:46 AM, Alvaro Herrera > > <alvherre@alvh.no-ip.org> wrote: > > > Michael Paquier wrote: > > >> Well, the idea is really to get rid of that as there are already > > >> facilities of this kind for CREATE TABLE LIKE in the parser and ALTER > > >> TABLE when rewriting a relation. It is not really attractive to have a > > >> 3rd method in the backend code to do the same kind of things, for a > > >> method that is even harder to maintain than the other two. > > > > > > I dislike the backend code that uses SPI and manufacturing node to > > > re-creates indexes. IMO we should get rid of it. Let's not call it > > > "facilities", but rather "grotty hacks". > > > > Aha. You are making my day here ;) > > > > > I think before suggesting to add even more code to perpetuate that idea, > > > we should think about going in the other direction. I have not tried to > > > write the code, but it should be possible to have an intermediate > > > function called by ProcessUtility* which transforms the IndexStmt into > > > an internal representation, then calls DefineIndex. This way, all this > > > code that wants to create indexes for backend-internal reasons can > > > create the internal representation directly then call DefineIndex, > > > instead of the horrible hacks they use today creating parse nodes by > > > hand. > > > > Yeah, that would be likely possible. I am not volunteering for that in > > the short term though.. > > It sounds like that'd make some of ALTER TABLE a bit less ... upsetting ... > too. I'm a big fan of this patch but it doesn't appear to have made any progress in quite a while. Is there any chance we can get an updated patch and perhaps get another review before the end of this CF...? 
Refactoring this to have an internal representation between ProcessUtility() and DefineIndex doesn't sound too terrible and if it means the ability to reuse that, seems like it'd be awful nice to do so.. Thanks! Stephen
Attachment
On 01/26/2018 03:28 AM, Stephen Frost wrote: > I'm a big fan of this patch but it doesn't appear to have made any > progress in quite a while. Is there any chance we can get an updated > patch and perhaps get another review before the end of this CF...? Sorry, as you may have guessed I do not have the time right now to get this updated during this commitfest. > Refactoring this to have an internal representation between > ProcessUtility() and DefineIndex doesn't sound too terrible and if it > means the ability to reuse that, seems like it'd be awful nice to do > so.. I too like the concept, but have not had the time to look into it. Andreas
On Wed, Jan 31, 2018 at 01:48:00AM +0100, Andreas Karlsson wrote: > I too like the concept, but have not had the time to look into it. This may happen at some point; for now I am marking the patch as returned with feedback. -- Michael
Attachment
Here is a revival of this patch. This is Andreas Karlsson's v4 patch (2017-11-01) with some updates for conflicts and changed APIs. AFAICT from the discussions, there were no more conceptual concerns with this approach. Recall that with this patch REINDEX CONCURRENTLY creates a new index (with a new OID) and then switches the names and dependencies. I have done a review of this patch and it looks pretty solid to me. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
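For context, the name-and-dependency swap described above is conceptually close to what can already be done by hand for a plain index. A simplified sketch follows; the index and table names are illustrative only, and this manual recipe does not cover constraint-backing indexes or dependency transfer, which is precisely what the patch automates:

```sql
-- Manual "concurrent reindex" of a plain index test_i_idx on test (i):
CREATE INDEX CONCURRENTLY test_i_idx_new ON test (i);  -- build the replacement without blocking writes
DROP INDEX CONCURRENTLY test_i_idx;                    -- remove the old index
ALTER INDEX test_i_idx_new RENAME TO test_i_idx;       -- reclaim the original name
```

The RENAME at the end briefly takes an exclusive lock on the index, which is one of the details REINDEX CONCURRENTLY has to manage internally.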
Hello Thank you for working on this patch! I performed some tests and I think the behavior with partitioned tables is slightly inconsistent. postgres=# reindex table measurement; WARNING: REINDEX of partitioned tables is not yet implemented, skipping "measurement" NOTICE: table "measurement" has no indexes REINDEX postgres=# reindex table CONCURRENTLY measurement; ERROR: cannot reindex concurrently this type of relation Maybe we need to report a warning and skip partitioned tables, similar to plain REINDEX? This makes more sense for "reindex database" or "reindex schema": as far as I can see, concurrent reindex will stop working after the first partitioned table in the list. regards, Sergei
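For reference, a minimal setup that should reproduce the inconsistency above looks like this (only the table names come from the report; the partition layout is an assumption):

```sql
-- Hypothetical schema matching the reproducer above
CREATE TABLE measurement (logdate date NOT NULL, peaktemp int)
    PARTITION BY RANGE (logdate);
CREATE TABLE measurement_y2006m02 PARTITION OF measurement
    FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');

REINDEX TABLE measurement;               -- warns and skips, per the report
REINDEX TABLE CONCURRENTLY measurement;  -- errors out, per the report
```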
On 07/12/2018 17:40, Sergei Kornilov wrote: > I perform some tests and think behavior with partition tables is slightly inconsistent. > > postgres=# reindex table measurement; > WARNING: REINDEX of partitioned tables is not yet implemented, skipping "measurement" > NOTICE: table "measurement" has no indexes > REINDEX > postgres=# reindex table CONCURRENTLY measurement; > ERROR: cannot reindex concurrently this type of relation > > Maybe we need report warning and skip partitioned tables similar to plain reindex? OK, that should be easy to fix. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello I reviewed the code and documentation and I have a few notes. Did you register this patch in the CF app? I found one error in phase 4. Simple reproducer: > create table test (i int); > create index this_is_very_large_exactly_maxnamelen_index_name_wink_wink_wink on test (i); > create index this_is_very_large_exactly_maxnamelen_index_name_wink_winkccold on test (i); > reindex table CONCURRENTLY test; This fails with the error > ERROR: duplicate key value violates unique constraint "pg_class_relname_nsp_index" > DETAIL: Key (relname, relnamespace)=(this_is_very_large_exactly_maxnamelen_index_name_wink_win_ccold, 2200) already exists. A CommandCounterIncrement() in (or after) index_concurrently_swap will fix this issue. > ReindexPartitionedIndex(Relation parentIdx) > ereport(ERROR, > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > errmsg("REINDEX is not yet implemented for partitioned indexes"))); I think we need to add errhint("you can REINDEX each partition separately") or something similar. Also, can we omit this warning for reindex database? All partitions must be in the same database, and the warning in such a case is useless: we get a warning, but by reindexing each partition we reindex the partitioned table correctly. Another behavior issue I found with reindex (verbose) schema/database: the INFO ereport is printed twice for each table. > INFO: relation "measurement_y2006m02" was reindexed > DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.07 s. > INFO: table "public.measurement_y2006m02" was reindexed One comes from ReindexMultipleTables and the other (with pg_rusage_show) from ReindexRelationConcurrently. > ReindexRelationConcurrently > if (!indexRelation->rd_index->indisvalid) Would it be better to use the IndexIsValid macro here? And the same question about the added indexform->indisvalid in src/backend/commands/tablecmds.c > <para> > An index build with the <literal>CONCURRENTLY</literal> option failed, leaving > an <quote>invalid</quote> index.
Such indexes are useless but it can be >- convenient to use <command>REINDEX</command> to rebuild them. Note that >- <command>REINDEX</command> will not perform a concurrent build. To build the >- index without interfering with production you should drop the index and >- reissue the <command>CREATE INDEX CONCURRENTLY</command> command. >+ convenient to use <command>REINDEX</command> to rebuild them. > </para> This documentation change seems wrong to me: REINDEX CONCURRENTLY does not rebuild invalid indexes. To fix invalid indexes we still need REINDEX with a table lock, or to recreate the index concurrently. > + A first pass to build the index is done for each new index entry. > + Once the index is built, its flag <literal>pg_class.isready</literal> is > + switched to <quote>true</quote> > + At this point <literal>pg_class.indisvalid</literal> is switched to > + <quote>true</quote> for the new index and to <quote>false</quote> for the old, and > + Old indexes have <literal>pg_class.isready</literal> switched to <quote>false</quote> Should be pg_index.indisvalid and pg_index.indisready, right? regards, Sergei
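Since the patch under discussion does not rebuild invalid indexes, the pre-existing manual recipe for fixing a leftover invalid index still applies. A sketch, assuming an invalid index test_i_idx on test (i) left behind by a failed concurrent build:

```sql
-- Drop the invalid leftover and rebuild it without blocking writes
DROP INDEX CONCURRENTLY test_i_idx;
CREATE INDEX CONCURRENTLY test_i_idx ON test (i);
```

Plain REINDEX INDEX test_i_idx would also rebuild it, at the cost of a lock that blocks writes for the duration of the build.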
On 09/12/2018 19:55, Sergei Kornilov wrote: >> reindex table CONCURRENTLY test; By the way, does this syntax make sense? I haven't seen a discussion on this anywhere in the various threads. I keep thinking that reindex concurrently table test; would make more sense. How about in combination with (verbose)? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Greetings, * Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote: > On 09/12/2018 19:55, Sergei Kornilov wrote: > >> reindex table CONCURRENTLY test; > > By the way, does this syntax make sense? I haven't seen a discussion on > this anywhere in the various threads. I keep thinking that > > reindex concurrently table test; > > would make more sense. How about in combination with (verbose)? I don't think it's a mistake that we have 'create index concurrently', and it certainly would seem odd to me for 'create index' and 'reindex table' to be different. Certainly, from my recollection of English, you'd say "I am going to reindex the table concurrently"; you wouldn't say "I am going to reindex concurrently the table." Based on at least a quick look around, the actual grammar rule seems to match my recollection[1]: adverbs should typically go AFTER the verb + object, and the adverb shouldn't ever be placed between the verb and the object. Thanks! Stephen [1]: http://www.grammar.cl/Notes/Adverbs.htm
Attachment
On Thu, Dec 13, 2018 at 07:14:57PM -0500, Stephen Frost wrote: > Based on at least a quick looking around, the actual grammar rule seems > to match my recollection[1], adverbs should typically go AFTER the > verb + object, and the adverb shouldn't ever be placed between the verb > and the object. This part was already debated at length in 2012-2013 when I sent the first iterations of the patch, and my memories on the matter are that the grammar you are showing here matches the past agreement. -- Michael
Attachment
On 14/12/2018 01:14, Stephen Frost wrote: >>>> reindex table CONCURRENTLY test; >> >> By the way, does this syntax make sense? I haven't seen a discussion on >> this anywhere in the various threads. I keep thinking that >> >> reindex concurrently table test; >> >> would make more sense. How about in combination with (verbose)? > > I don't think it's a mistake that we have 'create index concurrently' > and it certainly would seem odd to me for 'create index' and 'reindex > table' to be different. > > Certainly, from my recollection of english, you'd say "I am going to > reindex the table concurrently", you wouldn't say "I am going to > reindex concurrently the table." > > Based on at least a quick looking around, the actual grammar rule seems > to match my recollection[1], adverbs should typically go AFTER the > verb + object, and the adverb shouldn't ever be placed between the verb > and the object. So it would be grammatical to say reindex table test concurrently or in a pinch reindex concurrently table test but I don't see anything grammatical about reindex table concurrently test (given that the object is "table test"). Where this gets really messy is stuff like this: reindex (verbose) database concurrently postgres Why would "concurrently" not be part of the options next to "verbose"? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 14/12/2018 01:23, Michael Paquier wrote: > On Thu, Dec 13, 2018 at 07:14:57PM -0500, Stephen Frost wrote: >> Based on at least a quick looking around, the actual grammar rule seems >> to match my recollection[1], adverbs should typically go AFTER the >> verb + object, and the adverb shouldn't ever be placed between the verb >> and the object. > > This part has been a long debate already in 2012-2013 when I sent the > first iterations of the patch, and my memories on the matter are that > the grammar you are showing here matches with the past agreement. Do you happen to have a link for that? I didn't find anything. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018-Dec-14, Peter Eisentraut wrote: > On 14/12/2018 01:23, Michael Paquier wrote: > > On Thu, Dec 13, 2018 at 07:14:57PM -0500, Stephen Frost wrote: > >> Based on at least a quick looking around, the actual grammar rule seems > >> to match my recollection[1], adverbs should typically go AFTER the > >> verb + object, and the adverb shouldn't ever be placed between the verb > >> and the object. > > > > This part has been a long debate already in 2012-2013 when I sent the > > first iterations of the patch, and my memories on the matter are that > > the grammar you are showing here matches with the past agreement. > > Do you happen to have a link for that? I didn't find anything. I think putting the CONCURRENTLY in the parenthesized list of options is most sensible. CREATE INDEX didn't have such an option list when we added this feature there; see https://www.postgresql.org/message-id/flat/200608011143.k71Bh9c22067%40momjian.us#029e9a7ee8cb38beee494ef7891bec1d for some discussion about that grammar. Our options were not great ... -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Greetings, * Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote: > On 14/12/2018 01:14, Stephen Frost wrote: > >>>> reindex table CONCURRENTLY test; > >> > >> By the way, does this syntax make sense? I haven't seen a discussion on > >> this anywhere in the various threads. I keep thinking that > >> > >> reindex concurrently table test; > >> > >> would make more sense. How about in combination with (verbose)? > > > > I don't think it's a mistake that we have 'create index concurrently' > > and it certainly would seem odd to me for 'create index' and 'reindex > > table' to be different. > > > > Certainly, from my recollection of english, you'd say "I am going to > > reindex the table concurrently", you wouldn't say "I am going to > > reindex concurrently the table." > > > > Based on at least a quick looking around, the actual grammar rule seems > > to match my recollection[1], adverbs should typically go AFTER the > > verb + object, and the adverb shouldn't ever be placed between the verb > > and the object. > > So it would be grammatical to say > > reindex table test concurrently Yes, though I'm not really a fan of it. > or in a pinch > > reindex concurrently table test No, you can't put concurrently between reindex and table. > but I don't see anything grammatical about > > reindex table concurrently test I disagree, this does look reasonable to me and it's certainly much better than 'reindex concurrently table' which looks clearly incorrect. > Where this gets really messy is stuff like this: > > reindex (verbose) database concurrently postgres > > Why would "concurrently" not be part of the options next to "verbose"? That wasn't what was asked and I don't think I see a problem with having concurrently be allowed in the parentheses. For comparison, it's not like "explain analyze select ..." or "explain buffers select" is terribly good grammatical form. 
If you wanted to try to get to a better form for the spelled out sentence, I would think: concurrently reindex table test would probably be the approach to use, though that's not what we use for 'create index' and it'd be rather out of character for us to start a command with an adverb, making it ultimately a poor choice overall. Going back to what we already have done and have in released versions, we have 'create unique index concurrently test ...' and that's at least reasonable (the adverb isn't showing up between the verb and the object, and the adjective is between the verb and the object) and is what I vote to go with, with the caveat that if we want to also allow it inside the parentheses, I'm fine with that. Thanks! Stephen
Attachment
On 2018-Dec-14, Stephen Frost wrote: > That wasn't what was asked and I don't think I see a problem with having > concurrently be allowed in the parentheses. For comparison, it's not > like "explain analyze select ..." or "explain buffers select" is > terribly good grammatical form. ... and we don't allow EXPLAIN BUFFERS at all, and if we had had a parenthesized option list in EXPLAIN when we invented EXPLAIN ANALYZE, I bet we would have *not* made the ANALYZE keyword appear unadorned in that command. > If you wanted to try to get to a better form for the spelled out > sentence, I would think: > > concurrently reindex table test > > would probably be the approach to use, I think this is terrible from a command-completion perspective, and from a documentation perspective (Certainly we wouldn't have a manpage about the "concurrently" command, for starters). My vote goes to put the keyword inside of and exclusively in the parenthesized option list. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Greetings, * Alvaro Herrera (alvherre@2ndquadrant.com) wrote: > On 2018-Dec-14, Stephen Frost wrote: > > > That wasn't what was asked and I don't think I see a problem with having > > concurrently be allowed in the parentheses. For comparison, it's not > > like "explain analyze select ..." or "explain buffers select" is > > terribly good grammatical form. > > ... and we don't allow EXPLAIN BUFFERS at all, and if we had had a > parenthesized option list in EXPLAIN when we invented EXPLAIN ANALYZE, I > bet we would have *not* made the ANALYZE keyword appear unadorned in > that command. I'm not convinced of that- there is value in being able to write full and useful commands without having to always use parentheses. > > If you wanted to try to get to a better form for the spelled out > > sentence, I would think: > > > > concurrently reindex table test > > > > would probably be the approach to use, > > I think this is terrible from a command-completion perspective, and from > a documentation perspective (Certainly we wouldn't have a manpage about > the "concurrently" command, for starters). Right, I agreed that this had other downsides in the email you're replying to here. Glad we agree that it's not a good option. > My vote goes to put the keyword inside of and exclusively in the > parenthesized option list. I disagree with the idea of exclusively having concurrently be in the parentheses. 'explain buffers' is a much less frequently used option (though that might, in part, be because it's a bit annoying to write out explain (analyze, buffers) select...; I wonder if we could have a way to say "if I'm running analyze, I always want buffers"...), but concurrently reindexing a table (or index..) is going to almost certainly be extremely common, perhaps even more common than *not* reindexing concurrently. Thanks! Stephen
Attachment
On Fri, Dec 14, 2018 at 09:00:58AM -0300, Alvaro Herrera wrote: > On 2018-Dec-14, Peter Eisentraut wrote: >> Do you happen to have a link for that? I didn't find anything. The message I was thinking about is close to here: https://www.postgresql.org/message-id/20121210152856.GC16664@awork2.anarazel.de > I think putting the CONCURRENTLY in the parenthesized list of options is > most sensible. For new options of VACUUM and ANALYZE we tend to prefer that as well, and this simplifies the query parsing. -- Michael
Attachment
On 09/12/2018 19:55, Sergei Kornilov wrote: >> <para> >> An index build with the <literal>CONCURRENTLY</literal> option failed, leaving >> an <quote>invalid</quote> index. Such indexes are useless but it can be >> - convenient to use <command>REINDEX</command> to rebuild them. Note that >> - <command>REINDEX</command> will not perform a concurrent build. To build the >> - index without interfering with production you should drop the index and >> - reissue the <command>CREATE INDEX CONCURRENTLY</command> command. >> + convenient to use <command>REINDEX</command> to rebuild them. >> </para> > This documentation change seems wrong for me: reindex concurrently does not rebuild invalid indexes. To fix invalid indexeswe still need reindex with lock table or recreate this index concurrently. > The current patch prevents REINDEX CONCURRENTLY of invalid indexes, but I wonder why that is so. Anyone remember? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Dec 27, 2018 at 11:04:09AM +0100, Peter Eisentraut wrote: > The current patch prevents REINDEX CONCURRENTLY of invalid indexes, but > I wonder why that is so. Anyone remember? It should be around this time: https://www.postgresql.org/message-id/CAB7nPqRwVtQcHWErUf9o0hrRGFyQ9xArk7K7jCLxqKLy_6CXPQ@mail.gmail.com And if I recall correctly, the reason for not allowing reindexing of invalid entries was that when working on a table, schema or database, if a failure happens in the process, repeating the command would need to reindex twice the number of indexes. -- Michael
Attachment
On 2018-Dec-14, Stephen Frost wrote: > > My vote goes to put the keyword inside of and exclusively in the > > parenthesized option list. > > I disagree with the idea of exclusively having concurrently be in the > parentheses. 'explain buffers' is a much less frequently used option > (though that might, in part, be because it's a bit annoying to write out > explain (analyze, buffers) select...; I wonder if we could have a way to > say "if I'm running analyze, I always want buffers"...), I'm skeptical. I think EXPLAIN ANALYZE is more common because it has more than one decade of advantage compared to the more detailed option list. Yes, it's also easier, but IMO it's a brain thing (muscle memory), not a fingers thing. > but concurrently reindexing a table (or index..) is going to almost > certainly be extremely common, perhaps even more common than *not* > reindexing concurrently. Well, users can use the reindexdb utility and save some keystrokes. Anyway we don't typically add redundant ways to express the same things. Where we have them, it's just because the old way was there before, and we added the extensible way later. Adding two in the first appearance of a new feature seems absurd to me. After looking at the proposed grammar again today and in danger of repeating myself, IMO allowing the concurrency keyword to appear outside the parens would be a mistake. Valid commands: REINDEX (VERBOSE, CONCURRENTLY) TABLE foo; REINDEX (CONCURRENTLY) INDEX bar; -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>>>> "Alvaro" == Alvaro Herrera <alvherre@2ndquadrant.com> writes: Alvaro> After looking at the proposed grammar again today and in danger Alvaro> of repeating myself, IMO allowing the concurrency keyword to Alvaro> appear outside the parens would be a mistake. Valid commands: Alvaro> REINDEX (VERBOSE, CONCURRENTLY) TABLE foo; Alvaro> REINDEX (CONCURRENTLY) INDEX bar; We burned that bridge with CREATE INDEX CONCURRENTLY; to make REINDEX require different syntax would be too inconsistent. If we didn't have all these existing uses of CONCURRENTLY without parens, your argument might have more merit; but we do. -- Andrew (irc:RhodiumToad)
On 2018-12-31 21:35:57 +0000, Andrew Gierth wrote: > >>>>> "Alvaro" == Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > Alvaro> After looking at the proposed grammar again today and in danger > Alvaro> of repeating myself, IMO allowing the concurrency keyword to > Alvaro> appear outside the parens would be a mistake. Valid commands: > > Alvaro> REINDEX (VERBOSE, CONCURRENTLY) TABLE foo; > Alvaro> REINDEX (CONCURRENTLY) INDEX bar; > > We burned that bridge with CREATE INDEX CONCURRENTLY; to make REINDEX > require different syntax would be too inconsistent. > > If we didn't have all these existing uses of CONCURRENTLY without > parens, your argument might have more merit; but we do. +1
Greetings, * Alvaro Herrera (alvherre@2ndquadrant.com) wrote: > On 2018-Dec-14, Stephen Frost wrote: > > > > My vote goes to put the keyword inside of and exclusively in the > > > parenthesized option list. > > > > I disagree with the idea of exclusively having concurrently be in the > > parentheses. 'explain buffers' is a much less frequently used option > > (though that might, in part, be because it's a bit annoying to write out > > explain (analyze, buffers) select...; I wonder if we could have a way to > > say "if I'm running analyze, I always want buffers"...), > > I'm skeptical. I think EXPLAIN ANALYZE is more common because it has > more than one decade of advantage compared to the more detailed option > list. Yes, it's also easier, but IMO it's a brain thing (muscle > memory), not a fingers thing. I would argue that it's both. > > but concurrently reindexing a table (or index..) is going to almost > > certainly be extremely common, perhaps even more common than *not* > > reindexing concurrently. > > Well, users can use the reindexdb utility and save some keystrokes. That's a really poor argument as those unix utilities are hardly ever used, in my experience. > Anyway we don't typically add redundant ways to express the same things. > Where we have them, it's just because the old way was there before, and > we added the extensible way later. Adding two in the first appearance > of a new feature seems absurd to me. SQL allows many, many different ways to express the same thing. I agree that we haven't done that much in our utility commands, but I don't see that as an argument against doing so, just that we haven't (previously) really had the need- because most of the time we don't have a bunch of different options where we want to have a list. > After looking at the proposed grammar again today and in danger of > repeating myself, IMO allowing the concurrency keyword to appear outside > the parens would be a mistake. 
Valid commands: > > REINDEX (VERBOSE, CONCURRENTLY) TABLE foo; > REINDEX (CONCURRENTLY) INDEX bar; This discussion hasn't changed my opinion, and, though I'm likely repeating myself as well, I also agree with the down-thread comment that this ship really has already sailed. Thanks! Stephen
Updated patch for some merge conflicts and addressing most of your comments. (I did not do anything about the syntax.) On 09/12/2018 19:55, Sergei Kornilov wrote: > I found one error in phase 4. Simple reproducer: > >> create table test (i int); >> create index this_is_very_large_exactly_maxnamelen_index_name_wink_wink_wink on test (i); >> create index this_is_very_large_exactly_maxnamelen_index_name_wink_winkccold on test (i); >> reindex table CONCURRENTLY test; > > This fails with error > >> ERROR: duplicate key value violates unique constraint "pg_class_relname_nsp_index" >> DETAIL: Key (relname, relnamespace)=(this_is_very_large_exactly_maxnamelen_index_name_wink_win_ccold, 2200) already exists. > > CommandCounterIncrement() in (or after) index_concurrently_swap will fix this issue. fixed >> ReindexPartitionedIndex(Relation parentIdx) >> ereport(ERROR, >> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), >> errmsg("REINDEX is not yet implemented for partitioned indexes"))); > > I think we need add errhint("you can REINDEX each partition separately") or something similar. > Also can we omit this warning for reindex database? All partition must be in same database and warning in such case is useless: we have warning, but doing reindex for each partition => we reindex partitioned table correctly. fixed by skipping in ReindexRelationConcurrently(), same as other unsupported relkinds > Another behavior issue i found with reindex (verbose) schema/database: INFO ereport is printed twice for each table. > >> INFO: relation "measurement_y2006m02" was reindexed >> DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.07 s. >> INFO: table "public.measurement_y2006m02" was reindexed > > One from ReindexMultipleTables and another (with pg_rusage_show) from ReindexRelationConcurrently. fixed with some restructuring >> ReindexRelationConcurrently >> if (!indexRelation->rd_index->indisvalid) > > it is better use IndexIsValid macro here?
And same question about added indexform->indisvalid in src/backend/commands/tablecmds.c IndexIsValid() has been removed in the meantime. >> <para> >> An index build with the <literal>CONCURRENTLY</literal> option failed, leaving >> an <quote>invalid</quote> index. Such indexes are useless but it can be >> - convenient to use <command>REINDEX</command> to rebuild them. Note that >> - <command>REINDEX</command> will not perform a concurrent build. To build the >> - index without interfering with production you should drop the index and >> - reissue the <command>CREATE INDEX CONCURRENTLY</command> command. >> + convenient to use <command>REINDEX</command> to rebuild them. >> </para> > > This documentation change seems wrong for me: reindex concurrently does not rebuild invalid indexes. To fix invalid indexes we still need reindex with lock table or recreate this index concurrently. still being discussed elsewhere in this thread >> + A first pass to build the index is done for each new index entry. >> + Once the index is built, its flag <literal>pg_class.isready</literal> is >> + switched to <quote>true</quote> >> + At this point <literal>pg_class.indisvalid</literal> is switched to >> + <quote>true</quote> for the new index and to <quote>false</quote> for the old, and >> + Old indexes have <literal>pg_class.isready</literal> switched to <quote>false</quote> > > Should be pg_index.indisvalid and pg_index.indisready, right? fixed -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Thank you! I reviewed the new patch version. It applies, builds and passes tests. The code looks good, but I noticed some new behavior: > postgres=# reindex (verbose) table CONCURRENTLY measurement ; > WARNING: REINDEX of partitioned tables is not yet implemented, skipping "measurement" > NOTICE: table "measurement" has no indexes > REINDEX > postgres=# \d measurement > Partitioned table "public.measurement" > Column | Type | Collation | Nullable | Default > -----------+---------+-----------+----------+--------- > city_id | integer | | not null | > logdate | date | | not null | > peaktemp | integer | | | > unitsales | integer | | | > Partition key: RANGE (logdate) > Indexes: > "measurement_logdate_idx" btree (logdate) > Number of partitions: 0 NOTICE seems unnecessary here. Unfortunately, concurrent reindex loses comments; reproducer: > create table testcomment (i int); > create index testcomment_idx1 on testcomment (i); > comment on index testcomment_idx1 IS 'test comment'; > \di+ testcomment_idx1 > reindex table testcomment ; > \di+ testcomment_idx1 # ok > reindex table CONCURRENTLY testcomment ; > \di+ testcomment_idx1 # we lose comment Also I think we need to change REINDEX to "<command>REINDEX</command> (without <option>CONCURRENTLY</option>)" in the ACCESS EXCLUSIVE section of the table-level lock modes documentation (to be similar to the REFRESH MATERIALIZED VIEW and CREATE INDEX descriptions). About reindexing invalid indexes - I found one good question in the archives [1]: what about toast indexes? I checked it now; I am able to drop an invalid toast index, but I cannot drop the redundant valid index. Reproducer: session 1: begin; select from test_toast ... for update; session 2: reindex table CONCURRENTLY test_toast ; session 2: interrupt by ctrl+C session 1: commit session 2: reindex table test_toast ; and now we have two toast indexes. DROP INDEX is able to remove only the invalid ones.
The valid index gives "ERROR: permission denied: "pg_toast_16426_index_ccnew" is a system catalog". About syntax: I vote for the current syntax "reindex table CONCURRENTLY tablename". This looks consistent with the existing CREATE INDEX CONCURRENTLY and REFRESH MATERIALIZED VIEW CONCURRENTLY. regards, Sergei [1]: https://www.postgresql.org/message-id/CAB7nPqT%2B6igqbUb59y04NEgHoBeUGYteuUr89AKnLTFNdB8Hyw%40mail.gmail.com
> About syntax: I vote for the current syntax "reindex table CONCURRENTLY tablename". This looks consistent with the existing CREATE INDEX CONCURRENTLY and REFRESH MATERIALIZED VIEW CONCURRENTLY.

+1

Pavel
On Fri, Jan 04, 2019 at 03:18:06PM +0300, Sergei Kornilov wrote: > NOTICE seems unnecessary here. > > Unfortunally concurrently reindex loses comments, reproducer: Yes, the NOTICE message makes little sense. I am getting back in touch with this stuff. It has been some time but the core of the patch has not actually changed in its base concept, so I am still very familiar with it as the original author. There are even typos I may have introduced a couple of years back, like "contraint". I have not yet spent much time on that, but there are at quick glance a bunch of things that could be retouched to get pieces of that committable. + The concurrent index created during the processing has a name ending in + the suffix ccnew, or ccold if it is an old index definiton which we failed + to drop. Invalid indexes can be dropped using <literal>DROP INDEX</literal>, + including invalid toast indexes. This needs <literal> markups for "ccnew" and "ccold". "definiton" is not correct. index_create does not actually need its extra argument with the tuple descriptor. I think that we had better grab the column name list from indexInfo and just pass that down to index_create() (patched on my local branch), so it is an overkill to take a full copy of the index's TupleDesc. The patch, standing as-is, is close to 2k lines long, so let's cut that first into more pieces refactoring the concurrent build code. Here are some preliminary notes: - WaitForOlderSnapshots() could be in its own patch. - index_concurrently_build() and index_concurrently_set_dead() can be in an independent patch. set_dead() had better be a wrapper on top of index_set_state_flags actually which is able to set any kind of flags. - A couple of pieces in index_create() could be cut as well. I can send patches for those things as first steps which could happen in this commit then, and commit them as needed. This way, we reduce the size of the main patch. 
Even if the main portion does not get into v12, we'd still have base pieces to build on next. Regarding the grammar, we have tended for the last couple of years to avoid complicating the main grammar and to move on to parenthesized grammars (see VACUUM, ANALYZE, EXPLAIN, etc). So in the same vein I think that it would make sense to only support CONCURRENTLY within parentheses and just plug it in alongside the VERBOSE option. Does somebody mind if I jump into the ship after so long? I was the original author of the monster after all... -- Michael
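For clarity, the two candidate spellings under debate in this thread would look as follows (the table name is only an example):

```sql
-- Parenthesized option list, in the style of modern VACUUM/EXPLAIN:
REINDEX (VERBOSE, CONCURRENTLY) TABLE my_table;

-- Bare keyword, matching CREATE INDEX CONCURRENTLY and DROP INDEX CONCURRENTLY:
REINDEX TABLE CONCURRENTLY my_table;
```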
On 1/16/19 9:27 AM, Michael Paquier wrote:> Regarding the grammar, we tend for the last couple of years to avoid > complicating the main grammar and move on to parenthesized grammars > (see VACUUM, ANALYZE, EXPLAIN, etc). So in the same vein I think that > it would make sense to only support CONCURRENTLY within parenthesis > and just plugin that with the VERBOSE option. Personally I do not care, but there have been a lot of voices for keeping REINDEX CONCURRENTLY consistent with CREATE INDEX CONCURRENTLY and DROP INDEX CONCURRENTLY. > Does somebody mind if I jump into the ship after so long? I was the > original author of the monster after all... Fine by me. Peter? Andreas
On 2019-Jan-16, Michael Paquier wrote: > Regarding the grammar, we tend for the last couple of years to avoid > complicating the main grammar and move on to parenthesized grammars > (see VACUUM, ANALYZE, EXPLAIN, etc). So in the same vein I think that > it would make sense to only support CONCURRENTLY within parenthesis > and just plugin that with the VERBOSE option. That's my opinion too, but I was outvoted in another subthread -- see https://postgr.es/m/20181214144529.wvmjwmy7wxgmgyb3@alvherre.pgsql Stephen Frost, Andrew Gierth and Andres Freund all voted to put CONCURRENTLY outside the parens. It seems we now have three votes to put it *in* the parens (you, Peter Eisentraut, me). I guess more votes are needed to settle this issue. My opinion is that if we had had parenthesized options lists back when CREATE INDEX CONCURRENTLY was invented, we would have put it there. But we were young and needed the money ... -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jan 16, 2019 at 02:59:31PM -0300, Alvaro Herrera wrote: > That's my opinion too, but I was outvoted in another subthread -- see > https://postgr.es/m/20181214144529.wvmjwmy7wxgmgyb3@alvherre.pgsql > Stephen Frost, Andrew Gierth and Andres Freund all voted to put > CONCURRENTLY outside the parens. It seems we now have three votes to > put it *in* the parens (you, Peter Eisentraut, me). I guess more votes > are needed to settle this issue. Sure, let's see. I would have been in the crowd of not using parenthesized grammar five years ago, but the recent experience with other commands worries me, as we would repeat the same errors. > My opinion is that if we had had parenthesized options lists back when > CREATE INDEX CONCURRENTLY was invented, we would have put it there. > But we were young and needed the money ... :) -- Michael
On Wed, Jan 16, 2019 at 05:56:15PM +0100, Andreas Karlsson wrote: > On 1/16/19 9:27 AM, Michael Paquier wrote: >> Does somebody mind if I jump into the ship after so long? I was the >> original author of the monster after all... > > Fine by me. Peter? Okay, I have begun digging into the patch, and extracted for now two things which can be refactored first, giving a total of three patches: - 0001, which adds WaitForOlderSnapshots to snapmgr.c. I think that this can be useful for external extensions that need a process to wait for snapshots older than a minimum threshold held by other transactions. - 0002, which moves the concurrent index build into its own routine, index_build_concurrent(). At the same time, index_build() has an isprimary argument which is not used, so let's remove it. This simplifies the refactoring a bit as well. - 0003 is the core patch, realigned with the rest, fixing some typos I found on the way. Here are also some notes for things I am planning to look at in a second pass: - The concurrent drop (phase 5) part still shares a lot with DROP INDEX CONCURRENTLY, and I think that we had better refactor the code further so that REINDEX CONCURRENTLY shares more with DROP INDEX. One thing which I think is incorrect is that we do not clear the invalid flag of the dropped index before marking it as dead. This looks like a missing piece from another concurrent-related bug fix lost over the rebases this patch had. - set_dead could be refactored so that it is able to handle multiple indexes as input, using WaitForLockersMultiple(). This way CREATE INDEX CONCURRENTLY could also use it. - There are no regression tests for partitioned tables. - The NOTICE messages showing up when a table has no indexes should be removed. - index_create() does not really need a TupleDesc argument, as long as the caller is able to provide a list of column names.
- At the end of the day, I think that it would be nice to reach a state where we have a set of low-level routines like index_build_concurrent, index_set_dead_concurrent which are used by both paths of CONCURRENTLY and can be called for each phase within a given transaction. Those pieces can also be helpful to implement for example an extension able to do concurrent reindexing out of core. I think that the refactorings in 0001 and 0002 are committable as-is, and this shaves some code from the core patch. Thoughts? -- Michael
On 16/01/2019 18:59, Alvaro Herrera wrote: > On 2019-Jan-16, Michael Paquier wrote: > >> Regarding the grammar, we tend for the last couple of years to avoid >> complicating the main grammar and move on to parenthesized grammars >> (see VACUUM, ANALYZE, EXPLAIN, etc). So in the same vein I think that >> it would make sense to only support CONCURRENTLY within parenthesis >> and just plugin that with the VERBOSE option. > > That's my opinion too, but I was outvoted in another subthread -- see > https://postgr.es/m/20181214144529.wvmjwmy7wxgmgyb3@alvherre.pgsql > Stephen Frost, Andrew Gierth and Andres Freund all voted to put > CONCURRENTLY outside the parens. It seems we now have three votes to > put it *in* the parens (you, Peter Eisentraut, me). I guess more votes > are needed to settle this issue. My vote is to have homogeneous syntax for all of this, and so put it in parentheses, but we should also allow CREATE INDEX and DROP INDEX to use parentheses for it, too. I suppose we'll keep what would then be the legacy syntax for a few decades or more. -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Fri, Jan 18, 2019 at 07:58:06PM +0100, Vik Fearing wrote: > My vote is to have homogeneous syntax for all of this, and so put it in > parentheses, but we should also allow CREATE INDEX and DROP INDEX to use > parentheses for it, too. That would be a new thing as these variants don't exist yet, and WITH is for storage parameters. In my opinion, the long-term take on doing such things is that we are then able to reduce the number of reserved keywords in the grammar. Even if for the case of CONCURRENTLY we may see humans on Mars before this actually happens, this does not mean that we should not do it moving forward for other keywords in the grammar. -- Michael
On 19/01/2019 02:33, Michael Paquier wrote: > On Fri, Jan 18, 2019 at 07:58:06PM +0100, Vik Fearing wrote: >> My vote is to have homogeneous syntax for all of this, and so put it in >> parentheses, but we should also allow CREATE INDEX and DROP INDEX to use >> parentheses for it, too. > > That would be a new thing as these variants don't exist yet, and WITH > is for storage parameters. In my opinion, the long-term take on doing > such things is that we are then able to reduce the number of reserved > keywords in the grammar. Even if for the case of CONCURRENTLY we may > see humans on Mars before this actually happens, this does not mean > that we should not do it moving forward for other keywords in the > grammar. I'm not sure I understand your point. I don't want a situation like this: CREATE INDEX CONCURRENTLY ... DROP INDEX CONCURRENTLY ... REINDEX INDEX (CONCURRENTLY) ... All three should be the same, and my suggestion is to add the parenthesized version to CREATE and DROP and not add the unparenthesized version to REINDEX. I never said anything about WITH. -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Thu, Jan 17, 2019 at 02:11:01PM +0900, Michael Paquier wrote: > Okay, I have begun digging into the patch, and extracted for now two > things which can be refactored first, giving a total of three patches: > - 0001, which creates WaitForOlderSnapshots to snapmgr.c. I think > that this can be useful for external extensions to have a process wait > for snapshots older than a minimum threshold hold by other > transactions. > - 0002, which moves the concurrent index build into its own routine, > index_build_concurrent(). At the same time, index_build() has a > isprimary argument which is not used, so let's remove it. This > simplifies a bit the refactoring as well. > - 0003 is the core patch, realigned with the rest, fixing some typos I > found on the way. Are there any objections if I commit 0001? Introducing WaitForOlderSnapshots() is quite independent from the rest, and the refactoring is obvious. For 0002, I am still not 100% sure if index_build_concurrent() is the best interface but I am planning to look more at this stuff next week, particularly the drop portion which needs more work. -- Michael
Hello > I don't want a situation like this: > CREATE INDEX CONCURRENTLY ... > DROP INDEX CONCURRENTLY ... > REINDEX INDEX (CONCURRENTLY) ... > > All three should be the same, and my suggestion is to add the > parenthesized version to CREATE and DROP and not add the unparenthesized > version to REINDEX. We already have a parenthesized VERBOSE option for REINDEX. So the proposed syntax was: REINDEX (CONCURRENTLY) INDEX ... REINDEX (VERBOSE, CONCURRENTLY) INDEX ... Like the parameters for EXPLAIN and VACUUM, and completely unlike CREATE/DROP INDEX. So consistent syntax for CREATE/DROP would be: CREATE (CONCURRENTLY) INDEX ... CREATE (UNIQUE, CONCURRENTLY) INDEX ... # or do we want parenthesized CONCURRENTLY, but not UNIQUE? CREATE UNIQUE (CONCURRENTLY) INDEX? DROP (CONCURRENTLY) INDEX ... What about REFRESH MATERIALIZED VIEW? Do we leave it unchanged? regards, Sergei
Greetings, * Vik Fearing (vik.fearing@2ndquadrant.com) wrote: > On 16/01/2019 18:59, Alvaro Herrera wrote: > > On 2019-Jan-16, Michael Paquier wrote: > > > >> Regarding the grammar, we tend for the last couple of years to avoid > >> complicating the main grammar and move on to parenthesized grammars > >> (see VACUUM, ANALYZE, EXPLAIN, etc). So in the same vein I think that > >> it would make sense to only support CONCURRENTLY within parenthesis > >> and just plugin that with the VERBOSE option. > > > > That's my opinion too, but I was outvoted in another subthread -- see > > https://postgr.es/m/20181214144529.wvmjwmy7wxgmgyb3@alvherre.pgsql > > Stephen Frost, Andrew Gierth and Andres Freund all voted to put > > CONCURRENTLY outside the parens. It seems we now have three votes to > > put it *in* the parens (you, Peter Eisentraut, me). I guess more votes > > are needed to settle this issue. > > My vote is to have homogeneous syntax for all of this, and so put it in > parentheses, but we should also allow CREATE INDEX and DROP INDEX to use > parentheses for it, too. > > I supposed we'll keep what would then be the legacy syntax for a few > decades or more. I'm still of the opinion that we should have CONCURRENTLY allowed without the parentheses. I could see allowing it with them, as well, but I do feel that we should be using the parentheses-based approach more as a last-resort kind of thing instead of just baking in everything to require them. We have said before that we don't want to have things implemented in a purely functional way (see the discussions around pglogical and such) and while this isn't quite the same, I do think it heads in that direction. It's certainly harder to have to think about how to structure these commands so that they look like they belong in SQL but I think it has benefits too. Thanks! Stephen
On January 19, 2019 7:32:55 AM PST, Stephen Frost <sfrost@snowman.net> wrote: > Greetings, > > * Vik Fearing (vik.fearing@2ndquadrant.com) wrote: >> My vote is to have homogeneous syntax for all of this, and so put it in >> parentheses, but we should also allow CREATE INDEX and DROP INDEX to use >> parentheses for it, too. >> >> I suppose we'll keep what would then be the legacy syntax for a few >> decades or more. > I'm still of the opinion that we should have CONCURRENTLY allowed > without the parentheses. I could see allowing it with them, as well, > but I do feel that we should be using the parentheses-based approach > more as a last-resort kind of thing instead of just baking in everything > to require them. +1 Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Sat, Jan 19, 2019 at 03:01:07AM +0100, Vik Fearing wrote: > On 19/01/2019 02:33, Michael Paquier wrote: >> On Fri, Jan 18, 2019 at 07:58:06PM +0100, Vik Fearing wrote: >>> My vote is to have homogeneous syntax for all of this, and so put it in >>> parentheses, but we should also allow CREATE INDEX and DROP INDEX to use >>> parentheses for it, too. >> >> That would be a new thing as these variants don't exist yet, and WITH >> is for storage parameters. In my opinion, the long-term take on doing >> such things is that we are then able to reduce the number of reserved >> keywords in the grammar. Even if for the case of CONCURRENTLY we may >> see humans on Mars before this actually happens, this does not mean >> that we should not do it moving forward for other keywords in the >> grammar. > > I'm not sure I understand your point. > > I don't want a situation like this: > CREATE INDEX CONCURRENTLY ... > DROP INDEX CONCURRENTLY ... > REINDEX INDEX (CONCURRENTLY) ... > > All three should be the same, and my suggestion is to add the > parenthesized version to CREATE and DROP and not add the unparenthesized > version to REINDEX. I am not sure what actual reason would force us to decide that all three queries should have the same grammar, or why this has anything to do with a thread about REINDEX. REINDEX can work on many more object types than an index, so its scope is much larger, contrary to CREATE/DROP INDEX. An advantage of using the parenthesized grammar and prioritizing it is that you don't have to add the keyword to the list of reserved keywords, and the parser can rely on IDENT for its work. I personally prefer the parenthesized grammar for that reason. If the crowd votes in majority for the other option, that's of course fine with me too. > I never said anything about WITH. Perhaps I have not explained my thoughts clearly here.
My point was that if some day we decide to drop the non-parenthesized grammar of CREATE/DROP INDEX, one possibility would be to have a "concurrent" option as part of WITH, even if that's used only now for storage parameters. That's the only actual part of the grammar which is extensible. -- Michael
On Fri, Jan 18, 2019 at 9:01 PM Vik Fearing <vik.fearing@2ndquadrant.com> wrote: > I don't want a situation like this: > CREATE INDEX CONCURRENTLY ... > DROP INDEX CONCURRENTLY ... > REINDEX INDEX (CONCURRENTLY) ... > > All three should be the same, and my suggestion is to add the > parenthesized version to CREATE and DROP and not add the unparenthesized > version to REINDEX. +1 for all three being the same. I could see allowing only the unparenthesized format for all three, or allowing both forms for all three, but I think having only one form for each and having them not agree will be too confusing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Jan 23, 2019 at 19:17 Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jan 18, 2019 at 9:01 PM Vik Fearing <vik.fearing@2ndquadrant.com> wrote:
> > I don't want a situation like this:
> > CREATE INDEX CONCURRENTLY ...
> > DROP INDEX CONCURRENTLY ...
> > REINDEX INDEX (CONCURRENTLY) ...
> >
> > All three should be the same, and my suggestion is to add the
> > parenthesized version to CREATE and DROP and not add the unparenthesized
> > version to REINDEX.
>
> +1 for all three being the same. I could see allowing only the
> unparenthesized format for all three, or allowing both forms for all
> three, but I think having only one form for each and having them not
> agree will be too confusing.

+1

Pavel

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
Hi, On 2019-01-23 13:17:26 -0500, Robert Haas wrote: > On Fri, Jan 18, 2019 at 9:01 PM Vik Fearing <vik.fearing@2ndquadrant.com> wrote: > > I don't want a situation like this: > > CREATE INDEX CONCURRENTLY ... > > DROP INDEX CONCURRENTLY ... > > REINDEX INDEX (CONCURRENTLY) ... > > > > All three should be the same, and my suggestion is to add the > > parenthesized version to CREATE and DROP and not add the unparenthesized > > version to REINDEX. > > +1 for all three being the same. I could see allowing only the > unparenthesized format for all three, or allowing both forms for all > three, but I think having only one form for each and having them not > agree will be too confusing. It seems quite unnecessarily confusing to me to require parens for REINDEX CONCURRENTLY when we've historically not required that for CREATE/DROP INDEX CONCURRENTLY. Besides that, training people that it's the correct form to use parens for CIC/DIC creates an unnecessary version dependency. I think it actually makes sense to see the CONCURRENTLY versions as somewhat separate types of statements from the non-concurrent versions. They have significantly different transactional behaviour (like not being able to be run within one, and leaving gunk behind in case of error). For me it semantically makes sense to have that denoted at the toplevel; it's a related but different type of DDL statement. Greetings, Andres Freund
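As an illustration of that transactional difference, the existing CREATE INDEX CONCURRENTLY already refuses to run inside a transaction block (the object names below are made up):

```sql
BEGIN;
-- Rejected with: ERROR: CREATE INDEX CONCURRENTLY cannot run inside a transaction block
CREATE INDEX CONCURRENTLY example_idx ON example_table (id);
ROLLBACK;
```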
Here is an updated patch, which addresses some of your issues below as well as the earlier reported issue that comments were lost during REINDEX CONCURRENTLY. On 16/01/2019 09:27, Michael Paquier wrote: > On Fri, Jan 04, 2019 at 03:18:06PM +0300, Sergei Kornilov wrote: >> NOTICE seems unnecessary here. >> >> Unfortunally concurrently reindex loses comments, reproducer: > > Yes, the NOTICE message makes little sense. This is existing behavior of reindex-not-concurrently. > I am getting back in touch with this stuff. It has been some time but > the core of the patch has not actually changed in its base concept, so > I am still very familiar with it as the original author. There are > even typos I may have introduced a couple of years back, like > "contraint". I have not yet spent much time on that, but there are at > quick glance a bunch of things that could be retouched to get pieces > of that committable. > > + The concurrent index created during the processing has a name ending in > + the suffix ccnew, or ccold if it is an old index definiton which we failed > + to drop. Invalid indexes can be dropped using <literal>DROP INDEX</literal>, > + including invalid toast indexes. > This needs <literal> markups for "ccnew" and "ccold". "definiton" is > not correct. Fixed those. > index_create does not actually need its extra argument with the tuple > descriptor. I think that we had better grab the column name list from > indexInfo and just pass that down to index_create() (patched on my > local branch), so it is an overkill to take a full copy of the index's > TupleDesc. Please send a fixup patch. > The patch, standing as-is, is close to 2k lines long, so let's cut > that first into more pieces refactoring the concurrent build code. > Here are some preliminary notes: > - WaitForOlderSnapshots() could be in its own patch. > - index_concurrently_build() and index_concurrently_set_dead() can be > in an independent patch. 
set_dead() had better be a wrapper on top of > index_set_state_flags actually which is able to set any kind of > flags. > - A couple of pieces in index_create() could be cut as well. I'm not a fan of that. I had already considered all the ways in which subparts of this patch could get committed, and some of it was committed, so what's left now is what I thought should stay together. The patch isn't really that big and most of it is moving code around. I would also avoid chopping around in this patch now and focus on getting it finished instead. The functionality seems solid, so if it's good, let's commit it, if it's not, let's get it fixed up. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
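For readers following the thread, the suffix behavior described in the quoted documentation hunk can be sketched roughly as follows (a hypothetical session; the table and index names are invented, not taken from the patch):

```sql
CREATE TABLE t (a int);
CREATE INDEX t_a_idx ON t (a);
-- If a later REINDEX TABLE CONCURRENTLY t is interrupted mid-way, the
-- partially built copy may be left behind as an invalid index, e.g.:
--     "t_a_idx" btree (a)
--     "t_a_idx_ccnew" btree (a) INVALID
-- As the documentation hunk says, such leftovers can be removed with a
-- plain DROP INDEX:
DROP INDEX t_a_idx_ccnew;
```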
On Tue, Jan 29, 2019 at 09:51:35PM +0100, Peter Eisentraut wrote: > On 16/01/2019 09:27, Michael Paquier wrote: >> index_create does not actually need its extra argument with the tuple >> descriptor. I think that we had better grab the column name list from >> indexInfo and just pass that down to index_create() (patched on my >> local branch), so it is an overkill to take a full copy of the index's >> TupleDesc. > > Please send a fixup patch. Sure. Attached is a patch which can be applied on top of what you sent last, based on what I noticed at review, here and there. You also forgot to switch two heap_open() to table_open() in pg_depend.c. >> The patch, standing as-is, is close to 2k lines long, so let's cut >> that first into more pieces refactoring the concurrent build code. >> Here are some preliminary notes: >> - WaitForOlderSnapshots() could be in its own patch. >> - index_concurrently_build() and index_concurrently_set_dead() can be >> in an independent patch. set_dead() had better be a wrapper on top of >> index_set_state_flags actually which is able to set any kind of >> flags. >> - A couple of pieces in index_create() could be cut as well. > > I'm not a fan of that. I had already considered all the ways in which > subparts of this patch could get committed, and some of it was > committed, so what's left now is what I thought should stay together. > The patch isn't really that big and most of it is moving code around. I > would also avoid chopping around in this patch now and focus on getting > it finished instead. The functionality seems solid, so if it's good, > let's commit it, if it's not, let's get it fixed up. Well, the feature looks solid to me, and not much of its code has actually changed over the years FWIW. Committing large and complex patches is something you have more experience with than myself, and I find the exercise very difficult. So if you feel it's adapted to keep things grouped together, it is not an issue to me. I'll follow the lead. 
-- Michael
Attachment
On 30/01/2019 06:16, Michael Paquier wrote: > On Tue, Jan 29, 2019 at 09:51:35PM +0100, Peter Eisentraut wrote: >> On 16/01/2019 09:27, Michael Paquier wrote: >>> index_create does not actually need its extra argument with the tuple >>> descriptor. I think that we had better grab the column name list from >>> indexInfo and just pass that down to index_create() (patched on my >>> local branch), so it is an overkill to take a full copy of the index's >>> TupleDesc. >> >> Please send a fixup patch. > > Sure. Attached is a patch which can be applied on top of what you > sent last, based on what I noticed at review, here and there. You > also forgot to switch two heap_open() to table_open() in pg_depend.c. OK, applied most of that. I didn't take your function renaming. I had deliberately named the functions index_concurrently_${task}, so their relationship is more easily visible. Anyway, that's all cosmetics. Are there any more functional or correctness issues to be addressed? Another thing I was thinking of: We need some database-global tests. For example, at some point during development, I had broken some variant of REINDEX DATABASE. Where could we put those tests? Maybe with reindexdb? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Thu, Feb 07, 2019 at 12:49:43PM +0100, Peter Eisentraut wrote: > Anyway, that's all cosmetics. Are there any more functional or > correctness issues to be addressed? Not that I can think of. At least this evening. > Another thing I was thinking of: We need some database-global tests. > For example, at some point during development, I had broken some variant > of REINDEX DATABASE. Where could we put those tests? Maybe with > reindexdb? Having some coverage in the TAP tests of reindexdb is a good idea. -- Michael
Attachment
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: not tested Documentation: tested, passed Hello Sorry for the late response. I reviewed the latest patch version. I didn't find any new behavior bugs. reindex concurrently can be in deadlock with another reindex concurrently (or a manual vacuum (maybe with a wraparound autovacuum) or create index concurrently on the same table). But I think this is not an issue for this feature; two create index concurrently can be in deadlock too. Just one small note for the documentation: > +Indexes: > + "idx" btree (col) > + "idx_cct" btree (col) INVALID Second index should be idx_ccnew (or idx_ccold), right? Code looks good to me. regards, Sergei
On 2019-Feb-07, Peter Eisentraut wrote: > Another thing I was thinking of: We need some database-global tests. > For example, at some point during development, I had broken some variant > of REINDEX DATABASE. Where could we put those tests? Maybe with reindexdb? What's wrong with a new reindex.sql in regress? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Feb 07, 2019 at 12:07:01PM -0300, Alvaro Herrera wrote: > On 2019-Feb-07, Peter Eisentraut wrote: >> Another thing I was thinking of: We need some database-global tests. >> For example, at some point during development, I had broken some variant >> of REINDEX DATABASE. Where could we put those tests? Maybe with reindexdb? > > What's wrong with a new reindex.sql in regress? Depending on the number of objects created and remaining around before the script is run in the main suite, this would be costly. I think that this approach would not scale well in the long term. Having a TAP test with reindexdb gives you access to a full instance with its contents always fully known at test time. -- Michael
Attachment
Hello The patch is marked as target version 12, but has been inactive for a few weeks. I think many users want this feature and the patch is in good shape. Do we have open questions on this thread? The latest patch still can be applied cleanly; it builds and passes tests. regards, Sergei
On 2019-03-13 15:13, Sergei Kornilov wrote: > The patch is marked as target version 12, but has been inactive for a few weeks. I think many users want this feature and the patch is in good shape. Do we have open questions on this thread? > > The latest patch still can be applied cleanly; it builds and passes tests. Here is an updated patch. It includes the typo fix in the documentation from you, some small bug fixes, a new reindexdb --concurrently option, and based on that some whole-database tests, as discussed recently. I think this addresses all open issues. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Hi, Am Mittwoch, den 13.03.2019, 23:10 +0100 schrieb Peter Eisentraut: > Here is an updated patch. I had a quick look at some of the comments and noticed some possible nitpicky-level problems: > +/* > + * index_concurrently_set_dead > + * > + * Perform the last invalidation stage of DROP INDEX CONCURRENTLY or REINDEX > + * CONCURRENTLY before actually dropping the index. After calling this > + * function the index is seen by all the backends as dead. Low-level locks > + * taken here are kept until the end of the transaction doing calling this > + * function. > + */ "the transaction doing calling this function." sounds wrong to me. > + * Extract the list of indexes that are going to be rebuilt based on the > + * list of relation Oids given by caller. For each element in given list, > + * if the relkind of given relation Oid is a table, all its valid indexes > + * will be rebuilt, including its associated toast table indexes. If > + * relkind is an index, this index itself will be rebuilt. The locks taken > + * on parent relations and involved indexes are kept until this > + * transaction is committed to protect against schema changes that might > + * occur until the session lock is taken on each relation, session lock > + * used to similarly protect from any schema change that could happen > + * within the multiple transactions that are used during this process. > + */ I think the last sentence in the above should be split up into several sentences, maybe at "session lock used..."? Or maybe it should just say "a session lock is used" instead? > + else > + { > + /* > + * Save the list of relation OIDs in private > + * context > + */ Missing full stop at end of comment. > + /* Definetely no indexes, so leave */ s/Definetely/Definitely/. 
Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
On 2019-03-15 22:32, Michael Banck wrote: > I had a quick look at some of the comments and noticed some possible > nitpicky-level problems: Thanks, I've integrated these changes into my local branch. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Yet another review of this patch from me... > An index build with the <literal>CONCURRENTLY</literal> option failed, leaving > an <quote>invalid</quote> index. Such indexes are useless but it can be > - convenient to use <command>REINDEX</command> to rebuild them. Note that > - <command>REINDEX</command> will not perform a concurrent build. To build the > - index without interfering with production you should drop the index and > - reissue the <command>CREATE INDEX CONCURRENTLY</command> command. > + convenient to use <command>REINDEX</command> to rebuild them. Not sure we can just say "use REINDEX" since only a non-concurrent reindex can rebuild such an index. I propose to not change this part. > + The following steps occur in a concurrent index build, each in a separate > + transaction except when the new index definitions are created > > + All the constraints and foreign keys which refer to the index are swapped... > + ... This step is done within a single transaction > + for each temporary entry. > > + Old indexes have <literal>pg_index.indisready</literal> switched to <quote>false</quote> > + to prevent any new tuple insertions after waiting for running queries which > + may reference the old index to complete. This step is done within a single > + transaction for each temporary entry. According to the code, index_concurrently_swap is called in a loop inside one transaction for all processed indexes of the table. Same for the index_concurrently_set_dead and index_concurrently_drop calls. So this part of the documentation seems incorrect. And a few questions: - reindexdb has concurrently flag logic even in reindex_system_catalogs, but "reindex concurrently" cannot reindex the system catalogs. Is this expected? - should reindexdb check the server version? For example, a binary from patched HEAD can reindex a v11 cluster and obviously fail if --concurrently was requested. - psql/tab-complete.c vs old releases?
Seems we need to suggest the CONCURRENTLY keyword only for releases with concurrently support. Well, I still have no new questions about the backend logic. Maybe we need to mark the patch as "Ready for Committer" in order to get more attention from other committers? regards, Sergei
On 2019-03-23 20:04, Sergei Kornilov wrote: >> An index build with the <literal>CONCURRENTLY</literal> option failed, leaving >> an <quote>invalid</quote> index. Such indexes are useless but it can be >> - convenient to use <command>REINDEX</command> to rebuild them. Note that >> - <command>REINDEX</command> will not perform a concurrent build. To build the >> - index without interfering with production you should drop the index and >> - reissue the <command>CREATE INDEX CONCURRENTLY</command> command. >> + convenient to use <command>REINDEX</command> to rebuild them. > > Not sure we can just say "use REINDEX" since only a non-concurrent reindex can rebuild such an index. I propose to not change this part. Yeah, I reverted that and adjusted the wording a bit. >> + Old indexes have <literal>pg_index.indisready</literal> switched to <quote>false</quote> >> + to prevent any new tuple insertions after waiting for running queries which >> + may reference the old index to complete. This step is done within a single >> + transaction for each temporary entry. > > According to the code, index_concurrently_swap is called in a loop inside one transaction for all processed indexes of the table. Same for the index_concurrently_set_dead and index_concurrently_drop calls. So this part of the documentation seems incorrect. I rewrote that whole procedure to make it a bit simpler. > And a few questions: > - reindexdb has concurrently flag logic even in reindex_system_catalogs, but "reindex concurrently" cannot reindex the system catalogs. Is this expected? If support is ever added, then reindexdb supports it automatically. It seems simpler to not have to repeat the same checks in two places. > - should reindexdb check the server version? For example, a binary from patched HEAD can reindex a v11 cluster and obviously fail if --concurrently was requested. Added. > - psql/tab-complete.c vs old releases? Seems we need to suggest the CONCURRENTLY keyword only for releases with concurrently support.
It seems we don't do version checks for tab completion of keywords. > Well, I still have no new questions about the backend logic. Maybe we need to mark the patch as "Ready for Committer" in order to get more attention from other committers? Let's do it. :-) I've gone over this patch a few more times. I've read all the discussion since 2012 again and made sure all the issues were addressed. I made particularly sure that during the refactoring nothing in CREATE INDEX CONCURRENTLY and DROP INDEX CONCURRENTLY was inadvertently changed. I checked all the steps again. I'm happy with it. One more change I made was in the drop phase. I had to hack it up a bit so that we can call index_drop() with a concurrent lock without actually doing the concurrent processing (that would be a bit recursive). The previous patch was actually taking too strong a lock here. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Mon, Mar 25, 2019 at 04:23:34PM +0100, Peter Eisentraut wrote: > Let's do it. :-) I am pretty sure that this has been said at least once since 2012. > I've gone over this patch a few more times. I've read all the > discussion since 2012 again and made sure all the issues were addressed. > I made particularly sure that during the refactoring nothing in CREATE > INDEX CONCURRENTLY and DROP INDEX CONCURRENTLY was inadvertently > changed. I checked all the steps again. I'm happy with it. From my side, I would be happy to look at this patch. Unfortunately I won't have the room to look at it this week I think :( But if you are happy with it that's fine by me, at least I can fix anything which is broken :) -- Michael
Attachment
Hi Unfortunately the patch does not apply due to recent commits. Any chance this can be fixed (and even committed in pg12)? >> And a few questions: >> - reindexdb has concurrently flag logic even in reindex_system_catalogs, but "reindex concurrently" cannot reindex the system catalogs. Is this expected? > > If support is ever added, then reindexdb supports it automatically. It > seems simpler to not have to repeat the same checks in two places. ok, reasonable for me >> - psql/tab-complete.c vs old releases? Seems we need to suggest the CONCURRENTLY keyword only for releases with concurrently support. > > It seems we don't do version checks for tab completion of keywords. Hmm, yes, I found only a few checks for "create trigger" syntax, about the "EXECUTE FUNCTION"/"EXECUTE PROCEDURE" difference in 11 regards, Sergei
On 2019-03-28 09:07, Sergei Kornilov wrote: > Unfortunately the patch does not apply due to recent commits. Any chance this can be fixed (and even committed in pg12)? Committed :) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>> Unfortunately the patch does not apply due to recent commits. Any chance this can be fixed (and even committed in pg12)? > > Committed :) wow! Congratulations! This was a very long way; my favorite pg12 feature regards, Sergei
On Fri, Mar 29, 2019 at 10:39:23AM +0300, Sergei Kornilov wrote: > wow! Congratulations! This was very long way > > my favorite pg12 feature So this has been committed, nice! Thanks a lot to all for keeping alive this patch over the ages, with particular thanks to Andreas and Peter. -- Michael
Attachment
On 2019-03-29 09:04, Michael Paquier wrote: > On Fri, Mar 29, 2019 at 10:39:23AM +0300, Sergei Kornilov wrote: >> wow! Congratulations! This was very long way >> >> my favorite pg12 feature > > So this has been committed, nice! Thanks a lot to all for keeping > alive this patch over the ages, with particular thanks to Andreas and > Peter. So, we're getting buildfarm failures, only with clang. I can reproduce those (with clang). It seems the issue is somewhere near indexcmds.c "Phase 6 of REINDEX CONCURRENTLY". More eyes welcome. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-03-29 09:13, Peter Eisentraut wrote: > On 2019-03-29 09:04, Michael Paquier wrote: >> On Fri, Mar 29, 2019 at 10:39:23AM +0300, Sergei Kornilov wrote: >>> wow! Congratulations! This was very long way >>> >>> my favorite pg12 feature >> >> So this has been committed, nice! Thanks a lot to all for keeping >> alive this patch over the ages, with particular thanks to Andreas and >> Peter. > > So, we're getting buildfarm failures, only with clang. I can reproduce > those (with clang). > > It seems the issue is somewhere near indexcmds.c "Phase 6 of REINDEX > CONCURRENTLY". More eyes welcome. I think I found a fix. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Mar 29, 2019 at 09:13:35AM +0100, Peter Eisentraut wrote: > So, we're getting buildfarm failures, only with clang. I can reproduce > those (with clang). Indeed, I can reproduce the failures using -O2 with clang. I am wondering if we are not missing a volatile flag somewhere and that some code reordering is the cause here. > It seems the issue is somewhere near indexcmds.c "Phase 6 of REINDEX > CONCURRENTLY". More eyes welcome. Here is a short reproducer:

create materialized view aam as select 1 AS a;
create index aai on aam(a);
reindex table CONCURRENTLY aam;

-- Michael
Attachment
Hi hackers, I tried this great feature for partition indexes. The first time the REINDEX TABLE CONCURRENTLY statement is executed on the partition, an error occurs. The second run succeeds but leaves an index with an INVALID status. I think this is not the desired behaviour.

# TEST
postgres=> CREATE TABLE part1(c1 INT) PARTITION BY RANGE(c1);
CREATE TABLE
postgres=> CREATE TABLE part1v1 PARTITION OF part1 FOR VALUES FROM (0) TO (100);
CREATE TABLE
postgres=> CREATE INDEX idx1_part1 ON part1(c1);
CREATE INDEX
postgres=> REINDEX TABLE CONCURRENTLY part1v1;
ERROR:  cannot drop index part1v1_c1_idx_ccold because index idx1_part1 requires it
HINT:  You can drop index idx1_part1 instead.
postgres=> \d+ part1v1
                                   Table "public.part1v1"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 c1     | integer |           |          |         | plain   |              |
Partition of: part1 FOR VALUES FROM (0) TO (100)
Partition constraint: ((c1 IS NOT NULL) AND (c1 >= 0) AND (c1 < 100))
Indexes:
    "part1v1_c1_idx" btree (c1)
    "part1v1_c1_idx_ccold" btree (c1) INVALID
Access method: heap

postgres=> REINDEX TABLE CONCURRENTLY part1v1;
REINDEX
postgres=> \d+ part1v1
                                   Table "public.part1v1"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 c1     | integer |           |          |         | plain   |              |
Partition of: part1 FOR VALUES FROM (0) TO (100)
Partition constraint: ((c1 IS NOT NULL) AND (c1 >= 0) AND (c1 < 100))
Indexes:
    "part1v1_c1_idx" btree (c1)
    "part1v1_c1_idx_ccold" btree (c1) INVALID
Access method: heap

Regards, Noriyoshi Shinoda

-----Original Message-----
From: Michael Paquier [mailto:michael@paquier.xyz]
Sent: Friday, March 29, 2019 6:21 PM
To: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
Cc: Sergei Kornilov <sk@zsrv.org>; pgsql-hackers@lists.postgresql.org
Subject: Re: REINDEX CONCURRENTLY 2.0

On Fri, Mar 29, 2019 at 09:13:35AM +0100, Peter Eisentraut wrote: > So, we're getting buildfarm failures, only with clang. I can > reproduce those (with clang). Indeed, I can reproduce the failures using -O2 with clang. I am wondering if we are not missing a volatile flag somewhere and that some code reordering is the cause here. > It seems the issue is somewhere near indexcmds.c "Phase 6 of REINDEX > CONCURRENTLY". More eyes welcome. Here is a short reproducer:

create materialized view aam as select 1 AS a;
create index aai on aam(a);
reindex table CONCURRENTLY aam;

-- Michael
On Fri, Mar 29, 2019 at 3:28 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > On 2019-03-28 09:07, Sergei Kornilov wrote: > > Unfortunately the patch does not apply due to recent commits. Any chance this can be fixed (and even committed in pg12)? > > Committed :) > Given this has been committed I've probably missed the window, but philosophically speaking, is there any reason not to make the "concurrently" behavior the default behavior, and require a keyword for the more heavy-weight old behavior? In most production scenarios you probably want to avoid exclusive locking, and in the cases where that isn't an issue, 'concurrently' isn't so much slower that most users would object to it. I would perhaps give a nod to historical syntax concerns, but this would more closely align with the behavior in vacuum vs vacuum full, and we've done behavior-modifying changes such as the recent WITH ... MATERIALIZED change. Thoughts? Robert Treat https://xzilla.net
Hi, On 2019-03-29 11:47:10 -0400, Robert Treat wrote: > On Fri, Mar 29, 2019 at 3:28 AM Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: > > > > On 2019-03-28 09:07, Sergei Kornilov wrote: > > > Unfortunately patch does not apply due recent commits. Any chance this can be fixed (and even committed in pg12)? > > > > Committed :) > > > > Given this has been committed I've probably missed the window, but > philosophically speaking, is there any reason not to make the > "concurrently" behavior the default behavior, and require a keyword > for the more heavy-weight old behavior? Yes, it increases the total runtime quite considerably. And it adds new failure modes with partially built invalid indexes hanging around that need to be dropped manually. > In most production scenarios > you probably want to avoid exclusive locking, and in the cases where > that isn't an issue, 'concurrently' isn't that much slower that most > users would object to it. It does at *least* twice as much IO. Greetings, Andres Freund
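As a rough sketch of the trade-off described above (the table and index names are invented; the lock levels stated are the generally documented PostgreSQL behavior, not something verified against this patch):

```sql
-- Plain REINDEX locks out concurrent use of the index for its whole duration:
REINDEX INDEX t_a_idx;               -- ACCESS EXCLUSIVE lock on the index
-- The concurrent variant keeps the table available for reads and writes,
-- but builds and validates a complete second copy of the index first,
-- hence the extra I/O and longer runtime:
REINDEX INDEX CONCURRENTLY t_a_idx;  -- SHARE UPDATE EXCLUSIVE locks
```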
I noticed a very small typo in the documentation for this feature.

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index ccabb330cb..e45bf86c8d 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -349,7 +349,7 @@ REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURR
     <listitem>
      <para>
       The old indexes are dropped. The <literal>SHARE UPDATE
-      EXCLUSIVE</literal> session locks for the indexes and the table ar
+      EXCLUSIVE</literal> session locks for the indexes and the table are
       released.
      </para>
     </listitem>

Nathan
On Fri, Mar 29, 2019 at 03:53:05PM +0000, Bossart, Nathan wrote: > I noticed a very small typo in the documentation for this feature. I submit a bunch more changes for consideration, attached.
Attachment
On 2019-03-29 16:53, Bossart, Nathan wrote: > I noticed a very small typo in the documentation for this feature. fixed -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-03-29 17:01, Justin Pryzby wrote: > On Fri, Mar 29, 2019 at 03:53:05PM +0000, Bossart, Nathan wrote: >> I noticed a very small typo in the documentation for this feature. > > I submit a bunch more changes for consideration, attached. fixed, thanks -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Mar 29, 2019 at 03:10:23PM +0000, Shinoda, Noriyoshi (PN Japan A&PS Delivery) wrote: > I tried this great feature for partition index. > The first time the REINDEX TABLE CONCURRENTLY statement is executed > to the partition, then an error occurs. Yes, that's a problem. I am adding an open item. > The second run succeeds but leaves an index with an INVALID status. > I think this is not the desired behaviour. This one is partially expected actually. Invalid indexes are ignored when processing, as rebuilding them would cause at least a table-level reindex to double its amount of processing. However, it is not possible either to detach an index from a partition tree, hence the index cannot be dropped directly either. It seems to me that the root of the problem is that the partition indexes created as copycats of the original ones should never be in a state where they are attached to the index tree. -- Michael
Attachment
On Fri, Mar 29, 2019 at 08:48:03AM -0700, Andres Freund wrote: > Yes, it increases the total runtime quite considerably. And it adds new > failure modes with partially built invalid indexes hanging around that > need to be dropped manually. On top of that CONCURRENTLY needs multiple transactions to perform its different phases for each transaction: build, validation, swap and cleanup. So it cannot run in a transaction block. Having a separate option makes the most sense. > It does at *least* twice as much IO. Yeah, I can guarantee you that it is much slower, at the advantage of being lock-free. -- Michael
Attachment
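The multi-transaction restriction mentioned above can be seen directly from psql (a sketch with an invented table name; the exact error wording is an assumption):

```sql
BEGIN;
REINDEX TABLE CONCURRENTLY my_table;
-- ERROR:  REINDEX CONCURRENTLY cannot run inside a transaction block
ROLLBACK;
-- Outside a transaction block the command is fine:
REINDEX TABLE CONCURRENTLY my_table;
```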
On 2019-03-29 16:10, Shinoda, Noriyoshi (PN Japan A&PS Delivery) wrote: > postgres=> CREATE TABLE part1(c1 INT) PARTITION BY RANGE(c1); > CREATE TABLE > postgres=> CREATE TABLE part1v1 PARTITION OF part1 FOR VALUES FROM (0) TO (100); > CREATE TABLE > postgres=> CREATE INDEX idx1_part1 ON part1(c1); > CREATE INDEX > postgres=> REINDEX TABLE CONCURRENTLY part1v1; > ERROR: cannot drop index part1v1_c1_idx_ccold because index idx1_part1 requires it > HINT: You can drop index idx1_part1 instead. The attached patch fixes this. The issue was that we didn't move all dependencies from the index (only in the other direction). Maybe that was sufficient when the patch was originally written, before partitioned indexes. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Sat, Mar 30, 2019 at 11:56:27AM +0100, Peter Eisentraut wrote: > The attached patch fixes this. The issue was that we didn't move all > dependencies from the index (only in the other direction). Maybe that > was sufficient when the patch was originally written, before partitioned > indexes. Hm. I don't think that it is quite right either, because the new index is missing from the partition tree after the reindex. Taking the example from your patch I see that:

=# CREATE TABLE concur_reindex_part1 (c1 int) PARTITION BY RANGE (c1);
CREATE TABLE
=# CREATE TABLE concur_reindex_part1v1 PARTITION OF concur_reindex_part1 FOR VALUES FROM (0) TO (100);
CREATE TABLE
=# CREATE INDEX concur_reindex_idx1_part1 ON concur_reindex_part1 (c1);
CREATE INDEX
=# SELECT relid, level FROM pg_partition_tree('concur_reindex_idx1_part1');
             relid             | level
-------------------------------+-------
 concur_reindex_idx1_part1     | 0
 concur_reindex_part1v1_c1_idx | 1
(2 rows)
=# REINDEX TABLE CONCURRENTLY concur_reindex_part1v1;
REINDEX
=# SELECT relid, level FROM pg_partition_tree('concur_reindex_idx1_part1');
           relid           | level
---------------------------+-------
 concur_reindex_idx1_part1 | 0
(1 row)

And I would have expected concur_reindex_part1v1_c1_idx to still be part of the partition tree. I think that the issue is in index_concurrently_create_copy() where we create the new index with index_create() without setting parentIndexRelid, causing the dependency to be lost. This parameter ought to be set to the OID of the parent index, so I think that we need to look at the ancestors of the index if relispartition is set, and use get_partition_ancestors() for that purpose. -- Michael
Attachment
On Mon, Apr 01, 2019 at 03:43:43PM +0900, Michael Paquier wrote: > And I would have expected concur_reindex_part1v1_c1_idx to still be > part of the partition tree. I think that the issue is in > index_concurrently_create_copy() where we create the new index with > index_create() without setting parentIndexRelid, causing the > dependency to be lost. This parameter ought to be set to the OID of > the parent index so I think that we need to look at the ancestors of > the index if relispartition is set, and use get_partition_ancestors() > for that purpose. And here is the patch to address this issue. It happens that a bit more than the dependency switch was lacking here: - At swap time, we need to have the new index definition track relispartition from the old index. - Again at swap time, the inheritance link needs to be updated between the old/new index and its parent when reindexing a partition index. Tracking the OID of the parent via index_concurrently_create_copy() is not a bright idea as we would finish with the impossibility to drop invalid indexes if the REINDEX CONCURRENTLY failed in the middle (just added some manual elog(ERROR) to test that). I have added a comment before making the index duplica. I have also expanded the regression tests so as we have more coverage for all that, finishing with the attached which keeps partition trees consistent across the operations. Thoughts? -- Michael
Attachment
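The invariants that the swap now maintains can also be checked from SQL after a reindex, with a catalog query along these lines (a sketch; the index name is taken from the example earlier in the thread):

```sql
-- After REINDEX TABLE CONCURRENTLY on a partition, the rebuilt index should
-- still be flagged as a partition and keep its pg_inherits link to the
-- parent partitioned index:
SELECT c.relname,
       c.relispartition,
       i.inhparent::regclass AS parent_index
FROM pg_class c
LEFT JOIN pg_inherits i ON i.inhrelid = c.oid
WHERE c.relname = 'concur_reindex_part1v1_c1_idx';
```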
On Tue, Apr 09, 2019 at 03:50:27PM +0900, Michael Paquier wrote: > And here is the patch to address this issue. It happens that a bit > more than the dependency switch was lacking here: > - At swap time, we need to have the new index definition track > relispartition from the old index. > - Again at swap time, the inheritance link needs to be updated between > the old/new index and its parent when reindexing a partition index. Peter, this is an open item, and I think as the committer of the feature you are its owner. Well, in this case, I don't mind taking the ownership as need be as I know this stuff. Anyway, could you have a look at the patch proposed and see if you have any issues with it? -- Michael
Attachment
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > On 2019-03-28 09:07, Sergei Kornilov wrote: >> Unfortunately patch does not apply due recent commits. Any chance >> this can be fixed (and even committed in pg12)? > > Committed :) I noticed that the docs for how to recover from a failed CREATE INDEX CONCURRENTLY say that «REINDEX does not support concurrent builds», which is no longer true. I was going to just remove the caveat, but then I discovered that REINDEX CONCURRENTLY doesn't work on INVALID indexes (why?). Attached is a patch that instead adjusts the claim to say that REINDEX does not support rebuilding invalid indexes concurrently. - ilmari -- "A disappointingly low fraction of the human race is, at any given time, on fire." - Stig Sandbeck Mathisen From e2de72b348f8a96e24128fc4188bd542eb676610 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org> Date: Thu, 11 Apr 2019 10:58:47 +0100 Subject: [PATCH] Correct claim about REINDEX CONCURRENTLY in CREATE INDEX CONCURRENTLY docs REINDEX CONCURRENTLY exists, but cannot reindex invalid indexes. --- doc/src/sgml/ref/create_index.sgml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml index d9d95b20e3..c458f54ef1 100644 --- a/doc/src/sgml/ref/create_index.sgml +++ b/doc/src/sgml/ref/create_index.sgml @@ -557,8 +557,8 @@ method in such cases is to drop the index and try again to perform <command>CREATE INDEX CONCURRENTLY</command>. (Another possibility is to rebuild the index with <command>REINDEX</command>. However, since <command>REINDEX</command> - does not support concurrent builds, this option is unlikely to seem - attractive.) + does not support reindexing invalid indexes concurrently, this option is + unlikely to seem attractive.) </para> <para> -- 2.20.1
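For context, the recovery procedure that doc paragraph describes looks roughly like this (table and index names are hypothetical):

```sql
-- Suppose a CREATE INDEX CONCURRENTLY failed midway, leaving
-- tab_c1_idx behind marked INVALID.

-- Recommended path: drop the invalid index and retry the
-- concurrent build.
DROP INDEX tab_c1_idx;
CREATE INDEX CONCURRENTLY tab_c1_idx ON tab (c1);

-- Alternative: plain REINDEX also rebuilds an invalid index, but
-- takes locks that block writes, which is why the docs call it
-- unattractive.
REINDEX INDEX tab_c1_idx;
```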
On Thu, Apr 11, 2019 at 11:21:29AM +0100, Dagfinn Ilmari Mannsåker wrote: > I noticed that the docs for how to recover from a failed CREATE INDEX > CONCURRENTLY say that «REINDEX does not support concurrent builds», > which is no longer true. Good catch. I'll apply that in a couple of hours. > I was going to just remove the caveat, but then I discovered that > REINDEX CONCURRENTLY doesn't work on INVALID indexes (why?). This is a deliberate choice. The index built in parallel with the existing one during a concurrent reindex is marked invalid during most of the operation. Hence, if the reindex is interrupted or fails, you finish with potentially twice the number of original indexes, half being invalid and the other half being the ones in use. If the user decides to rinse and repeat the concurrent reindex, and if we were to also select invalid indexes for the operation, then you could finish by doing twice the amount of work on a relation, half of it for nothing. -- Michael
Attachment
On 2019-Apr-11, Michael Paquier wrote: > > I was going to just remove the caveat, but then I discovered that > > REINDEX CONCURRENTLY doesn't work on INVALID indexes (why?). > > This is a wanted choice. The index built in parallel of the existing > one during a concurrent reindex is marked invalid during most of the > operation. Hence, if the reindex is interrupted or fails, you finish > with potentially twice the number of original indexes, half being > invalid and the other half being the ones in use. If the user decides > to rinse and repeat the concurrent reindex, and if we were to also > select invalid indexes for the operation, then you would finish by > potentially doing twice the amount of work when working on a relation, > half of it for nothing. Hmm, I suppose that makes sense for REINDEX TABLE or anything that reindexes more than one index, but if you do REINDEX INDEX surely it is reasonable to allow the case? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Apr 11, 2019 at 09:49:47AM -0400, Alvaro Herrera wrote: > Hmm, I suppose that makes sense for REINDEX TABLE or anything that > reindexes more than one index, but if you do REINDEX INDEX surely it > is reasonable to allow the case? Yes, we could revisit the REINDEX INDEX portion of the decision, and after sleeping on it my previous argument makes limited sense for REINDEX processes using only one index. One could note that the header comment of ReindexRelationConcurrently() kind of implies the same conclusion as you do, but that's perhaps just an accident. So... I am coming up with the patch attached. I have introduced some tests using a trick with CIC to have an invalid index to work on. -- Michael
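For reference, the trick in question relies on a CREATE INDEX CONCURRENTLY failure (e.g. a unique violation) leaving an invalid index behind, which the patched REINDEX INDEX CONCURRENTLY can then be pointed at. A rough sketch with made-up names:

```sql
CREATE TABLE inval_tab (c1 int);
INSERT INTO inval_tab VALUES (1), (1), (2);

-- Fails on the duplicate values, leaving the index behind
-- marked INVALID.
CREATE UNIQUE INDEX CONCURRENTLY inval_idx ON inval_tab (c1);

-- Confirm the leftover index is invalid.
SELECT indisvalid FROM pg_index
  WHERE indexrelid = 'inval_idx'::regclass;

-- Remove the duplicates, after which the concurrent rebuild can
-- succeed and mark the index valid again.
DELETE FROM inval_tab WHERE c1 = 1;
INSERT INTO inval_tab VALUES (1);
REINDEX INDEX CONCURRENTLY inval_idx;
```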
Attachment
On 2019-04-11 05:59, Michael Paquier wrote: > On Tue, Apr 09, 2019 at 03:50:27PM +0900, Michael Paquier wrote: >> And here is the patch to address this issue. It happens that a bit >> more than the dependency switch was lacking here: >> - At swap time, we need to have the new index definition track >> relispartition from the old index. >> - Again at swap time, the inheritance link needs to be updated between >> the old/new index and its parent when reindexing a partition index. > > Peter, this is an open item, and I think as the committer of the > feature you are its owner. Well, in this case, I don't mind taking > the ownership as need be as I know this stuff. Anyway, could you have > a look at the patch proposed and see if you have any issues with it? Looks good, committed. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Apr 12, 2019 at 08:44:18AM +0200, Peter Eisentraut wrote: > Looks good, committed. Thanks for committing! -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > So... I am coming up with the patch attached. I have introduced some > tests using a trick with CIC to have an invalid index to work on. I don't have any comments on the code (but the test looks sensible, it's the same trick I used to discover the issue in the first place). However, the doc patch lost the trailing paren: > The recommended recovery > method in such cases is to drop the index and try again to perform > - <command>CREATE INDEX CONCURRENTLY</command>. (Another possibility is to rebuild > - the index with <command>REINDEX</command>. However, since <command>REINDEX</command> > - does not support concurrent builds, this option is unlikely to seem > - attractive.) > + <command>CREATE INDEX CONCURRENTLY</command>. (Another possibility is > + to rebuild the index with <command>REINDEX INDEX CONCURRENTLY</command>. > </para> - ilmari -- - Twitter seems more influential [than blogs] in the 'gets reported in the mainstream press' sense at least. - Matt McLeod - That'd be because the content of a tweet is easier to condense down to a mainstream media article. - Calle Dybedahl
On Fri, Apr 12, 2019 at 12:11:12PM +0100, Dagfinn Ilmari Mannsåker wrote: > I don't have any comments on the code (but the test looks sensible, it's > the same trick I used to discover the issue in the first place). After thinking some more on it, this behavior looks rather sensible to me. Are there any objections? > However, the doc patch lost the trailing paren: Fixed on my branch, thanks. -- Michael
Attachment
On 2019-04-16 08:19, Michael Paquier wrote: > On Fri, Apr 12, 2019 at 12:11:12PM +0100, Dagfinn Ilmari Mannsåker wrote: >> I don't have any comments on the code (but the test looks sensible, it's >> the same trick I used to discover the issue in the first place). > > After thinking some more on it, this behavior looks rather sensible to > me. Are there any objections? Looks good to me. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Apr 16, 2019 at 08:50:31AM +0200, Peter Eisentraut wrote: > Looks good to me. Thanks, committed. If there are additional discussions on various points of the feature, let's move to a new thread please. This one has been already extensively used ;) -- Michael