Re: Support for REINDEX CONCURRENTLY - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Support for REINDEX CONCURRENTLY
Date
Msg-id 20130306085039.GI13803@alap2.anarazel.de
Whole thread Raw
In response to Re: Support for REINDEX CONCURRENTLY  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: Support for REINDEX CONCURRENTLY  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On 2013-03-06 13:21:27 +0900, Michael Paquier wrote:
> Please find attached updated patch realigned with your comments. You can
> find my answers inline...
> The only thing that needs clarification is the comment about
> UNIQUE_CHECK_YES/UNIQUE_CHECK_NO. Except that all the other things are
> corrected or adapted to what you wanted. I am also including now tests for
> matviews.
> 
> On Wed, Mar 6, 2013 at 1:49 AM, Andres Freund <andres@2ndquadrant.com>wrote:
> 
> > On 2013-03-05 22:35:16 +0900, Michael Paquier wrote:
> >
> > > > +             for (count = 0; count < num_indexes; count++)
> > > > > +                     index_insert(toastidxs[count], t_values,
> > t_isnull,
> > > > > +                                              &(toasttup->t_self),
> > > > > +                                              toastrel,
> > > > > +
> > > >  toastidxs[count]->rd_index->indisunique ?
> > > > > +                                              UNIQUE_CHECK_YES :
> > > > UNIQUE_CHECK_NO);
> > > >
> > > > The indisunique check looks like a copy & pasto to me, albeit not
> > > > yours...
> > > >
> > > Yes it is the same for all the indexes normally, but it looks more solid
> > to
> > > me to do that as it is. So unchanged.
> >
> > Hm, if the toast indexes aren't unique anymore loads of stuff would be
> > broken. Anyway, not your "fault".
> >
> I definitely cannot understand where you are going here. Could you be more
> explicit? Why could this be a problem? Without my patch a similar check is
> used for toast indexes.

There's no problem. I just dislike the pointless check which caters for
a situation that doesn't exist...
Forget it, sorry.

> > > > > +                             if (count == 0)
> > > > > +                                     snprintf(NewToastName,
> > > > NAMEDATALEN, "pg_toast_%u_index",
> > > > > +                                                      OIDOldHeap);
> > > > > +                             else
> > > > > +                                     snprintf(NewToastName,
> > > > NAMEDATALEN, "pg_toast_%u_index_cct%d",
> > > > > +                                                      OIDOldHeap,
> > > > count);
> > > > > +                             RenameRelationInternal(lfirst_oid(lc),
> > > > > +
> > > >  NewToastName);
> > > > > +                             count++;
> > > > > +                     }
> > > >
> > > > Hm. It seems wrong that this layer needs to know about _cct.
> > > >
> > >  Any other idea? For the time being I removed cct and added only a suffix
> > > based on the index number...
> >
> > Hm. It seems like throwing an error would be sufficient, that path is
> > only entered for shared catalogs, right? Having multiple toast indexes
> > would be a bug.
> >
> Don't think so. Even if now those APIs are used only for catalog tables, I
> do not believe that this function has been designed to be used only with
> shared catalogs. Removing the cct suffix makes sense though...

Forget what I said.

> > > > > +     /*
> > > > > +      * Index is considered as a constraint if it is PRIMARY KEY or
> > > > EXCLUSION.
> > > > > +      */
> > > > > +     isconstraint =  indexRelation->rd_index->indisprimary ||
> > > > > +             indexRelation->rd_index->indisexclusion;
> > > >
> > > > unique constraints aren't mattering here?
> > > >
> > > No they are not. Unique indexes are not counted as constraints in the
> > case
> > > of index_create. Previous versions of the patch did that but there are
> > > issues with unique indexes using expressions.
> >
> > Hm. index_create's comment says:
> >  * isconstraint: index is owned by PRIMARY KEY, UNIQUE, or EXCLUSION
> > constraint
> >
> > There are unique indexes that are constraints and some that are
> > not. Looking at ->indisunique is not sufficient to determine whether its
> > one or not.
> >
> Hum... OK. I changed that using a method based on get_index_constraint for
> a given index. So if the constraint Oid is invalid, it means that this
> index has no constraints and its concurrent entry won't create an index in
> consequence. It is more stable this way.

Sounds good. Just to make that clear:
To get a unique index without constraint:
CREATE TABLE table_u(id int, data int);
CREATE UNIQUE INDEX table_u__data ON table_u(data);
To get a constraint:
ALTER TABLE table_u ADD CONSTRAINT table_u__id_unique UNIQUE(id);

> > > > > +     /*
> > > > > +      * Phase 3 of REINDEX CONCURRENTLY
> > > > > +      *
> > > > > +      * During this phase the concurrent indexes catch up with the
> > > > INSERT that
> > > > > +      * might have occurred in the parent table and are marked as
> > valid
> > > > once done.
> > > > > +      *
> > > > > +      * We once again wait until no transaction can have the table
> > open
> > > > with
> > > > > +      * the index marked as read-only for updates. Each index
> > > > validation is done
> > > > > +      * with a separate transaction to avoid opening transaction
> > for an
> > > > > +      * unnecessary too long time.
> > > > > +      */
> > > >
> > > > Maybe I am being dumb because I have the feeling I said differently in
> > > > the past, but why do we not need a WaitForMultipleVirtualLocks() here?
> > > > The comment seems to say we need to do so.
> > > >
> > > Yes you said the contrary in a previous review. The purpose of this
> > > function is to first gather the locks and then wait for everything at
> > once
> > > to reduce possible conflicts.
> >
> > you say:
> >
> > +        * We once again wait until no transaction can have the table open
> > with
> > +        * the index marked as read-only for updates. Each index
> > validation is done
> > +        * with a separate transaction to avoid opening transaction for an
> > +        * unnecessary too long time.
> >
> > Which doesn't seem to be done?
> >
> > I read back and afaics I only referred to CacheInvalidateRelcacheByRelid
> > not being necessary in this phase. Which I think is correct.
> >
> Regarding CacheInvalidateRelcacheByRelid at phase 3, I think that it is
> needed. If we don't use it the pg_index entries will be updated but not the
> cache, what is incorrect.

A heap_update will cause cache invalidations to be sent.

> Anyway, if I claimed otherwise, I think I was wrong:
> >
> > The reason - I think - we need to wait here is that otherwise its not
> > guaranteed that all other backends see the index with ->isready
> > set. Which means they might add tuples which are invisible to the mvcc
> > snapshot passed to validate_index() (just created beforehand) which are
> > not yet added to the new index because those backends think the index is
> > not ready yet.
> > Any flaws in that logic?
> >
> Not that I think. In consequence, and I think we will agree on that: I am
> removing WaitForMultipleVirtualLocks and add a WaitForVirtualLock on the
> parent relation for EACH index before building and validating it.

I have the feeling we are talking past each other. Unless I miss
something *there is no* WaitForMultipleVirtualLocks between phase 2 and
3. But one WaitForMultipleVirtualLocks for all would be totally
sufficient.

20130305_2_reindex_concurrently_v17.patch:
+        /* we can do away with our snapshot */
+        PopActiveSnapshot();
+
+        /*
+         * Commit this transaction to make the indisready update visible for
+         * concurrent index.
+         */
+        CommitTransactionCommand();
+    }
+
+
+    /*
+     * Phase 3 of REINDEX CONCURRENTLY
+     *
+     * During this phase the concurrent indexes catch up with the INSERT that
+     * might have occurred in the parent table and are marked as valid once done.
+     *
+     * We once again wait until no transaction can have the table open with
+     * the index marked as read-only for updates. Each index validation is done
+     * with a separate transaction to avoid opening transaction for an
+     * unnecessary too long time.
+     */
+
+    /*
+     * Perform a scan of each concurrent index with the heap, then insert
+     * any missing index entries.
+     */
+    foreach(lc, concurrentIndexIds)
+    {
+        Oid indOid = lfirst_oid(lc);
+        Oid relOid;


Thanks!

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Enabling Checksums
Next
From: Alexander Korotkov
Date:
Subject: Re: WIP: index support for regexp search