Thread: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

{CREATE INDEX, REINDEX} CONCURRENTLY improvements

From
Alvaro Herrera
Date:
Hello,

In a previous thread [1], we added smarts so that processes running
CREATE INDEX CONCURRENTLY would not wait for each other.

One is adding the same to REINDEX CONCURRENTLY.  I've attached patch
0002 here which does that.

Why 0002, you ask?  That's because preparatory patch 0001 simplifies the
ReindexRelationConcurrently somewhat by adding a struct to be used of
indexes that are going to be processed, instead of just a list of Oids.
This is a good change in itself because it let us get rid of duplicative
open/close of the index rels in order to obtain some info that's already
known at the start.

The other thing is that it'd be good if we can make VACUUM also ignore
Xmin of processes doing CREATE INDEX CONCURRENTLY and REINDEX
CONCURRENTLY, when possible.  I have two possible ideas to handle this,
about which I'll post later.


[1] https://postgr.es/m/20200810233815.GA18970@alvherre.pgsql

-- 
Álvaro Herrera       Valdivia, Chile

Attachment

Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

From
Dmitry Dolgov
Date:
> On Mon, Nov 30, 2020 at 04:54:39PM -0300, Alvaro Herrera wrote:
>
> In a previous thread [1], we added smarts so that processes running
> CREATE INDEX CONCURRENTLY would not wait for each other.
>
> One is adding the same to REINDEX CONCURRENTLY.  I've attached patch
> 0002 here which does that.
>
> Why 0002, you ask?  That's because preparatory patch 0001 simplifies the
> ReindexRelationConcurrently somewhat by adding a struct to be used of
> indexes that are going to be processed, instead of just a list of Oids.
> This is a good change in itself because it let us get rid of duplicative
> open/close of the index rels in order to obtain some info that's already
> known at the start.

Thanks! The patch looks pretty good to me, after reading it I only have
a few minor comments/questions:

* ReindexIndexInfo sounds a bit weird for me because of the repeating
  part, although I see there is already a similar ReindexIndexCallbackState
  so probably it's fine.

* This one is mostly for me to understand. There are couple of places
  with a commentary that 'PROC_IN_SAFE_IC is not necessary, because the
  transaction only takes a snapshot to do some catalog manipulation'.
  But for some of them I don't immediately see in the relevant code
  anything related to snapshots. E.g. one in DefineIndex is followed by
  WaitForOlderSnapshots (which seems to only do waiting, not taking a
  snapshot), index_set_state_flags and CacheInvalidateRelcacheByRelid.
  Is taking a snapshot hidden somewhere there inside?



Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

From
Alvaro Herrera
Date:
On 2020-Dec-04, Dmitry Dolgov wrote:

> * This one is mostly for me to understand. There are couple of places
>   with a commentary that 'PROC_IN_SAFE_IC is not necessary, because the
>   transaction only takes a snapshot to do some catalog manipulation'.
>   But for some of them I don't immediately see in the relevant code
>   anything related to snapshots. E.g. one in DefineIndex is followed by
>   WaitForOlderSnapshots (which seems to only do waiting, not taking a
>   snapshot), index_set_state_flags and CacheInvalidateRelcacheByRelid.
>   Is taking a snapshot hidden somewhere there inside?

Well, they're actually going to acquire an Xid, not a snapshot, so the
comment is slightly incorrect; I'll fix it, thanks for pointing that
out.  The point stands: because those transactions are of very short
duration (at least of very short duration after the point where the XID
is obtained), it's not necessary to set the PROC_IN_SAFE_IC flag, since
it won't cause any disruption to other processes.

It is possible that I copied the wrong comment in DefineIndex.  (I only
noticed that one after I had mulled over the ones in
ReindexRelationConcurrently, each of which is skipping setting the flag
for slightly different reasons.)



Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

From
Hamid Akhtar
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

The patch looks good to me. With regards to the code comments, I had pretty similar concerns as already raised by
Dmitry;and those have already been answered by you. So this patch is good to go from my side once you have updated the
commentsper your last email. 

Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

From
Masahiko Sawada
Date:
On Tue, Dec 8, 2020 at 5:18 PM Hamid Akhtar <hamid.akhtar@gmail.com> wrote:
>
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:       not tested
> Spec compliant:           not tested
> Documentation:            not tested
>
> The patch looks good to me. With regards to the code comments, I had pretty similar concerns as already raised by
Dmitry;and those have already been answered by you. So this patch is good to go from my side once you have updated the
commentsper your last email. 

For the 0001 patch, since ReindexIndexInfo is used only within
ReindexRelationConcurrently() I think it’s a function-local structure
type. So we can declare it within the function. What do you think?

Regards,


--
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/



Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

From
Álvaro Herrera
Date:
On 2021-Jan-04, Masahiko Sawada wrote:

> On Tue, Dec 8, 2020 at 5:18 PM Hamid Akhtar <hamid.akhtar@gmail.com> wrote:
> >
> > The following review has been posted through the commitfest application:
> > make installcheck-world:  tested, passed
> > Implements feature:       not tested
> > Spec compliant:           not tested
> > Documentation:            not tested
> >
> > The patch looks good to me. With regards to the code comments, I had pretty similar concerns as already raised by
Dmitry;and those have already been answered by you. So this patch is good to go from my side once you have updated the
commentsper your last email.
 
> 
> For the 0001 patch, since ReindexIndexInfo is used only within
> ReindexRelationConcurrently() I think it’s a function-local structure
> type. So we can declare it within the function. What do you think?

That's a good idea.  Pushed with that change, thanks.

Thanks also to Dmitry and Hamid for the reviews.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W



Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

From
Álvaro Herrera
Date:
On 2021-Jan-12, Álvaro Herrera wrote:

> > For the 0001 patch, since ReindexIndexInfo is used only within
> > ReindexRelationConcurrently() I think it’s a function-local structure
> > type. So we can declare it within the function. What do you think?
> 
> That's a good idea.  Pushed with that change, thanks.

Here's the other patch, with comments fixed per reviews, and rebased to
account for the scope change.

-- 
Álvaro Herrera       Valdivia, Chile
Tulio: oh, para qué servirá este boton, Juan Carlos?
Policarpo: No, aléjense, no toquen la consola!
Juan Carlos: Lo apretaré una y otra vez.

Attachment

Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

From
Álvaro Herrera
Date:
On 2021-Jan-12, Álvaro Herrera wrote:

> On 2021-Jan-12, Álvaro Herrera wrote:
> 
> > > For the 0001 patch, since ReindexIndexInfo is used only within
> > > ReindexRelationConcurrently() I think it’s a function-local structure
> > > type. So we can declare it within the function. What do you think?
> > 
> > That's a good idea.  Pushed with that change, thanks.
> 
> Here's the other patch, with comments fixed per reviews, and rebased to
> account for the scope change.

Pushed.  At the last minute I decided to back off on reverting the flag
set that DefineIndex has, because there was no point in doing that.  I
also updated the comment in (two places of) ReindexRelationConcurrently
about why we don't do it there.  The previous submission had this:

> +    /*
> +     * This transaction doesn't need to set the PROC_IN_SAFE_IC flag, because
> +     * it only acquires an Xid to do some catalog manipulations, after the
> +     * wait is over.
> +     */

but this fails to point out that the main reason not to do it, is to
avoid having to decide whether to do it or not when some indexes are
safe and others aren't.  So I changed to:

+   /*
+    * While we could set PROC_IN_SAFE_IC if all indexes qualified, there's no
+    * real need for that, because we only acquire an Xid after the wait is
+    * done, and that lasts for a very short period.
+    */

Thanks!

-- 
Álvaro Herrera       Valdivia, Chile
"The eagle never lost so much time, as
when he submitted to learn of the crow." (William Blake)



Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

From
Álvaro Herrera
Date:
So one last remaining improvement was to have VACUUM ignore processes
doing CIC and RC to compute the Xid horizon of tuples to remove.  I
think we can do something simple like the attached patch.

-- 
Álvaro Herrera       Valdivia, Chile
"Aprender sin pensar es inútil; pensar sin aprender, peligroso" (Confucio)

Attachment

Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

From
Matthias van de Meent
Date:
On Fri, 15 Jan 2021 at 15:29, Álvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> So one last remaining improvement was to have VACUUM ignore processes
> doing CIC and RC to compute the Xid horizon of tuples to remove.  I
> think we can do something simple like the attached patch.

Regarding the patch:

> +            if (statusFlags & PROC_IN_SAFE_IC)
> +                h->catalog_oldest_nonremovable =
> +                    TransactionIdOlder(h->catalog_oldest_nonremovable, xmin);
> +            else
> +                h->data_oldest_nonremovable = h->catalog_oldest_nonremovable =
> +                    TransactionIdOlder(h->data_oldest_nonremovable, xmin);

Would this not need to be the following? Right now, it resets
potentially older h->catalog_oldest_nonremovable (which is set in the
PROC_IN_SAFE_IC branch).

> +            if (statusFlags & PROC_IN_SAFE_IC)
> +                h->catalog_oldest_nonremovable =
> +                    TransactionIdOlder(h->catalog_oldest_nonremovable, xmin);
> +            else
> +            {
> +                h->data_oldest_nonremovable =
> +                    TransactionIdOlder(h->data_oldest_nonremovable, xmin);
> +                h->catalog_oldest_nonremovable =
> +                    TransactionIdOlder(h->catalog_oldest_nonremovable, xmin);
> +            }


Prior to reading the rest of my response: I do not understand the
intricacies of the VACUUM process, nor the systems of db snapshots, so
it's reasonably possible I misunderstand this.
But would this patch not allow for tuples to be created, deleted and
vacuumed away from a table whilst rebuilding an index on that table,
resulting in invalid pointers in the index?

Example:

1.) RI starts
2.) PHASE 2: filling the index:
2.1.) scanning the heap (live tuple is cached)
< tuple is deleted
< last transaction other than RI commits, only snapshot of RI exists
< vacuum drops the tuple, and cannot remove it from the new index
because this new index is not yet populated.
2.2.) sorting tuples
2.3.) index filled with tuples, incl. deleted tuple
3.) PHASE 3: wait for transactions
4.) PHASE 4: validate does not remove the tuple from the index,
because it is not built to do so: it will only insert new tuples.
Tuples that are marked for deletion are removed from the index only
through VACUUM (and optimistic ALL_DEAD detection).

According to my limited knowledge of RI, it requires VACUUM to not run
on the table during the initial index build process (which is
currently guaranteed through the use of a snapshot).


Regards,

Matthias van de Meent



Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

From
Álvaro Herrera
Date:
On 2021-Jan-18, Matthias van de Meent wrote:

> Example:
> 
> 1.) RI starts
> 2.) PHASE 2: filling the index:
> 2.1.) scanning the heap (live tuple is cached)
> < tuple is deleted
> < last transaction other than RI commits, only snapshot of RI exists
> < vacuum drops the tuple, and cannot remove it from the new index
> because this new index is not yet populated.
> 2.2.) sorting tuples
> 2.3.) index filled with tuples, incl. deleted tuple
> 3.) PHASE 3: wait for transactions
> 4.) PHASE 4: validate does not remove the tuple from the index,
> because it is not built to do so: it will only insert new tuples.
> Tuples that are marked for deletion are removed from the index only
> through VACUUM (and optimistic ALL_DEAD detection).
> 
> According to my limited knowledge of RI, it requires VACUUM to not run
> on the table during the initial index build process (which is
> currently guaranteed through the use of a snapshot).

VACUUM cannot run concurrently with CIC or RI in a table -- both acquire
ShareUpdateExclusiveLock, which conflicts with itself, so this cannot
occur.

I do wonder if the problem you suggest (or something similar) can occur
via HOT pruning, though.

-- 
Álvaro Herrera       Valdivia, Chile



Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

From
Álvaro Herrera
Date:
On 2021-Jan-18, Matthias van de Meent wrote:

> On Fri, 15 Jan 2021 at 15:29, Álvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> Would this not need to be the following? Right now, it resets
> potentially older h->catalog_oldest_nonremovable (which is set in the
> PROC_IN_SAFE_IC branch).
> 
> > +            if (statusFlags & PROC_IN_SAFE_IC)
> > +                h->catalog_oldest_nonremovable =
> > +                    TransactionIdOlder(h->catalog_oldest_nonremovable, xmin);
> > +            else
> > +            {
> > +                h->data_oldest_nonremovable =
> > +                    TransactionIdOlder(h->data_oldest_nonremovable, xmin);
> > +                h->catalog_oldest_nonremovable =
> > +                    TransactionIdOlder(h->catalog_oldest_nonremovable, xmin);
> > +            }

Oops, you're right.

-- 
Álvaro Herrera       Valdivia, Chile



Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

From
Matthias van de Meent
Date:
On Mon, 18 Jan 2021, 21:25 Álvaro Herrera, <alvherre@alvh.no-ip.org> wrote:
>
> On 2021-Jan-18, Matthias van de Meent wrote:
>
> > Example:
> >
> > 1.) RI starts
> > 2.) PHASE 2: filling the index:
> > 2.1.) scanning the heap (live tuple is cached)
> > < tuple is deleted
> > < last transaction other than RI commits, only snapshot of RI exists
> > < vacuum drops the tuple, and cannot remove it from the new index
> > because this new index is not yet populated.
> > 2.2.) sorting tuples
> > 2.3.) index filled with tuples, incl. deleted tuple
> > 3.) PHASE 3: wait for transactions
> > 4.) PHASE 4: validate does not remove the tuple from the index,
> > because it is not built to do so: it will only insert new tuples.
> > Tuples that are marked for deletion are removed from the index only
> > through VACUUM (and optimistic ALL_DEAD detection).
> >
> > According to my limited knowledge of RI, it requires VACUUM to not run
> > on the table during the initial index build process (which is
> > currently guaranteed through the use of a snapshot).
>
> VACUUM cannot run concurrently with CIC or RI in a table -- both acquire
> ShareUpdateExclusiveLock, which conflicts with itself, so this cannot
> occur.

Yes, you are correct. Vacuum indeed has a ShareUpdateExclusiveLock.
Are there no other ways that pages are optimistically pruned?

But the base case still stands, ignoring CIC snapshots in  would give
the semantic of all_dead to tuples that are actually still considered
alive in some context, and should not yet be deleted (you're deleting
data from an in-use snapshot). Any local pruning optimizations using
all_dead mechanics now cannot be run on the table unless they hold an
ShareUpdateExclusiveLock; though I'm unaware of any such mechanisms
(other than below).

> I do wonder if the problem you suggest (or something similar) can occur
> via HOT pruning, though.

It could not, at least not at the current HEAD, as only one tuple in a
HOT-chain can be alive at one point, and all indexes point to the root
of the HOT-chain, which is never HOT-pruned. See also the
src/backend/access/heap/README.HOT.

Regards,

Matthias van de Meent



Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

From
Matthias van de Meent
Date:
On Tue, 19 Jan 2021 at 21:59, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
>
> On Mon, 18 Jan 2021, 21:25 Álvaro Herrera, <alvherre@alvh.no-ip.org> wrote:
> >
> > On 2021-Jan-18, Matthias van de Meent wrote:
> >
> > > Example:
> > >
> > > 1.) RI starts
> > > 2.) PHASE 2: filling the index:
> > > 2.1.) scanning the heap (live tuple is cached)
> > > < tuple is deleted
> > > < last transaction other than RI commits, only snapshot of RI exists
> > > < vacuum drops the tuple, and cannot remove it from the new index
> > > because this new index is not yet populated.
> > > 2.2.) sorting tuples
> > > 2.3.) index filled with tuples, incl. deleted tuple
> > > 3.) PHASE 3: wait for transactions
> > > 4.) PHASE 4: validate does not remove the tuple from the index,
> > > because it is not built to do so: it will only insert new tuples.
> > > Tuples that are marked for deletion are removed from the index only
> > > through VACUUM (and optimistic ALL_DEAD detection).
> > >
> > > According to my limited knowledge of RI, it requires VACUUM to not run
> > > on the table during the initial index build process (which is
> > > currently guaranteed through the use of a snapshot).
> >
> > VACUUM cannot run concurrently with CIC or RI in a table -- both acquire
> > ShareUpdateExclusiveLock, which conflicts with itself, so this cannot
> > occur.
>
> Yes, you are correct. Vacuum indeed has a ShareUpdateExclusiveLock.
> Are there no other ways that pages are optimistically pruned?
>
> But the base case still stands, ignoring CIC snapshots in  would give
> the semantic of all_dead to tuples that are actually still considered
> alive in some context, and should not yet be deleted (you're deleting
> data from an in-use snapshot). Any local pruning optimizations using
> all_dead mechanics now cannot be run on the table unless they hold an
> ShareUpdateExclusiveLock; though I'm unaware of any such mechanisms
> (other than below).

Re-thinking this, and after some research:

Is the behaviour of "any process that invalidates TIDs in this table
(that could be in an index on this table) always holds a lock that
conflicts with CIC/RiC on that table" a requirement of tableams, or is
it an implementation-detail?

If it is a requirement, then this patch is a +1 for me (and that
requirement should be documented in such case), otherwise a -1 while
there is no mechanism in place to remove concurrently-invalidated TIDs
from CIC-ed/RiC-ed indexes.

This concurrently-invalidated check could be done through e.g.
updating validate_index to have one more phase that removes unknown /
incorrect TIDs from the index. As a note: index insertion logic would
then also have to be able to handle duplicate TIDs in the index.

Regards,

Matthias van de Meent


Regards,

Matthias van de Meent



Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements

From
Robert Haas
Date:
On Thu, Jan 21, 2021 at 3:08 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> Re-thinking this, and after some research:
>
> Is the behaviour of "any process that invalidates TIDs in this table
> (that could be in an index on this table) always holds a lock that
> conflicts with CIC/RiC on that table" a requirement of tableams, or is
> it an implementation-detail?
>
> If it is a requirement, then this patch is a +1 for me (and that
> requirement should be documented in such case), otherwise a -1 while
> there is no mechanism in place to remove concurrently-invalidated TIDs
> from CIC-ed/RiC-ed indexes.

I don't believe that the table AM machinery presumes that every kind
of table AM necessarily has to support VACUUM; or at least, I don't
think the original version presumed that. However, if you want it to
work some other way, there's not really any infrastructure for that,
either. Like, if you want to run your own background workers to clean
up your table, you could, but you'd have to arrange that yourself. If
you want a SWEEP command or an OPTIMIZE TABLE command instead of
VACUUM, you'd have to hack the grammar. Likewise, I don't know that
there's any guarantee that CLUSTER would work on a table of any old AM
either, or that it would do anything useful.

But having said that, if you have a table AM that *does* support
VACUUM and CLUSTER, I think it would have to treat TIDs pretty much
just as we do today. Almost by definition, they have to live with the
way the rest of the system works. TIDs have to be 6 bytes, and lock
levels can't be changed, because there's nothing in table AM that
would let you tinker with that stuff. The table AM can opt out of that
machinery altogether in favor of just throwing an error, but it can't
make it work differently unless somebody extends the API. It's
possible that this might suck a lot for some AMs. For instance, if
your AM is an in-memory btree, you might want CLUSTER to work in
place, rather than by copying the whole relation file, but I doubt
it's possible to make that work using the APIs as they exist. Maybe
someday we'll have better ones, but right now we don't.

As far as the specific question asked here, I don't really understand
how it could ever work any differently. If you want to invalidate some
TIDs in such a way that they can be reused by unrelated tuples - in
the case of the heap this would be by making the line pointers
LP_UNUSED - that has to be coordinated with the indexing machinery. I
can imagine a table AM that never reuses TIDs - especially if we
eventually have TIDs > 6 bytes - but I can't imagine a table AM that
reuses TIDs that might still be present in the index without telling
the index. And that seems to be what is being proposed here: if CIC or
RiC could be putting TIDs into indexes while at the same time some of
those very same TIDs were being made available for reuse, that would
be chaos, and right now we have no mechanism other than the lock level
to prevent that. We could invent one, maybe, but it doesn't exist
today, and zheap never tried to invent anything like that.

I agree that, if we do this, the comment should make clear that the
CIC/RiC lock has to conflict with VACUUM to avoid indexing tuples that
VACUUM is busy marking LP_UNUSED, and that this is precisely because
other backends will ignore the CIC/RiC snapshot while determining
whether a tuple is live. I don't think this is the case.
Hypothetically speaking, suppose we rejected this patch. Then, suppose
someone proposed to replace ShareUpdateExclusiveLock with two new lock
levels VacuumTuplesLock and IndexTheRelationLock each of which
conflicts with itself and all other lock levels with which
ShareUpdateExclusiveLock conflicts, but the two don't conflict with
each other. AFAIK, that patch would be acceptable today and
unacceptable after this change. While I'm not arguing that as a reason
to reject this patch, it's not impossible that somebody could: they'd
just need to argue that the separated lock levels would have greater
value than this patch. I don't think I believe that, but it's not a
totally crazy suggestion. My main point though is that this
meaningfully changing some assumptions about how the system works, and
we should be sure to be clear about that.

I looked at the patch but don't really understand it; the code in this
area seems to have changed a lot since I last looked at it. So, I'm
just commenting mostly on the specific question that was asked, and a
little bit on the theory of the patch, but without expressing an
opinion on the implementation.

-- 
Robert Haas
EDB: http://www.enterprisedb.com