Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Date
Msg-id CAEze2WjfZPWBW_LeX-gVpRwx=z1T3OZssS4xkNPdBieSDVM1ew@mail.gmail.com
Whole thread Raw
In response to Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Responses Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements
List pgsql-hackers
On Tue, 19 Jan 2021 at 21:59, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
>
> On Mon, 18 Jan 2021, 21:25 Álvaro Herrera, <alvherre@alvh.no-ip.org> wrote:
> >
> > On 2021-Jan-18, Matthias van de Meent wrote:
> >
> > > Example:
> > >
> > > 1.) RI starts
> > > 2.) PHASE 2: filling the index:
> > > 2.1.) scanning the heap (live tuple is cached)
> > > < tuple is deleted
> > > < last transaction other than RI commits, only snapshot of RI exists
> > > < vacuum drops the tuple, and cannot remove it from the new index
> > > because this new index is not yet populated.
> > > 2.2.) sorting tuples
> > > 2.3.) index filled with tuples, incl. deleted tuple
> > > 3.) PHASE 3: wait for transactions
> > > 4.) PHASE 4: validate does not remove the tuple from the index,
> > > because it is not built to do so: it will only insert new tuples.
> > > Tuples that are marked for deletion are removed from the index only
> > > through VACUUM (and optimistic ALL_DEAD detection).
> > >
> > > According to my limited knowledge of RI, it requires VACUUM to not run
> > > on the table during the initial index build process (which is
> > > currently guaranteed through the use of a snapshot).
> >
> > VACUUM cannot run concurrently with CIC or RI in a table -- both acquire
> > ShareUpdateExclusiveLock, which conflicts with itself, so this cannot
> > occur.
>
> Yes, you are correct. Vacuum indeed has a ShareUpdateExclusiveLock.
> Are there no other ways that pages are optimistically pruned?
>
> But the base case still stands, ignoring CIC snapshots in  would give
> the semantic of all_dead to tuples that are actually still considered
> alive in some context, and should not yet be deleted (you're deleting
> data from an in-use snapshot). Any local pruning optimizations using
> all_dead mechanics now cannot be run on the table unless they hold an
> ShareUpdateExclusiveLock; though I'm unaware of any such mechanisms
> (other than below).

Re-thinking this, and after some research:

Is the behaviour of "any process that invalidates TIDs in this table
(that could be in an index on this table) always holds a lock that
conflicts with CIC/RiC on that table" a requirement of tableams, or is
it an implementation-detail?

If it is a requirement, then this patch is a +1 for me (and that
requirement should be documented in such case), otherwise a -1 while
there is no mechanism in place to remove concurrently-invalidated TIDs
from CIC-ed/RiC-ed indexes.

This concurrently-invalidated check could be done through e.g.
updating validate_index to have one more phase that removes unknown /
incorrect TIDs from the index. As a note: index insertion logic would
then also have to be able to handle duplicate TIDs in the index.

Regards,

Matthias van de Meent


Regards,

Matthias van de Meent



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [PATCH 1/1] Fix detection of pwritev support for OSX.
Next
From: Pavel Stehule
Date:
Subject: Re: [HACKERS] [PATCH] Generic type subscripting