Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements - Mailing list pgsql-hackers
From | Matthias van de Meent |
---|---|
Subject | Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements |
Date | |
Msg-id | CAEze2WjfZPWBW_LeX-gVpRwx=z1T3OZssS4xkNPdBieSDVM1ew@mail.gmail.com Whole thread Raw |
In response to | Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements (Matthias van de Meent <boekewurm+postgres@gmail.com>) |
Responses |
Re: {CREATE INDEX, REINDEX} CONCURRENTLY improvements
|
List | pgsql-hackers |
On Tue, 19 Jan 2021 at 21:59, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > > On Mon, 18 Jan 2021, 21:25 Álvaro Herrera, <alvherre@alvh.no-ip.org> wrote: > > > > On 2021-Jan-18, Matthias van de Meent wrote: > > > > > Example: > > > > > > 1.) RI starts > > > 2.) PHASE 2: filling the index: > > > 2.1.) scanning the heap (live tuple is cached) > > > < tuple is deleted > > > < last transaction other than RI commits, only snapshot of RI exists > > > < vacuum drops the tuple, and cannot remove it from the new index > > > because this new index is not yet populated. > > > 2.2.) sorting tuples > > > 2.3.) index filled with tuples, incl. deleted tuple > > > 3.) PHASE 3: wait for transactions > > > 4.) PHASE 4: validate does not remove the tuple from the index, > > > because it is not built to do so: it will only insert new tuples. > > > Tuples that are marked for deletion are removed from the index only > > > through VACUUM (and optimistic ALL_DEAD detection). > > > > > > According to my limited knowledge of RI, it requires VACUUM to not run > > > on the table during the initial index build process (which is > > > currently guaranteed through the use of a snapshot). > > > > VACUUM cannot run concurrently with CIC or RI in a table -- both acquire > > ShareUpdateExclusiveLock, which conflicts with itself, so this cannot > > occur. > > Yes, you are correct. Vacuum indeed has a ShareUpdateExclusiveLock. > Are there no other ways that pages are optimistically pruned? > > But the base case still stands, ignoring CIC snapshots in would give > the semantic of all_dead to tuples that are actually still considered > alive in some context, and should not yet be deleted (you're deleting > data from an in-use snapshot). Any local pruning optimizations using > all_dead mechanics now cannot be run on the table unless they hold an > ShareUpdateExclusiveLock; though I'm unaware of any such mechanisms > (other than below). Re-thinking this, and after some research: Is the behaviour of "any process that invalidates TIDs in this table (that could be in an index on this table) always holds a lock that conflicts with CIC/RiC on that table" a requirement of tableams, or is it an implementation-detail? If it is a requirement, then this patch is a +1 for me (and that requirement should be documented in such case), otherwise a -1 while there is no mechanism in place to remove concurrently-invalidated TIDs from CIC-ed/RiC-ed indexes. This concurrently-invalidated check could be done through e.g. updating validate_index to have one more phase that removes unknown / incorrect TIDs from the index. As a note: index insertion logic would then also have to be able to handle duplicate TIDs in the index. Regards, Matthias van de Meent Regards, Matthias van de Meent
pgsql-hackers by date: