Re: Support for REINDEX CONCURRENTLY - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Support for REINDEX CONCURRENTLY
Date
Msg-id CAB7nPqStv+6CrRXWAn70i7q6pYSTwFRXPRUZ8gEpv0gXGdDScQ@mail.gmail.com
Whole thread Raw
In response to Re: Support for REINDEX CONCURRENTLY  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: Support for REINDEX CONCURRENTLY
List pgsql-hackers
Hi,

Sorry for late reply, I am coming back poking at this patch a bit. One
of the things that I am still unhappy with this patch are the
potential deadlocks that can come up when for example another backend
kicks another operation taking ShareUpdateExclusiveLock (ANALYZE or
another REINDEX CONCURRENTLY) on the same relation as the one
reindexed concurrently. This can happen because we need to wait at
index validation phase as process might not have taken into account
deleted tuples before the reference snapshot was taken. I played a
little bit with a version of the code using no old snapshot waiting,
but even if I couldn't break it directly, concurrent backends
sometimes took incorrect tuples from heap. I unfortunately have no
clear solution about how to solve that... Except making REINDEX
CONCURRENTLY fail when validating the concurrent index with a clear
error message not referencing to any deadlock, giving priority to
other processes like for example ANALYZE, or other backends ready to
kick another REINDEX CONCURRENTLY... Any ideas here are welcome, the
patch attached does the implementation mentioned here.

On Tue, Sep 17, 2013 at 12:37 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> Looking at this version of the patch now:
> 1) comment for "Phase 4 of REINDEX CONCURRENTLY" ends with an incomplete
> sentence.
Oops, thanks.

> 2) I don't think the drop algorithm used now is correct. Your
> index_concurrent_set_dead() sets both indisvalid = false and indislive =
> false at the same time. It does so after doing a WaitForVirtualLocks() -
> but that's not sufficient. Between waiting and setting indisvalid =
> false another transaction could start which then would start using that
> index. Which will not get updated anymore by other concurrent backends
> because of inislive = false.
> You really need to follow index_drop's lead here and first unset
> indisvalid then wait till nobody can use the index for querying anymore
> and only then unset indislive.
Sorry, I do not follow you here. index_concurrent_set_dead calls
index_set_state_flags that sets indislive and *indisready* to false,
not indisvalid. The concurrent index never uses indisvalid = true so
it can never be called by another backend for a read query. The drop
algorithm is made to be consistent with DROP INDEX CONCURRENTLY btw.

> 3) I am not sure if the swap algorithm used now actually is correct
> either. We have mvcc snapshots now, right, but we're still potentially
> taking separate snapshot for individual relcache lookups. What's
> stopping us from temporarily ending up with two relcache entries with
> the same relfilenode?
> Previously you swapped names - I think that might end up being easier,
> because having names temporarily confused isn't as bad as two indexes
> manipulating the same file.
Actually, performing swap operation with names proves to be more
difficult than it looks as it makes necessary a moment where both the
old and new indexes are marked as valid for all the backends. The only
reason for that is that index_set_state_flag assumes that a given xact
has not yet done any transactional update when it is called, forcing
to one the number of state flag that can be changed inside a
transaction. This is a safe method IMO, and we shouldn't break that.
Also, as far as I understood, this is something that we *want* to
avoid to a REINDEX CONCURRENTLY process that might fail and come up
with a double number of valid indexes for a given relation if it is
performed for a table (or an index if reindex is done on an index).
This is also a requirement for toast indexes where new code assumes
that a toast relation can only have one single valid index at the same
time. For those reasons the relfilenode approach is better.

Regards,
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: "Erik Rijkers"
Date:
Subject: Re: Minmax indexes
Next
From: Noah Misch
Date:
Subject: Re: pgbench progress report improvements - split 2