Re: Support for REINDEX CONCURRENTLY - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Support for REINDEX CONCURRENTLY
Date
Msg-id CA+TgmoaY23ouHSo3TwVHJZuAmKjk-he05Rp4_BMk29Mf6xhFmg@mail.gmail.com
In response to Re: Support for REINDEX CONCURRENTLY  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: Support for REINDEX CONCURRENTLY  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Wed, Aug 28, 2013 at 9:02 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> During the swap phase, the process waited for transactions with
>> snapshots older than the one taken by the transaction doing the swap,
>> as they might hold the old index information. I think that we can get
>> rid of this wait thanks to the MVCC snapshots, as other backends are
>> now able to see the correct index information to fetch.
>
> I don't see MVCC snapshots guaranteeing that. The only thing changed due
> to them is that other backends see a self-consistent picture of the
> catalog (i.e. no longer neither or both versions of a tuple, as was
> possible earlier). It can still be out of date. And we rely on it not
> being out of date.
>
> I need to look into the patch for more details.

I agree with Andres.  The only way in which the MVCC catalog snapshot
patch helps is that you can now do a transactional update on a system
catalog table without fearing that other backends will see the row as
nonexistent or duplicated.  They will see exactly one version of the
row, just as you would naturally expect.  However, a backend's
syscaches can still contain old versions of rows, and they can still
cache older versions of some tuples and newer versions of other
tuples.  Those caches only get reloaded when shared-invalidation
messages are processed, and that only happens when the backend
acquires a lock on a new relation.
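To make the hazard concrete, here is a toy model (in Python, not PostgreSQL source; all names are invented for illustration) of a per-backend cache that only drains its invalidation queue at lock-acquisition points. In between those points, the backend happily serves a stale row even though an invalidation for it is already queued:

```python
# Toy model of syscache staleness: invalidation messages are only
# processed at lock-acquisition time, so lookups in between can
# return an outdated cached row.

class SharedInvalQueue:
    """Stand-in for the shared-invalidation message queue."""
    def __init__(self):
        self.messages = []

    def send(self, key):
        self.messages.append(key)

class Backend:
    def __init__(self, catalog, queue):
        self.catalog = catalog   # stand-in for shared catalog state
        self.queue = queue
        self.cache = {}          # per-backend "syscache"
        self.seen = 0            # how many inval messages already processed

    def lookup(self, key):
        # Served straight from the cache; no invalidation processing here.
        if key not in self.cache:
            self.cache[key] = self.catalog[key]
        return self.cache[key]

    def acquire_lock(self):
        # Invalidations are drained only at lock-acquisition points.
        for key in self.queue.messages[self.seen:]:
            self.cache.pop(key, None)
        self.seen = len(self.queue.messages)

catalog = {"pg_index:foo": "v1"}
queue = SharedInvalQueue()
b = Backend(catalog, queue)

assert b.lookup("pg_index:foo") == "v1"  # row gets cached
catalog["pg_index:foo"] = "v2"           # another backend updates the row
queue.send("pg_index:foo")               # ...and broadcasts an invalidation
assert b.lookup("pg_index:foo") == "v1"  # still stale: queue not drained yet
b.acquire_lock()                         # processing point
assert b.lookup("pg_index:foo") == "v2"  # now current
```

The window between `queue.send()` and `acquire_lock()` is exactly the kind of staleness that waiting for old snapshots is meant to paper over.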

I have been of the opinion for some time now that the
shared-invalidation code is not a particularly good design for much of
what we need.  Waiting for an old snapshot is often a proxy for
waiting long enough that we can be sure every other backend will
process the shared-invalidation message before it next uses any of the
cached data that will be invalidated by that message.  However, it
would be better to be able to send invalidation messages in some way
that causes them to be processed more eagerly by other backends, and that
provides some more specific feedback on whether or not they have
actually been processed.  Then we could send the invalidation
messages, wait just until everyone confirms that they have been seen,
which should hopefully happen quickly, and then proceed.  This would
probably lead to much shorter waits.  Or maybe we should have
individual backends process invalidations more frequently, and try to
set things up so that once an invalidation is sent, the sending
backend is immediately guaranteed that it will be processed soon
enough, and thus it doesn't need to wait at all.  This is all pie in
the sky, though.  I don't have a clear idea how to design something
that's an improvement over the (rather intricate) system we have
today.
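A hypothetical sketch of the ack-based idea above (names and structure invented for illustration, not a design proposal): the sender assigns each invalidation message a sequence number, each backend records the highest sequence number it has processed, and the sender proceeds as soon as every backend's acknowledgment has caught up:

```python
# Sketch of "send invalidations, wait for acknowledgments": the sender
# blocks only until every backend confirms it has processed the message,
# rather than waiting for old snapshots to drain.

class InvalBus:
    def __init__(self, n_backends):
        self.next_seq = 0
        # Highest message sequence number each backend has processed.
        self.acked = [0] * n_backends

    def send(self):
        # Broadcast one invalidation message; return its sequence number.
        self.next_seq += 1
        return self.next_seq

    def ack(self, backend_id, seq):
        # A backend confirms it has processed messages up to `seq`.
        self.acked[backend_id] = max(self.acked[backend_id], seq)

    def all_acked(self, seq):
        # The sender may proceed once the slowest backend has caught up.
        return min(self.acked) >= seq

bus = InvalBus(n_backends=3)
seq = bus.send()
assert not bus.all_acked(seq)    # nobody has processed it yet
for backend_id in range(3):
    bus.ack(backend_id, seq)     # each backend processes and confirms
assert bus.all_acked(seq)        # sender may now proceed
```

The wait here is bounded by how quickly backends drain their queues, rather than by the lifetime of the oldest snapshot in the system, which is why such a scheme could lead to much shorter waits.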

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


