Re: race condition in pg_class - Mailing list pgsql-hackers

From Noah Misch
Subject Re: race condition in pg_class
Date
Msg-id 20231027184832.9b.nmisch@google.com
Whole thread Raw
In response to Re: race condition in pg_class  (Noah Misch <noah@leadboat.com>)
Responses Re: race condition in pg_class
List pgsql-hackers
On Thu, Oct 26, 2023 at 09:44:04PM -0700, Noah Misch wrote:
> On Wed, Oct 25, 2023 at 01:39:41PM +0300, Smolkin Grigory wrote:
> > We are running PG13.10 and recently we have encountered what appears to be
> > a bug due to some race condition between ALTER TABLE ... ADD CONSTRAINT and
> > some other catalog-writer, possibly ANALYZE.
> > The problem is that after successfully creating index on relation (which
> > previosly didnt have any indexes), its pg_class.relhasindex remains set to
> > "false", which is illegal, I think.

It's damaging.  The table will behave like it has no indexes.  If something
adds an index later, old indexes will reappear, corrupt, having not received
updates during the relhasindex=false era.  ("pg_amcheck --heapallindexed" can
detect this.)

> > Index was built using the following statement:
> > ALTER TABLE "example" ADD constraint "example_pkey" PRIMARY KEY (id);
> 
> This is going to be a problem with any operation that does a transactional
> pg_class update without taking a lock that conflicts with ShareLock.  GRANT
> doesn't lock the table at all, so I can reproduce this in v17 as follows:
> 
> == session 1
> create table t (c int);
> begin;
> grant select on t to public;
> 
> == session 2
> alter table t add primary key (c);
> 
> == back in session 1
> commit;
> 
> 
> We'll likely need to change how we maintain relhasindex or perhaps take a lock
> in GRANT.

The main choice is accepting more DDL blocking vs. accepting inefficient
relcache builds.  Options I'm seeing:

=== "more DDL blocking" option family

B1. Take ShareUpdateExclusiveLock in GRANT, REVOKE, and anything that makes
    transactional pg_class updates without holding some stronger lock.  New
    asserts could catch future commands failing to do this.

B2. Take some shorter-lived lock around pg_class tuple formation, such that
    GRANT blocks CREATE INDEX, but two CREATE INDEX don't block each other.
    Anything performing a transactional update of a pg_class row would acquire
    the lock in exclusive mode before fetching the old tuple and hold it till
    end of transaction.  relhasindex=true in-place updates would acquire it
    the same way, but they would release it after the inplace update.  I
    expect a new heavyweight lock type, e.g. LOCKTAG_RELATION_DEFINITION, with
    the same key as LOCKTAG_RELATION.  This has less blocking than the
    previous option, but it's more complicated to explain to both users and
    developers.

B3. Have CREATE INDEX do an EvalPlanQual()-style thing to update all successor
    tuple versions.  Like the previous option, this would require new locking,
    but the new lock would not need to persist till end of xact.  It would be
    even more complicated to explain to users and developers.  (If this is
    promising enough to warrant more detail, let me know.)

B4. Use transactional updates to set relhasindex=true.  Two CREATE INDEX
    commands on the same table would block each other.  If we did it the way
    most DDL does today, they'd get "tuple concurrently updated" failures
    after the blocking ends.

=== "inefficient relcache builds" option family

R1. Ignore relhasindex; possibly remove it in v17.  Relcache builds et
    al. will issue more superfluous queries.

R2. As a weird variant of the previous option, keep relhasindex and make all
    transactional updates of pg_class set relhasindex=true pessimistically.
    (VACUUM will set it back to false.)

=== other

O1. This is another case where the sometimes-discussed "pg_class_nt" for
    nontransactional columns would help.  I'm ruling that out as too hard to
    back-patch.


Are there other options important to consider?  I currently like (B1) the
most, followed closely by (R1) and (B2).  A key unknown is the prevalence of
index-free tables.  Low prevalence would argue in favor of (R1).  In my
limited experience, they've been rare.  That said, I assume relcache builds
happen a lot more than GRANTs, so it's harder to bound the damage from (R1)
compared to the damage from (B1).  Thoughts on this decision?

Thanks,
nm



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Annoying corruption in PostgreSQL.
Next
From: "Ryo Matsumura (Fujitsu)"
Date:
Subject: Buf fix: update-po for PGXS does not work