Re: race condition in pg_class - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: race condition in pg_class |
Date | |
Msg-id | 20231027184832.9b.nmisch@google.com Whole thread Raw |
In response to | Re: race condition in pg_class (Noah Misch <noah@leadboat.com>) |
Responses |
Re: race condition in pg_class
|
List | pgsql-hackers |
On Thu, Oct 26, 2023 at 09:44:04PM -0700, Noah Misch wrote: > On Wed, Oct 25, 2023 at 01:39:41PM +0300, Smolkin Grigory wrote: > > We are running PG13.10 and recently we have encountered what appears to be > > a bug due to some race condition between ALTER TABLE ... ADD CONSTRAINT and > > some other catalog-writer, possibly ANALYZE. > > The problem is that after successfully creating index on relation (which > > previosly didnt have any indexes), its pg_class.relhasindex remains set to > > "false", which is illegal, I think. It's damaging. The table will behave like it has no indexes. If something adds an index later, old indexes will reappear, corrupt, having not received updates during the relhasindex=false era. ("pg_amcheck --heapallindexed" can detect this.) > > Index was built using the following statement: > > ALTER TABLE "example" ADD constraint "example_pkey" PRIMARY KEY (id); > > This is going to be a problem with any operation that does a transactional > pg_class update without taking a lock that conflicts with ShareLock. GRANT > doesn't lock the table at all, so I can reproduce this in v17 as follows: > > == session 1 > create table t (c int); > begin; > grant select on t to public; > > == session 2 > alter table t add primary key (c); > > == back in session 1 > commit; > > > We'll likely need to change how we maintain relhasindex or perhaps take a lock > in GRANT. The main choice is accepting more DDL blocking vs. accepting inefficient relcache builds. Options I'm seeing: === "more DDL blocking" option family B1. Take ShareUpdateExclusiveLock in GRANT, REVOKE, and anything that makes transactional pg_class updates without holding some stronger lock. New asserts could catch future commands failing to do this. B2. Take some shorter-lived lock around pg_class tuple formation, such that GRANT blocks CREATE INDEX, but two CREATE INDEX don't block each other. Anything performing a transactional update of a pg_class row would acquire the lock in exclusive mode before fetching the old tuple and hold it till end of transaction. relhasindex=true in-place updates would acquire it the same way, but they would release it after the inplace update. I expect a new heavyweight lock type, e.g. LOCKTAG_RELATION_DEFINITION, with the same key as LOCKTAG_RELATION. This has less blocking than the previous option, but it's more complicated to explain to both users and developers. B3. Have CREATE INDEX do an EvalPlanQual()-style thing to update all successor tuple versions. Like the previous option, this would require new locking, but the new lock would not need to persist till end of xact. It would be even more complicated to explain to users and developers. (If this is promising enough to warrant more detail, let me know.) B4. Use transactional updates to set relhasindex=true. Two CREATE INDEX commands on the same table would block each other. If we did it the way most DDL does today, they'd get "tuple concurrently updated" failures after the blocking ends. === "inefficient relcache builds" option family R1. Ignore relhasindex; possibly remove it in v17. Relcache builds et al. will issue more superfluous queries. R2. As a weird variant of the previous option, keep relhasindex and make all transactional updates of pg_class set relhasindex=true pessimistically. (VACUUM will set it back to false.) === other O1. This is another case where the sometimes-discussed "pg_class_nt" for nontransactional columns would help. I'm ruling that out as too hard to back-patch. Are there other options important to consider? I currently like (B1) the most, followed closely by (R1) and (B2). A key unknown is the prevalence of index-free tables. Low prevalence would argue in favor of (R1). In my limited experience, they've been rare. That said, I assume relcache builds happen a lot more than GRANTs, so it's harder to bound the damage from (R1) compared to the damage from (B1). Thoughts on this decision? Thanks, nm
pgsql-hackers by date: