On Wed, Oct 25, 2023 at 01:39:41PM +0300, Smolkin Grigory wrote:
> We are running PG13.10 and recently we have encountered what appears to be
> a bug due to some race condition between ALTER TABLE ... ADD CONSTRAINT and
> some other catalog-writer, possibly ANALYZE.
> The problem is that after successfully creating an index on a relation (which
> previously didn't have any indexes), its pg_class.relhasindex remains set to
> "false", which is illegal, I think.
> Index was built using the following statement:
> ALTER TABLE "example" ADD constraint "example_pkey" PRIMARY KEY (id);
This is going to be a problem with any operation that does a transactional
pg_class update without taking a lock that conflicts with ShareLock. GRANT
doesn't lock the table at all, so I can reproduce this in v17 as follows:
== session 1
create table t (c int);
begin;
grant select on t to public;

== session 2
alter table t add primary key (c);

== back in session 1
commit;
We'll likely need to change how we maintain relhasindex or perhaps take a lock
in GRANT.
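
To confirm the end state after the final commit in session 1, a catalog query
along these lines should show the mismatch. This is only a sketch: it assumes
the table "t" from the reproduction above and compares pg_class.relhasindex
against the presence of rows in pg_index. The second query is a database-wide
variant of the same check.

```sql
-- Check the table from the reproduction: relhasindex should be true
-- whenever at least one pg_index row points at the table.
SELECT c.relname,
       c.relhasindex,
       EXISTS (SELECT 1 FROM pg_index i WHERE i.indrelid = c.oid) AS has_pg_index_row
FROM pg_class c
WHERE c.oid = 't'::regclass;

-- Database-wide variant: relations that own an index but claim to have none.
SELECT c.oid::regclass AS relation
FROM pg_class c
WHERE NOT c.relhasindex
  AND EXISTS (SELECT 1 FROM pg_index i WHERE i.indrelid = c.oid);
```

Only the "false but an index exists" direction is reported, since relhasindex
is allowed to stay true for a table that recently had indexes.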
> Looking into the WAL via waldump gave us the following picture (full
> waldump output is attached):
> 1202295045 - create index statement
> 1202298790 and 1202298791 are some other concurrent operations,
> unfortunately I wasn't able to determine what they are
Can you explore that as follows?
- PITR to just before the COMMIT record.
- Save all rows of pg_class.
- PITR to just after the COMMIT record.
- Save all rows of pg_class.
- Diff the two sets of saved rows.
Which columns changed? The evidence you've shown would be consistent with a
transaction doing GRANT or REVOKE on dozens of tables. If the changed column
is something other than relacl, that would be great to know.
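
One way to save the rows for that diff, sketched for psql. The file names are
placeholders, and selecting the xmin and ctid system columns is an addition to
the steps above, on the assumption that it helps tie each changed row to the
transaction that wrote it.

```sql
-- Run in psql on the instance recovered to just before the COMMIT record.
-- \copy must stay on a single line; the output file is written client-side.
\copy (SELECT xmin, ctid, * FROM pg_class ORDER BY oid) TO 'pg_class_before.csv' WITH (FORMAT csv, HEADER)

-- Run in psql on the instance recovered to just after the COMMIT record.
\copy (SELECT xmin, ctid, * FROM pg_class ORDER BY oid) TO 'pg_class_after.csv' WITH (FORMAT csv, HEADER)

-- Then compare the two files with any text diff tool. Ordering by oid keeps
-- unchanged rows aligned, and the header row names the columns that differ.
```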
On the off-chance it's relevant, what extensions do you have (\dx in psql)?